Title:
Apparatus and method for modular dynamically power managed power supply and cooling system for computer systems, server applications, and other electronic devices
Document Type and Number:
Kind Code:
A1

Abstract:
Network architecture, computer system and/or server, circuit, device, apparatus, method, and computer program and control mechanism for managing power consumption and workload in computer systems and in data and information servers. Aspects of the invention further provide power and energy consumption and workload management and control systems and architectures for high-density and modular multi-server computer systems that maintain performance while conserving energy, and a method for power management and workload management. Dynamic server power management and optional dynamic workload management for multi-server environments are provided by aspects of the invention. Modular network devices and an integrated server system, including modular servers, management units, switches and switching fabrics, modular power supplies and modular fans, and a special backplane architecture are provided, as well as dynamically reconfigurable multi-purpose modules and servers. A backplane architecture, structure, and method are also provided in which the backplane has no active components and has separate power supply lines and protection, to provide high reliability in a server environment.
Inventors:
Fung, Henry T. (San Jose, CA, US)
Application Number:
09/860237
Publication Date:
01/17/2002
Filing Date:
05/18/2001
Assignee:
Amphus, Inc.
Primary Class:
International Classes:
(IPC1-7): G06F001/26; G06F001/32
Attorney, Agent or Firm:
R. Michael Ananian (Flehr Hohbach Test Albritton & Herbert LLP, San Francisco, CA, 94111, US)
Claims:

I claim:



1. An electrical apparatus comprising: a frame or enclosure; at least one electrical circuit drawing electrical power in the form of an alternating or direct electrical voltage, current, or a combination of an electrical voltage and an electrical current disposed within said frame or enclosure, said electrical circuit utilizing said electrical power and generating heat as a result of said utilization; at least one temperature sensor within said enclosure for monitoring and reporting the temperature proximate the sensor to a temperature monitor; and a power manager receiving said reported temperature and controlling the temperature at the temperature sensor by controlling electrical power drawn by said electrical circuit and thereby the heat generated by operation of said circuit.

2. The apparatus in claim 1, wherein said at least one electrical circuit comprises a computer having a processor receiving an operating voltage and a processor clock signal.

3. The apparatus in claim 2, wherein said computer is configured as a server.

4. The apparatus in claim 3, wherein said power manager comprises a power management circuit.

5. The apparatus in claim 3, wherein said server comprises a server module and said power manager comprises a management module.

6. The apparatus in claim 1, wherein said apparatus comprises a plurality of said electrical circuits each including a computer having a processor receiving an operating voltage and a processor clock signal.

7. The apparatus in claim 6, wherein said power manager controls the electrical power drawn and the heat generated by said electrical circuits by controlling either the frequency of said processor clock signal, or said operating voltage, or a combination of said processor clock frequency and said processor operating voltage.

8. The apparatus in claim 7, wherein said power manager reduces the electrical power drawn by said electrical circuits by monitoring said temperature sensor and controlling an output signal generated at least in part by said temperature sensor to be within a predetermined range.

9. The apparatus in claim 8, wherein said predetermined range includes a predetermined maximum.

10. The apparatus in claim 6, wherein at least some of said plurality of electrical circuits are configured as network devices including said processor receiving said operating voltage and said processor clock signal; and said power manager controls the electrical power drawn and the heat generated by said network devices by controlling either the frequency of said processor clock signal, or said operating voltage, or a combination of said processor clock frequency and said processor operating voltage.

11. The apparatus in claim 10, wherein at least some of said network devices comprise circuits configured as a network device selected from the set consisting of a web server, a streaming media server, a cache server, a file server, an application server, and a router.

12. The apparatus in claim 10, wherein at least some of said network devices comprise server computers that further include at least one hard disk drive for storing data or other content to be served and a network communication circuit for communicating with an external client over a communication link.

13. The apparatus in claim 10, wherein said server computers comprise server modules and said power manager comprises at least one management module.

14. The apparatus in claim 10, wherein said configured network device comprises a management node type network device.

15. The apparatus in claim 10, wherein said system includes a plurality of temperature sensors within said enclosure reporting to one or more network devices.

16. The apparatus in claim 15, wherein said plurality of temperature sensors are spatially distributed to provide temperature monitoring of different network devices within said enclosure.

17. The apparatus in claim 15, wherein said plurality of temperature sensors are spatially distributed to provide temperature monitoring of different network devices and power supplies within said enclosure.

18. The apparatus in claim 12, wherein when the temperature sensed by a temperature sensor is within a predetermined magnitude relationship of a first predetermined value at least one network device is transitioned to a lower power consumption state thereby generating less heat.

19. The apparatus in claim 18, wherein when the temperature sensed by a temperature sensor is within a predetermined magnitude relationship of a second predetermined value at least one network device is transitioned to a powered off state.

20. The apparatus in claim 1, wherein the operational state of at least one network device is reduced to a lower power consuming and heat dissipating state in response to a temperature sensor reporting a temperature greater than or equal to a predetermined value.

21. The apparatus in claim 20, wherein after said power consumption state has been lowered permitting said network device to be operated at a higher power consuming state when the temperature sensed is below a predetermined temperature value, said lower temperature value being selected to provide hysteresis and prevent oscillation between higher power state and lower powered state.
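The two-threshold behavior recited in claims 20 and 21 can be illustrated with a minimal sketch. The class name, the method name, and both threshold values below are assumptions chosen for illustration; the claims specify only that the restore temperature is lower than the throttle temperature so as to provide hysteresis.

```python
from dataclasses import dataclass


@dataclass
class HysteresisThrottle:
    """Two-threshold thermal throttle (illustrative; thresholds are assumed)."""

    high_c: float = 70.0    # assumed: enter the lower power state at or above this
    low_c: float = 60.0     # assumed: return to the higher power state only below this
    throttled: bool = False

    def update(self, temp_c: float) -> bool:
        """Feed one temperature reading; return True while throttled."""
        if not self.throttled and temp_c >= self.high_c:
            self.throttled = True
        elif self.throttled and temp_c < self.low_c:
            self.throttled = False
        return self.throttled


def run_trace(temps):
    """Apply a fresh controller to a sequence of readings."""
    ctl = HysteresisThrottle()
    return [ctl.update(t) for t in temps]
```

Because the restore threshold sits below the throttle threshold, a reading that hovers between the two values leaves the current state unchanged, which is what prevents oscillation between the higher and lower power states.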

22. The apparatus in claim 1, wherein when the temperature sensed by a temperature sensor is within a predetermined magnitude relationship of a first predetermined value at least one network device is transitioned to a lower power consumption state.

23. The apparatus in claim 22, wherein the lower power consumption state is achieved by lowering the clock frequency of the processor, the clock frequency of a bus coupling a processor to other components, or the operating voltage of the processor or other components.

24. The apparatus in claim 22, wherein additional networked devices are sent to lower energy consuming modes if the temperature remains above a predetermined temperature value.

25. The apparatus in claim 7, wherein said controlling of either the frequency of said processor clock signal, or said operating voltage, or a combination of said processor clock frequency and said processor operating voltage, is controlled by a computer program executing instructions to implement a control procedure at least in part in at least one of said processors of said computers that transition one or more of said processors between different operating modes having different electrical power consumptions and different heat generation; said procedure including: while operating in a first selected operating mode exhibiting that first selected mode's characteristic power consumption range, (i) monitoring said computer system to detect the occurrence or non-occurrence of a first event; and (ii) transitioning said computer system from said first selected operating mode to a second selected operating mode exhibiting that second selected operating mode's power consumption range.

26. The apparatus in claim 25, wherein said procedure further includes: while operating in said second selected operating mode exhibiting that second selected mode's characteristic power consumption range, (i) monitoring said computer system to detect the occurrence or non-occurrence of a second event; and (ii) transitioning said computer system from said second selected operating mode to a third selected operating mode exhibiting that third selected operating mode's power consumption range.

27. The apparatus in claim 26, wherein said first selected operating mode and said second selected operating mode comprise different operating modes, and said second selected operating mode and said third selected operating mode comprise different operating modes, each of said first, second, and third operating modes being selected from the set of modes consisting of: (i) a mode in which said processing unit is operated at substantially maximum rated processing unit clock frequency and at substantially maximum rated processing unit core voltage, and said logic circuit is operated at substantially maximum rated logic circuit clock frequency; (ii) a mode in which said processing unit is operated at less than maximum rated processing unit clock frequency and at less than or equal to a maximum rated processing unit core voltage, and said logic circuit is operated at substantially maximum rated logic circuit clock frequency; and (iii) a mode in which said processing unit is operated at a substantially zero frequency processing unit clock frequency (clock stopped) and at less than or equal to a maximum rated processing unit core voltage sufficient to maintain processor unit state, and said logic circuit is operated at substantially maximum rated logic circuit clock frequency.

28. The apparatus in claim 27, wherein said set further consists of a mode in which said processing unit is powered off by removing a processing unit clock frequency (processing unit clock stopped) and a processing unit core voltage.
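The event-driven mode transitions of claims 25 through 28 can be pictured as a small state machine. The sketch below is only one possible reading: the mode names, the event names ("idle", "activity", "shutdown"), and the transition table are all hypothetical, since the claims leave the events and the permitted transitions open.

```python
from enum import Enum


class Mode(Enum):
    FULL = 1      # claim 27 (i): maximum rated clock and core voltage
    REDUCED = 2   # claim 27 (ii): reduced clock, voltage at or below maximum
    STOPPED = 3   # claim 27 (iii): clock stopped, voltage retains processor state
    OFF = 4       # claim 28: clock and core voltage removed


# Hypothetical transition table keyed by (current mode, detected event).
TRANSITIONS = {
    (Mode.FULL, "idle"): Mode.REDUCED,
    (Mode.REDUCED, "idle"): Mode.STOPPED,
    (Mode.REDUCED, "activity"): Mode.FULL,
    (Mode.STOPPED, "activity"): Mode.FULL,
    (Mode.STOPPED, "shutdown"): Mode.OFF,
}


def step(mode: Mode, event: str) -> Mode:
    """Return the next mode; events with no table entry leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)


def run(events, start: Mode = Mode.FULL):
    """Replay a sequence of events and return the list of modes visited."""
    mode = start
    history = [mode]
    for event in events:
        mode = step(mode, event)
        history.append(mode)
    return history
```

For example, two consecutive "idle" events followed by "activity" would take a processor from the first mode through the second and third and back to the first, matching the first, second, and third selected operating modes of claims 25 and 26.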

29. The apparatus in claim 1, further comprising at least one cooling fan and said apparatus controlling a speed of said fan, including an on/off condition of said fan, to achieve a desired temperature at said sensor.

30. The apparatus in claim 29, wherein said fan is not rotated and passive cooling is used when electrical power drawn and heat generated are sufficiently small to permit such passive cooling while maintaining a predetermined temperature range.

31. The apparatus in claim 29, wherein said apparatus includes a plurality of cooling fans and said plurality of cooling fans are controlled to achieve a desired temperature.

32. The apparatus in claim 31, wherein said apparatus further includes a plurality of temperature sensors and said plurality of cooling fans are operated in a coordinated manner to achieve a desired temperature range proximate at least some of said temperature sensors.

33. The apparatus in claim 31, wherein said cooling fans are modular cooling fan units that provide mechanical connectors and electrical circuits to provide powered-on hot-swappability.

34. The apparatus in claim 33, wherein said modular cooling fan units are organized into cooling fan banks that provide mechanical connectors and electrical circuits to provide powered-on hot-swappability.

35. The apparatus in claim 34, wherein said at least two banks of three cooling fan units are provided at different locations within said frame or enclosure.

36. The apparatus in claim 33, wherein said cooling fan units include fail-over protection circuits.

37. The apparatus in claim 31, wherein different ones of said plurality of cooling fan units are operated or not operated in a coordinated manner to provide desired cooling of said apparatus and to achieve a desired life cycle and/or reliability for said cooling fans.

38. The apparatus in claim 31, wherein different ones of said plurality of cooling fan units are operated or not operated or operated at different speeds in a coordinated manner to provide desired cooling of said apparatus and to provide such cooling at a minimum aggregate cooling fan power consumption.
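One way to read the minimum aggregate fan power condition of claim 38, together with the passive cooling case of claim 30, is sketched below. It assumes the idealized fan affinity relationship (airflow proportional to speed, power proportional to the cube of speed) and a minimum stable fan speed; the function name and every numeric default are hypothetical and are not taken from the application.

```python
def plan_fans(required_cfm, n_fans, max_cfm=100.0, max_watts=12.0, min_speed=0.3):
    """Pick how many identical fans to run, and at what common speed fraction.

    Assumed model: airflow scales linearly with speed and power with speed
    cubed. Returns (fans_on, speed_fraction, total_watts).
    """
    if required_cfm <= 0:
        return (0, 0.0, 0.0)              # passive cooling: no fan rotates
    best = None
    for k in range(1, n_fans + 1):
        speed = required_cfm / (k * max_cfm)
        if speed > 1.0:
            continue                      # k fans cannot deliver the airflow
        speed = max(speed, min_speed)     # a fan cannot run below its minimum
        watts = k * max_watts * speed ** 3
        if best is None or watts < best[2]:
            best = (k, speed, watts)
    if best is None:
        raise ValueError("required airflow exceeds total fan capacity")
    return best
```

Under the cubic assumption, spreading the airflow across more fans at lower speed reduces total power until the minimum speed is reached, after which adding another fan only adds consumption. A real controller would also weigh fan wear, as claim 37 suggests, which this sketch ignores.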

39. The apparatus in claim 1, wherein power consumption within said apparatus is further reduced by adjusting the number and motor speed of cooling fans responsible for cooling said apparatus.

40. The apparatus in claim 11, wherein said apparatus further includes a plurality of temperature sensors and a plurality of cooling devices, said cooling devices operating under control of a control device that controls each cooling device to provide cooling at the rate and location desired to maintain said network devices within a predetermined operating temperature range.

41. The apparatus in claim 40, wherein a plurality of temperature sensors are disposed in said frame or enclosure and a plurality of cooling devices are disposed within said enclosure, said plurality of temperature sensors communicating a temperature signal to a control means and said control means adjusting the on/off status and operational parameters of the cooling units to extract heat according to predetermined rules.

42. The apparatus in claim 41, wherein the cooling devices comprise motor driven fans.

43. The apparatus in claim 41, wherein the cooling devices comprise valves controlling the circulation of a cooling fluid.

44. The apparatus in claim 41, wherein the cooling devices comprise conductive heat exchangers.

45. The apparatus in claim 41, wherein the cooling devices comprise convective heat exchangers.

46. The apparatus in claim 10, wherein: said server computers comprise server modules and said power manager comprises at least one management module; and power consumption within said apparatus is controlled or reduced by adjusting the number and motor speed of cooling fans responsible for cooling said apparatus.

47. The apparatus in claim 11, wherein said apparatus further includes a plurality of temperature sensors and a plurality of cooling devices, said cooling devices operating under control of a control device that controls each cooling device to provide cooling at the rate and location desired to maintain said network devices within a predetermined operating temperature range.

48. The apparatus in claim 47, wherein a plurality of temperature sensors are disposed in said frame or enclosure and a plurality of cooling devices are disposed within said enclosure, said plurality of temperature sensors communicating a temperature signal to a control means and said control means adjusting the on/off status and operational parameters of the cooling units to extract heat according to predetermined rules.

49. A system as in claim 48, wherein the rotational speed of a motor driven cooling device is adjusted to maintain a predetermined temperature range proximate a temperature sensor.

50. A system as in claim 48, wherein the rotational speed of a motor driven cooling device is adjusted to maintain a predetermined temperature range within an enclosure.

51. A system as in claim 48, wherein the amount of heat extracted from an enclosure is adjusted to maintain a predetermined temperature and reduce power consumed by said cooling device.

52. A system as in claim 48, wherein the heat extractor comprises a motor driven cooling device.

53. The apparatus in claim 1, further including a plurality of power supplies wherein said plurality of power supplies are controlled to maintain a required power output level drawn by said at least one electrical circuit and to operate said power supplies according to predetermined power supply management policy.

54. The apparatus in claim 53, wherein operating said plurality of power supplies at a preferred efficiency includes operating at least some of said power supplies at a preferred output and/or efficiency at a partial electrical output loading less than a maximum loading to extend a lifetime of said power supplies.

55. The apparatus in claim 53, wherein operating said plurality of power supplies according to said policy includes operating at least some of said power supplies at up to a maximum rating and not operating other of said plurality of power supplies so that the aggregate power consumed by said apparatus including power lost in operation of said power supplies is reduced.

56. The apparatus in claim 55, wherein said power supplies comprise battery power supplies.

57. The apparatus in claim 55, wherein said power supplies comprise power supplies receiving an alternating current utility line voltage and current and generating at least one direct current voltage and current.

58. The apparatus in claim 57, wherein said alternating current utility line (ac) voltage is a voltage substantially in the range of between about 90 volts and substantially 300 volts, and the direct current (dc) voltage is in the range of between about ±0.5 volt and about ±20 volts.

59. The apparatus in claim 57, wherein said alternating current utility line (ac) voltage is a voltage substantially in the range of between substantially 100 volts and 130 volts, and the direct current (dc) voltage is in the range of between about 1 volt and about 5 volts.

60. The apparatus in claim 57, wherein said power supply management policy further includes automatically alternating a plurality of power supplies so that the aggregate plurality of power supplies are operated efficiently and have an extended lifetime.

61. The apparatus in claim 60, wherein said automatically alternating said plurality of power supplies includes changing the electrical power that may be drawn from each of said plurality of power supplies under computer control so that the aggregate plurality of power supplies are operated efficiently and have an extended lifetime.

62. The apparatus in claim 53, wherein only selected ones of said plurality of power supplies are operated.

63. The apparatus in claim 53, wherein multiple ones of said power supplies are operated concurrently but each is operated at less than rated power output capacity.

64. The apparatus in claim 53, wherein said plurality of power supply units include fail-over protection circuits.

65. The apparatus in claim 53, wherein the elapsed time and/or power supply loading history are monitored and stored in a non-volatile memory store and used with said power supply management policy.

66. The apparatus in claim 65, wherein said stored history are utilized to predict failure and/or equalize lifetime of said power supplies according to a power supply lifetime prediction routine.

67. The apparatus in claim 66, wherein said power supply lifetime prediction routine is statistically based prediction routine utilizing a lifetime and failure model adapted to each particular type of power supply.
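The use of stored operating history to equalize power supply lifetime (claims 60, 65, and 66) can be sketched with a simple least-used-first rotation. This is an illustration under stated assumptions, not the statistical lifetime and failure model of claim 67: the function names are hypothetical, and a plain dictionary stands in for the non-volatile memory store.

```python
def pick_supplies(hours_by_supply, needed):
    """Return the names of the `needed` supplies with the fewest logged hours."""
    ranked = sorted(hours_by_supply, key=lambda name: (hours_by_supply[name], name))
    return ranked[:needed]


def run_interval(hours_by_supply, needed, interval_h):
    """Activate the least-used supplies for one interval and log their hours.

    `hours_by_supply` is updated in place; in a real system this record
    would be written back to non-volatile storage.
    """
    active = pick_supplies(hours_by_supply, needed)
    for name in active:
        hours_by_supply[name] += interval_h
    return active
```

Calling `run_interval` once per scheduling period alternates the active supplies automatically, so accumulated operating hours converge across the set over time.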

68. The apparatus in claim 53, wherein said plurality of power supplies comprise power supplies having different output characteristic types and the combination of power supplies providing electrical operating power to satisfy electrical loading at any particular time and having a desired aggregate operating characteristic are dynamically selected.

69. The apparatus in claim 68, wherein said desired aggregate operating characteristic is a substantially minimized power consumption at the required power output.
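The dynamic selection of a combination of dissimilar power supplies that minimizes consumption at the required output (claims 55, 68, and 69) can be sketched as a small search. The loss model below (a fixed loss per active supply plus a loss proportional to delivered power, with load shared in proportion to rating) is an assumption made for illustration, as are all names and figures.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Supply:
    name: str
    rated_w: float      # maximum output rating
    idle_loss_w: float  # assumed fixed loss whenever the supply is on
    loss_frac: float    # assumed loss per watt delivered


def input_power(active, load_w):
    """Total input power when `active` supplies share the load by rating."""
    total_rating = sum(s.rated_w for s in active)
    return sum(
        s.idle_loss_w + (1.0 + s.loss_frac) * load_w * s.rated_w / total_rating
        for s in active
    )


def choose_supplies(supplies, load_w):
    """Exhaustively pick the feasible subset with the lowest input power."""
    best = None
    for k in range(1, len(supplies) + 1):
        for combo in combinations(supplies, k):
            if sum(s.rated_w for s in combo) < load_w:
                continue  # this subset cannot carry the load
            power = input_power(combo, load_w)
            if best is None or power < best[1]:
                best = (combo, power)
    if best is None:
        raise ValueError("load exceeds the combined rating of all supplies")
    return best
```

The exhaustive search is practical only for the handful of supplies found in a single enclosure. Because every active supply contributes its fixed loss, running fewer supplies near their rating tends to win at light load, which is the effect described in claim 55.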

70. A power-conservative multi-node network device, comprising: an enclosure having a power supply and a back-plane bus; a plurality of hot-pluggable node devices in the form of printed circuit (PC) cards adapted for connection with said back-plane bus; and each said node device being reconfigurable in substantially real-time to adapt to changing conditions on the network.

71. The power-conservative multi-node network device in claim 70, wherein said plurality of hot-pluggable node devices comprise sixteen node devices.

72. The power-conservative multi-node network device in claim 70, wherein each of said node devices includes power saving control features.

Description:

RELATED APPLICATIONS

[0001] This application is a continuing application under 35 U.S.C. §§ 119(e) and 120, wherein applicant and inventor claim the benefit of priority to U.S. Provisional Application Ser. No. 60/283,375 entitled System, Method And Architecture For Dynamic Server Power Management And Dynamic Workload Management for Multi-Server Environment filed Apr. 11, 2001; U.S. Provisional Application Ser. No. 60/236,043 entitled System, Apparatus, and Method for Power-Conserving Multi-Node Server Architecture filed Sep. 27, 2000; and U.S. Provisional Application Ser. No. 60/236,062 entitled System, Apparatus, and Method for Power Conserving and Disc-Drive Life Prolonging RAID Configuration filed Sep. 27, 2000; each of which applications is hereby incorporated by reference.

[0002] The following U.S. utility patent applications are also related applications: U.S. utility patent application Ser. No. ______ (Attorney Docket No. A-70531/RMA) entitled System, Method, and Architecture for Dynamic Server Power Management and Dynamic Workload Management for Multi-server Environment filed ______ May 2001; U.S. utility patent application Ser. No. ______ (Attorney Docket No. A-70532/RMA) entitled System and Method for Activity or Event Based Dynamic Energy Conserving Server Reconfiguration filed ______ May 2001; U.S. utility patent application Ser. No. ______ (Attorney Docket No. A-70533/RMA) entitled System, Method, Architecture, and Computer Program Product for Dynamic Power Management in a Computer System filed ______ May 2001; U.S. utility patent application Ser. No. ______ (Attorney Docket No. A-70534/RMA) entitled Apparatus, Architecture, and Method for Integrated Modular Server System Providing Dynamically Power-managed and Work-load Managed Network Devices filed ______ May 2001; U.S. utility patent application Ser. No. ______ (Attorney Docket No. A-70535/RMA) entitled System, Architecture, and Method for Logical Server and Other Network Devices in a Dynamically Configurable Multi-server Network Environment filed ______ May 2001; U.S. utility patent application Ser. No. ______ (Attorney Docket No. A-70536/RMA) entitled Apparatus and Method for Modular Dynamically Power-Managed Power Supply and Cooling System for Computer Systems, Server Applications, and Other Electronic Devices filed ______ May 2001; and, U.S. utility patent application Ser. No. ______ (Attorney Docket No. A-70537/RMA) entitled Power on Demand and Workload Management System and Method; each of which applications is hereby incorporated by reference.

[0003] This is also a continuing application claiming the benefit of priority under 35 U.S.C. § 120 to each of the following applications: U.S. patent application Ser. No. 09/558,473 filed Apr. 25, 2000, entitled System and Method Of Computer Operating Mode Clock Control For Power Consumption Reduction; which is a continuation of U.S. patent application Ser. No. 09/121,352 filed Jul. 23, 1998, entitled System and Method of Computer Operating Mode Control for Power Consumption Reduction; which is a division of application Ser. No. 08/767,821 filed Dec. 17, 1996, entitled Computer Activity Monitor Providing Idle Thread and Other Event Sensitive Clock and Power Control, now abandoned; which is a continuation of application Ser. No. 08/460,191 filed Jun. 2, 1995, entitled Activity Monitor That Allows Activity Sensitive Reduced Power Operation of a Computer System, now abandoned; which is a continuation of application Ser. No. 08/285,169 filed Aug. 3, 1994, entitled Power Management for Data Processing System, now abandoned; which is a continuation of application Ser. No. 08/017,975 filed Feb. 12, 1993, entitled Power Conservation Apparatus Having Multiple Power Reduction Levels Dependent Upon the Activity of a Computer System, now U.S. Pat. No. 5,396,635; which is a continuation of application Ser. No. 07/908,533 filed Jun. 29, 1992, entitled Improved Power Management for Data Processing System, now abandoned; which is a continuation of application Ser. No. 07/532,314 filed Jun. 1, 1990, entitled Power Management for Data Processing System, now abandoned; each of which applications is hereby incorporated by reference.

[0004] This application is also related to: U.S. Pat. No. 6,079,025 issued Jun. 20, 2000 entitled System and Method of Computer Operating Mode Control For Power Consumption System; U.S. Pat. No. 5,892,959 issued Apr. 6, 1999 entitled Computer Activity Monitor Providing Idle Thread And Other Event Sensitive Clock and Power Control; U.S. Pat. No. 5,799,198 issued Aug. 25, 1998 entitled Activity Monitor For Computer Systems Power Management; U.S. Pat. No. 5,758,175 issued May 26, 1998 entitled Multi-Mode Power Switching For Computer Systems; U.S. Pat. No. 5,710,929 issued Jan. 20, 1998 entitled Multi-State Power Management For Computer System; and U.S. Pat. No. 5,396,635 issued Mar. 7, 1995 for Power Conservation Apparatus Having Multiple Power Reduction Levels Dependent Upon the Activity of a Computer System; each of which patents is herein incorporated by reference.

FIELD OF THE INVENTION

[0005] This invention pertains generally to architecture, apparatus, systems, methods, and computer programs and control mechanisms for managing power consumption and work-load in data and information servers; more particularly to power consumption and workload management and control systems for high-density multi-server computer system architectures that maintain performance while conserving energy and to the method for power management and workload management used therein, and most particularly to system, method, architectures, and computer programs for dynamic server power management and dynamic workload management for multi-server environments.

BACKGROUND

[0006] Heretofore, the designers of servers generally, and of multi-node network servers in particular, have paid little if any attention to power or energy conservation. Such servers were designed and constructed to run at or near maximum levels so as to serve data or other content as fast as possible, or, where service demands were less than capacity, to remain ever vigilant to provide fast response to service requests. Increasing processor and memory speeds have typically been accompanied by higher processor core voltages to support the faster device switching times, and faster hard disk drives have typically led to faster and more energy-hungry disk drive motors. Larger memories and caches have also led to increased power consumption even for small single-node servers. Power conservation efforts have historically focused on the portable battery-powered notebook market, where battery life is an important marketing and use characteristic. However, in the server area, little attention has been given to saving power, such servers usually not adopting or utilizing even the power conserving suspend, sleep, or hibernation states that are available with some Microsoft Windows 95/98/2000, Linux, Unix, or other operating system based computers, personal computers, PDAs, or information appliances.

[0007] Multi-node servers present a particular energy consumption problem as they have conventionally been architected as a collection of large power hungry boxes interconnected by external interconnect cables. Little attention has been placed on the size or form factor of such network architectures, the expandability of such networks, or on the problems associated with large network configurations. Such conventional networks have also by-and-large paid little attention to the large amounts of electrical power consumed by such configurations or to the savings possible. This has been due in part to the rapid and unexpected expansion of the Internet and of servers connected with and serving Internet clients. Internet service companies and entrepreneurs have been more interested in a short time to market and profit than in the effect on electrical power consumption and electrical power utilities; however, continuing design and operation without due regard to power consumption in this manner is problematic.

[0008] Network server operators have also by-and-large neglected to factor into the economics of running a network server system the physical plant cost associated with large rack mounted equipment carrying perhaps one network node per chassis. These physical plant and real estate costs also contribute to large operating costs.

[0009] In the past, more attention was given to the purchase price of equipment and little attention to the operating costs. It would be apparent to those making the calculation that operating costs may far exceed initial equipment purchase price, yet little attention has been paid to this fact. More recently, the power available in the California electrical market has been at crisis levels, with available power reserves dropping below a few percent and rolling blackouts occurring as electrical power requirements exceed available electrical power generation capacity. High technology companies in the heart of Silicon Valley cannot get enough electrical power to make or operate products, and server farms, which consume vast quantities of electrical energy for the servers and for the cooling equipment and facilities in which they are housed, have stated that they may relocate to areas with stable supplies of low-cost electricity.

[0010] Even were server manufacturers motivated to adopt available power management techniques, such techniques represent only a partial solution. Conventional computer system power management tends to focus on power managing a single CPU, such as by monitoring certain restricted aspects of the single CPU operation and making a decision that the CPU should be run faster to provide greater performance or more slowly to reduce power consumption.

[0011] Heretofore, computer systems generally, and server systems having a plurality of servers where each server includes at least one processor or central processing unit (CPU) in particular, have not been power managed to maintain performance and reduce power consumption. Even where a server system having more than one server component and CPU may possibly have utilized a conventional personal computer architecture that provided some measure of localized power management separately within each CPU, no global power management architecture or methods have conventionally been applied to power manage the set of servers and CPUs as a single entity.

[0012] The common practice of over-provisioning a server system so as to be able to meet peak demands has meant that during long periods of time, individual servers are consuming power and yet doing no useful work, or several servers are performing some tasks that could be performed by a single server at a fraction of the power consumption.

[0013] Operating a plurality of servers, including their CPU, hard disk drive, power supply, cooling fans, and any other circuits or peripherals that are associated with the server, at such minimal loading also unnecessarily shortens their service life. However, conventional server systems do not consider the longevity of their components. To the extent that certain of the CPUs, hard disk drives, power supplies, and cooling fans may be operated at lower power levels, their effective service life may be extended; this applies to the mechanical systems (hard disk drives and cooling fans) in particular.

[0014] Therefore there remains a need for a network architecture and network operating method that provides large capacity and multiple network nodes or servers in a small physical footprint and that is power conservative relative to server performance and power consumed by the server, as well as power conservative from the standpoint of power for server facility air conditioning. These and other problems are solved by the inventive system, apparatus and method. There also remains a need for server farms that are power managed in an organized global manner so that performance is maintained while reducing power consumption. There also remains a need to extend the effective lifetime of computer system components and servers so that the total cost of ownership is reduced.

SUMMARY

[0015] Aspects of the invention provide network architecture, computer system and/or server, circuit, device, apparatus, method, and computer program and control mechanism for managing power consumption and workload in computer systems and data and information servers. Aspects of the invention further provide power and energy consumption and workload management and control systems and architectures for high-density and modular multi-server computer systems that maintain performance while conserving energy, and a method for power management and workload management. Dynamic server power management and optional dynamic workload management for multi-server environments are provided by aspects of the invention. Modular network devices and an integrated server system, including modular servers, management units, switches and switching fabrics, modular power supplies and modular fans, and a special backplane architecture are provided, as well as dynamically reconfigurable multi-purpose modules and servers.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016] FIG. 1 is a diagrammatic illustration showing an exemplary embodiment of an inventive power conserving high-density server system.

[0017] FIG. 2 is a diagrammatic illustration showing an exemplary embodiment of a single 2U high rack mountable Integrated Server System Unit having a plurality of modular server units.

[0018] FIG. 3 is a diagrammatic illustration showing a standard server farm architecture in which multiple nodes are individually connected by cables to each other to form the desired network.

[0019] FIG. 4 is a diagrammatic illustration showing an embodiment of the inventive Integrated Appliance Server (IAS) standard architecture also or alternatively referred to as an Integrated Server System (ISS) architecture in which multiple nodes selected from at least a computer node (CN) such as a server module (SM), network node (NN) also referred to as a switch module, and monitor or management node (MN) also referred to as a Management Module (MM) are provided within a common enclosure and coupled together via an internal backplane bus.

[0020] FIG. 5 is a diagrammatic illustration showing another embodiment of the invention in which multiple modular IAS (or ISS) clusters each containing multiple nodes are cascaded to define a specialized system.

[0021] FIG. 6 is a diagrammatic illustration showing an embodiment of an Integrated Server System Architecture having two interconnected integrated server system units (ISSUs) and their connectivity with the external world.

[0022] FIG. 7 is a diagrammatic illustration showing an exemplary embodiment of an AMPC bus and the connectivity of Server Modules and Management Modules to the bus to support serial data, video, keyboard, mouse, and other communication among and between the modules.

[0023] FIG. 8 is a diagrammatic illustration showing an exemplary embodiment of ISSU connectivity to gigabit switches, routers, load balancers, and a network.

[0024] FIG. 9 is a diagrammatic illustration showing an embodiment of the inventive power conserving power management between two servers and a manager.

[0025] FIG. 10 is a diagrammatic illustration showing an alternative embodiment of a server system showing detail as to how activity may be detected and operating mode and power consumption controlled in response.

[0026] FIG. 11 is a diagrammatic illustration showing another alternative embodiment of a server system particularly adapted for a Transmeta Crusoe™ type processor having LongRun™ features, showing detail as to how activity may be detected and operating mode and power consumption controlled in response.

[0027] FIG. 12 is a diagrammatic illustration showing aspects of the connectivity of two management modules to a plurality of server modules and two Ethernet switch modules.

[0028] FIG. 13 is a diagrammatic illustration showing an exemplary internetwork and the manner in which two different types of master may be deployed to power manage such system.

[0029] FIG. 14 is a diagrammatic illustration showing a graph of the CPU utilization (processor activity) as a function of time, wherein the CPU utilization is altered by entering different operating modes.

[0030] FIG. 15 is a diagrammatic illustration showing an exemplary state engine state diagram graphically illustrating the relationships amongst the modes and identifying some of the transitions between states or modes for operation of an embodiment of the inventive system and method.

[0031] FIGS. 16-23 are diagrammatic illustrations showing exemplary state diagrams for operating mode transitions.

[0032] FIG. 24 is a diagrammatic illustration showing the manner in which a plurality of servers may operate in different modes based on local detection and control of selected mode transitions and local detection but global control of other selected mode transitions.

[0033] FIG. 25 is a diagrammatic illustration showing an embodiment of a computer system having a plurality of hard disc drives configured in a RAID configuration and using a separate RAID hardware controller.

[0034] FIG. 26 is a diagrammatic illustration showing an alternative embodiment of a computer system having a plurality of hard disc drives configured in a RAID configuration and using software RAID control in the host processor.

[0035] FIG. 27 is a diagrammatic illustration showing an exemplary RAID 1 configuration.

[0036] FIG. 28 is a diagrammatic illustration showing an exemplary RAID 0+1 (RAID 10) configuration.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

[0037] The present invention pertains to computer system architectures and structures and methods for operating such computer system architectures in a compact high-performance low-power consumption manner. Computers, information appliances, data processing systems, and all manner of electronic systems and devices may utilize and benefit from the innovations described herein. Aspects of the invention also contribute to reliability, ease of maintenance, and longevity of the system as a whole and operational components thereof. In an application that is of particular importance and which benefits greatly from the innovations described here, the computer system is or includes a server system having at least one and more typically a plurality of servers. Each server will include at least one processor or CPU but may include multiple CPUs. In multiple server configurations significant power consumption reduction is achieved by applying the inventive power management scheme. These and other aspects of the invention are described in the sections that follow.

[0038] The physical form factors of the server modules and management modules provide significant advantages; however, it will be appreciated that the invention need not be limited to such modular servers or modular management elements, and that the invention extends to discrete servers and management elements. It is also to be appreciated that although the exemplary embodiments focus attention toward servers, server systems, and power saving features for server systems, aspects of the invention transcend such servers and server environments. For example, distributed computer systems of all types may benefit from the form of coordinated management and control to determine CPU loading and coordinate computational processing over a multiplicity of processors.

[0039] Section headers, where provided, are merely for the convenience of the reader and are not to be taken as limiting the scope of the invention in any way, as it will be understood that certain elements and features of the invention have more than one function and that aspects of the invention and particular elements are described throughout the specification.

[0040] With respect to FIG. 1 there is shown an exemplary rack mounted server system 50. The rack carries a plurality of 2U high integrated server system units 52, each having one or more management modules (MM) 53 and one or more server modules (SM) 54, each server module providing a fully independent server. Each server includes a processor or CPU and memory, mass storage device such as a hard disk drive, and input/output ports. In the embodiment illustrated each 2U high chassis 55 has 16 slots each of which may contain a PC-board mounted server module 54 or management module 53. The chassis 55 also provides one or more power supplies 56 and one or more cooling fan banks 57. These elements are coupled for communication by switches 59 and a backplane 58.

[0041] The different ISS chassis units 55 may be coupled together to form a larger system, and these server units share a gigabit uplink 60, a load balancer 61, and a router 62 to connect to a network such as the Internet 63. Network Attached Storage (NAS) 64 may desirably be provided to increase storage capacity over that provided in individual server modules. Local and/or remote management nodes or workstations 65 may be provided to permit access to the system 50. As power management is an important feature of aspects of the invention, the provision of electric service 66 to the system 50 as well as electric service 68 to building or facilities air conditioning or cooling 69 is also illustrated. Content or data may readily be served to remote clients 70 over the Internet 63.

[0042] The illustration in FIG. 1 shows how the form factor of the server and management modules increases server density and reduces the footprint of the server system. Of course multiple racks may be added to increase system capacity. The inventive power management feature extends to individual server modules, to groups of server modules, and to the entire set of server modules in the system 50 as desired. Power management may also be applied to the management modules, power supply modules, switches, cooling fan modules, and other components of the ISS.

[0043] An exemplary embodiment of an ISS unit is illustrated in FIG. 2, which shows the manner in which PC board based server modules and management modules plug into a backplane along with power supplies, cooling fan units, switches, and other components to provide the high-density system. These and other features are described in greater detail in the remainder of this specification.

[0044] With respect to FIG. 3, there is shown, in diagrammatic form, an illustration of a standard server farm architecture in which multiple nodes are individually connected by cables to each other to form the desired network. Server farms such as this are typically power hungry, operate continuously with little or no regard for actual usage, have a large footprint, and generate large amounts of heat that require considerable air conditioning to dissipate or remove.

[0045] FIG. 4 is a diagrammatic illustration showing an embodiment of the inventive Integrated Server System (ISS) standard architecture in which multiple nodes selected from at least a computer node (CN) or Server Module (SM), network node (NN) or Switch Module (SWM), and monitor node (MN) or Management Module (MM) are provided within a common enclosure and coupled together via an internal backplane bus and internal switch. Two separate switching fabrics sw 1 and sw 0 are provided and described hereinafter. Up-link (up 0 and up 1 ) and down-link (down 0 and down 1 ) are provided to permit cascading multiple ISS cluster units. Monitor nodes (MN or MonX) such as Mon 0 and Mon 1 are coupled or connected via any one or more of serial I/O interfaces, RJ-45 interfaces, and RJ-11 modem interfaces to each switching node or other switching means, network node (NN), or to a network node via a switching node or other means.

[0046] FIG. 5 is a diagrammatic illustration showing another embodiment of the invention in which multiple modular ISS clusters each containing multiple nodes are cascaded to define a specialized system. This is an example of the manner in which multiple nodes within an ISS unit and multiple cascaded ISS units may be transformed or morphed to suit network configuration requirements.

[0047] It is noted that each Integrated Appliance Server (IAS) or Integrated Server System (ISS) cluster desirably includes some intelligence. For purposes of configuration, a master is selected during initialization of the system, such as when it is booted or reset. The system can be designed such that any one of the nodes can be the master node. For example, one node may be designated as the master, or the first node that becomes available after initialization, boot, or reset may assume the role of master node. There is no need for a separate processor or control within the box or enclosure. The master can control the rest of the system. Factors used in such control include the load and the quality of service desired or required. The system can reconfigure itself at any time in real-time in response to conditions encountered and predetermined or adaptive rules or procedures. For example, if during a period of time the number of email requests increases and the number of web page requests decreases or is static, then nodes may be converted to serve email so that the email service capacity and performance are increased to handle the additional load. A node can also serve more than one function; for example, it can function to serve email and web pages and can be self balancing.
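
By way of illustration only, the master selection rule described above (a designated node, or else the first node that becomes available after initialization, boot, or reset) may be sketched as follows. The names `Node`, `select_master`, `slot`, and `ready_at` are hypothetical and are introduced here solely for the example; the tie-breaking rule is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    slot: int        # hypothetical: slot position of the node in the enclosure
    ready_at: float  # hypothetical: seconds after boot/reset at which the node became available


def select_master(nodes: Sequence[Node], designated_slot: Optional[int] = None) -> Node:
    """Return the designated node if it is present, else the first node to become available."""
    if not nodes:
        raise ValueError("at least one node is required")
    if designated_slot is not None:
        for node in nodes:
            if node.slot == designated_slot:
                return node
    # Fall back to the earliest-ready node; ties go to the lowest slot (an assumption).
    return min(nodes, key=lambda n: (n.ready_at, n.slot))
```

Because any node may assume the role, no separate controller object appears in the sketch; the master is simply one of the ordinary nodes.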

[0048] The architecture or topology may be morphed or transformed into many alternative structures. All nodes are connected by an internal backplane, thereby eliminating the need for external and fragile connectors and cables. Each node can be adapted to perform any one of numerous functions, or a plurality of the functions concurrently. Any node can be a cache node, an email node, a web page server node, or the like. The function or functions of the nodes are selected (manually or automatically) based on such factors as the load for each type of function and the desired level or quality of service (QOS) for that function. For example, if rapid web page service is desired as compared to email service, more node resources may be allocated to serving web pages.
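
One possible way of automatically selecting node functions from load and QOS is sketched below, purely as an example. The sketch assumes that each function is given a single demand figure (for instance, measured load multiplied by a QOS weight) and divides the nodes in proportion to that figure using a largest-remainder rule. The function name `allocate_nodes` and the demand values are hypothetical; the specification does not prescribe any particular allocation rule.

```python
from typing import Dict


def allocate_nodes(total_nodes: int, demand: Dict[str, float]) -> Dict[str, int]:
    """Split total_nodes across functions in proportion to their (hypothetical) weighted demand."""
    total = sum(demand.values())
    if total_nodes < 0 or total <= 0 or any(d < 0 for d in demand.values()):
        raise ValueError("need a non-negative node count and positive total demand")
    # Ideal fractional share of nodes for each function.
    shares = {name: total_nodes * d / total for name, d in demand.items()}
    # Start from the whole-number part of each share.
    alloc = {name: int(share) for name, share in shares.items()}
    leftover = total_nodes - sum(alloc.values())
    # Hand the remaining nodes to the functions with the largest fractional remainders.
    by_remainder = sorted(shares, key=lambda n: shares[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc
```

For a 16-node cluster in which web service carries three times the weighted demand of email, `allocate_nodes(16, {"web": 3.0, "email": 1.0})` assigns 12 nodes to web pages and 4 to email.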

[0049] All nodes are reconfigurable at any time based on circumstances, such as load and QOS. For example, if only a certain number of pages per second need be served, then one may choose not to allocate additional node resources to web page serving. In some instances, the tasks performed by one node (such as a node serving web pages) may be shifted to one or more other nodes that have additional capacity, and that former web server node powered down or put into another power or energy saving mode. This adaptive reconfiguration and distribution of node functions maintains QOS while minimizing power consumption, heat dissipation, and other negative or detrimental effects. Placing the equipment or portions of the equipment into power saving modes or standby modes also has the potential benefit of prolonging effective service life.
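
The shifting of tasks onto nodes having additional capacity, with the vacated nodes placed in a power saving mode, may be illustrated by the following minimal sketch. It assumes that each node's load is expressed as a fraction of one node's capacity and uses a first-fit-decreasing packing heuristic, which is a technique chosen here for illustration and is not taken from the specification. The name `consolidate` and the `capacity` parameter are hypothetical.

```python
from typing import Dict, List, Tuple


def consolidate(loads: Dict[str, float], capacity: float = 1.0) -> Tuple[Dict[str, float], List[str]]:
    """Pack work onto as few nodes as the heuristic finds.

    Returns (load carried by each node left active, nodes that may enter standby).
    """
    active: Dict[str, float] = {}
    # Place the largest workloads first (first-fit decreasing).
    for node, load in sorted(loads.items(), key=lambda kv: kv[1], reverse=True):
        if load > capacity:
            raise ValueError(f"load on {node} exceeds the capacity of a single node")
        for target in active:
            if active[target] + load <= capacity:
                active[target] += load  # shift this node's work onto an already-active node
                break
        else:
            active[node] = load  # no active node has room, so this node stays active
    standby = [node for node in loads if node not in active]
    return active, standby
```

Setting `capacity` below 1.0 leaves headroom on each active node, which is one simple way to protect QOS while nodes are being taken out of service.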

[0050] The power consumption of each node is therefore also adjustable based on the load and/or QOS requirement. On one level this adjustment is enabled by using or not using one or more nodes, and at a second level, the performance characteristics of the node may be adapted or configured to suit operational requirements. For example, a processor clock speed may be increased when demands are high and decreased or turned off when demands are modest or there is no demand. Again, these adjustments may be made automatically based on sensed load and feedback as to whether quality of service requirements have been met.
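
The second level of adjustment described above, in which a processor clock is raised under high demand and lowered or stopped under modest or no demand, might be sketched as follows. All numeric values (the clock range, the step size, and the utilization thresholds) are assumptions chosen for the example, as is the function name `next_clock_mhz`; a returned value of 0 denotes a stopped clock.

```python
def next_clock_mhz(
    utilization: float,
    current_mhz: int,
    min_mhz: int = 300,   # assumed lowest running clock
    max_mhz: int = 600,   # assumed highest clock
    step_mhz: int = 100,  # assumed adjustment step
    high: float = 0.8,    # assumed "demand is high" threshold
    low: float = 0.3,     # assumed "demand is modest" threshold
) -> int:
    """Return the clock to use for the next interval; 0 means the clock is stopped."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    if utilization == 0.0:
        return 0                      # no demand: turn the clock off
    if current_mhz == 0:
        return min_mhz                # demand has returned: restart at the lowest clock
    if utilization > high:
        return min(current_mhz + step_mhz, max_mhz)
    if utilization < low:
        return max(current_mhz - step_mhz, min_mhz)
    return current_mhz                # demand is within the band: leave the clock alone
```

Calling this function once per sensing interval with the most recent utilization gives a simple feedback loop of the kind the paragraph describes.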

[0051] The invention also provides a functional and architectural topology in which each node represents a cell in a network of interconnected cells. These nodes or cells are linked and interoperate with each other such that when the operating characteristics of one node change in response to a command or sensed conditions (e.g. current loading and/or QOS) the other nodes become aware of this change, and the change may also optionally but desirably be reflected in reconfiguration of others of the nodes. Advantageously, the number or frequency of such changes may be controlled so that the system remains stable. For example, reconfiguration may be limited in frequency or predetermined delays may be built into the system so that a settling time is provided after each node is reconfigured.
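
A settling time of the kind just described can be enforced with a very small gate that refuses a new reconfiguration until a predetermined delay has elapsed since the previous one. The following is a sketch under that assumption; the class name `ReconfigurationGate` and the 30 second figure used in the usage note are hypothetical. Time is passed in explicitly so that the behavior is easy to follow.

```python
from typing import Optional


class ReconfigurationGate:
    """Permit a reconfiguration only after the settling time since the last one has elapsed."""

    def __init__(self, settling_time_s: float) -> None:
        self.settling_time_s = settling_time_s
        self._last_change_s: Optional[float] = None

    def try_reconfigure(self, now_s: float) -> bool:
        """Return True (and record the change) if reconfiguration is allowed at time now_s."""
        if (
            self._last_change_s is not None
            and now_s - self._last_change_s < self.settling_time_s
        ):
            return False  # still settling from the previous change
        self._last_change_s = now_s
        return True
```

With `ReconfigurationGate(30.0)`, a change at time 0 is allowed, a request at 10 seconds is refused, and a request at 30 seconds is allowed again.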

[0052] Other intelligence can be put into the node clusters if desired. Recall that a cluster includes a set of interconnected nodes; in a preferred embodiment each cluster includes 16 nodes in a single physical enclosure.

[0053] Each ISS consists of multiple nodes. Nodes may be configured as computer nodes, monitor nodes, network nodes, and any other type of node known in the art. Normally, the nodes are physically housed in a single box or enclosure and connected by an enclosure backplane. The architecture may be morphed or transformed into many different alternative organizations. For example, the ISS standard architecture may be configured into a server farm. This can be done for either the entire ISS, a part of a single ISS, or among multiple ISS units.

[0054] The computer nodes (also known as server nodes or server modules) may be configured or mapped to email, FTP, or Web nodes. One or more of such computer nodes may then be coupled together with other nodes. This exemplary first implementation is illustrated as the inner box in FIG. 5 . Each node may be configured in any way desired as in at least one embodiment of the invention, the structure and function of each node at the time of manufacture is identical, and any one of such nodes may be placed in service or later reconfigured to provide the desired functionality. In one embodiment, each computer node type is the same while in other embodiments they are of different types.

[0055] Furthermore, in one embodiment, every node in a cluster of nodes is identical as they come from the factory, and any node may be adapted, such as through software that is loaded into a node, to provide any one of a plurality of available functions. In another embodiment, somewhat to very different node structures are provided within a single cluster to provide more highly optimized network nodes, computer nodes, and monitor nodes. The existence and distribution of such nodes in a cluster may be selected by the customer or user so that each cluster provides the desired number of computer, monitor, network, or other nodes as may become available. Advantageously, the nodes are implemented as plug-in or removable modules, such as printed circuit boards, so that the configuration of any particular cluster or of a system having a plurality of clusters may be modified after manufacture. In this way additional nodes of any desired type may be added when the need arises. Not all locations within a cluster need be populated, thereby providing initial cost savings as well as allowing later expansion. Dynamic configuration of nodes, whether identical nodes or specialized nodes, is supported in response to changing loading and QOS.

[0056] Recall that the standard Integrated Server System (ISS) architecture includes a single 2U (3.5-inch tall) box having N nodes, where in one embodiment N=16. Internally there is a switching fabric that makes connections between the nodes. The switching fabric may be a hub, a switch, or any other means for making connections between all the different nodes. Internally, it is preferred to provide two such switching fabrics. This is advantageous (but not required) as it permits implementation and configuration of two separate and independent networks. For example, one network can connect multiple nodes of any type and a second network can connect to data in mass storage units such as may be used in a Storage Area Network (SAN). This is desirable in some circumstances as it reduces contention over the network and reduces the likelihood of collisions of traffic over the network.

[0057] A second reason for providing two (or more) switching fabrics relates to providing high availability or redundancy. High availability pertains to providing the 24 hour/day 7 day/week (“24/7”) presence and availability over the internet. When only a single switching fabric and its set of interconnected nodes is used, a failure of that switching fabric or of a critical node not redundantly provided will fail to provide the high 24/7 availability expected. Provision of two independent switching fabrics and appropriately configured node sets provides either actual redundancy or the ability to either manually or automatically reconfigure either of the node/switch sets to maintain service availability.

[0058] Therefore, it will be appreciated that the two (or other plurality) switching fabrics and their coupled nodes may be used either as two (or more) separate networks or with one set maintained as a backup that assumes the responsibilities of the primary set in the event of failure. Again, this rollover from primary to backup may occur either manually or automatically.
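
An automatic rollover from a primary switching fabric to a backup could, for example, follow the simple rule sketched below. The sketch assumes that a health flag is available for each fabric (how health is sensed is outside the example) and that traffic stays on whichever fabric is currently carrying it for as long as that fabric is healthy. The function name `choose_active_fabric` and the fabric labels used in the usage note are hypothetical.

```python
from typing import Dict, Optional


def choose_active_fabric(
    health: Dict[str, bool],
    primary: str,
    backup: str,
    current: Optional[str] = None,
) -> Optional[str]:
    """Return the fabric that should carry traffic, or None if both have failed."""
    # Stay on the current fabric while it is healthy, to avoid needless switching.
    if current is not None and health.get(current, False):
        return current
    if health.get(primary, False):
        return primary
    if health.get(backup, False):
        return backup  # rollover: the backup assumes the responsibilities of the primary
    return None
```

For instance, with `health = {"fabric_a": False, "fabric_b": True}` and `current="fabric_a"`, the function returns `"fabric_b"`.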

[0059] Typically, the two switching fabric means SW 1 and SW 2 in the embodiment of FIG. 4 will be identical, though they are not required to be identical, and in at least one embodiment are implemented as separate printed circuit boards that plug into the backplane of the cluster.

[0060] The inventive architecture also provides means for cascading or interconnecting multiple clusters, and by implication, for cascading or interconnecting the nodes in one cluster to the nodes in any number of other clusters. Usually two such links are provided for coupling to other clusters, thereby allowing cascading of any number of clusters and nodes. For example, if each cluster box includes 16 nodes, connection to other clusters provides additional nodes. Cascading of any number between two and twenty or more units may be provided. When multiple clusters are interconnected in this way required functionality may optionally be provided in only one cluster and need not be duplicated in all clusters. For example, if a monitor type node is desired it need only be provided in one of the clusters to permit monitoring of all of the nodes of the connected clusters. Switching fabrics may also optionally be shared between interconnected or cascaded clusters.

[0061] In the embodiment of FIG. 4, the ISS standard architecture includes a Computer Node (CN) having a switching fabric that we call the Network Node (NN). The monitor node has a serial port that has an RJ-11 modem built in. In the event of a problem with the switch or any other component, a page or phone call can be placed to a local or remote administrator with diagnostic information, allowing the administrator to interact with the cluster to take corrective action. For example, the administrator may access local software diagnostic tools to troubleshoot and correct the problem, perform a hardware reset, perform a power cycle (OFF/ON) type reset, or otherwise debug, diagnose or correct the problem.

[0062] Advantageously, but optionally, a separate monitor node (MN) is provided for each switching fabric means even though either of the monitors may be configured to monitor both switching fabrics and all of the nodes coupled to or through the switching fabric. This duplication is provided for purposes of redundancy so that in the event that one of the independent networks fails or the modem itself fails, the remaining operational network may be monitored so that intervention by the administrator may be accomplished as desired. Also, in the event that a modem fails, modem redundancy allows the administrator to query either or both networks. It also facilitates a determination that a modem has failed versus the network having failed.

[0063] Physically, it is a rectangular rack-mountable box. In one embodiment, the 16-node ISS enclosure is provided as a standard 19-inch wide, 3.5-inch high (2U) rack mountable chassis. Hot swapping any and all of the boards with which the nodes are implemented is supported. The box need never be powered down and therefore so long as a minimum set of nodes remain in the box, the network remains available. There are 16 computer node boards (also referred to as server modules) that may be plugged or unplugged at any time. Each board (computer node or server module) is coupled to the other nodes and to the switching fabric via a backplane bus so that no external cables or wires are required for connecting the nodes within any cluster box. In preferred embodiments of the invention, the switch or switches are built into the box, though in other embodiments external switches, such as switches within a cascaded cluster, may be used. Where clusters are to be cascaded (see description above) the connections between cluster boxes may be made with external cables. It will be appreciated that for a 16-node per cluster box the reduction in cables is substantial (up to 31 cables between nodes are eliminated).

[0064] It will therefore be clear to workers having ordinary skill in the art in light of the description provided here that the inventive structure and method provides numerous features and advantages over conventional systems and methods. For example, the invention provides an Integrated Server System (ISS) comprising multiple nodes housed within a single enclosure or box. In one embodiment, 16 nodes within a single enclosure are supported, but any number that may physically be placed within a single enclosure may be used, including for example any number of nodes between 1 node and 32 nodes or more. Configurations having 4, 8, 10, 12, 16, 20, 24, and 32 nodes are specifically provided. Larger numbers of nodes may readily be accommodated if the size of the enclosure is increased and due attention is provided for cooling or other heat dissipation. Nodes available in any particular enclosure may be selected from network nodes (NN), computer nodes (CN), monitor nodes (MN), as well as variations and combinations of these node types.

[0065] In another aspect, the inventive structure and method may be transformed, morphed, or otherwise configured to provide (either alone or in combination with other cluster units) a great variety of organizations and architectural topologies, and therefore provide an almost unlimited number of functional configurations. In another aspect, all nodes within an enclosure are connected to each other and to a switching means by a backplane bus internal to the enclosure, thereby eliminating the need for external node-to-node and node-to-switch connection cables. Such conventional cables are prone to failure and inadvertent disconnection during service operations that may result in network downtime. In yet another aspect, the inventive structure and method facilitates and permits any node to perform any supported function or operation. In one embodiment, all nodes are identical and can be adapted, such as by programming or loading appropriate software, to provide any function or operation. In another embodiment, different classes or types of nodes are provided that are somewhat specialized and/or optimized to perform selected classes of functions or operations very well. In yet another embodiment, highly specialized nodes are available to perform specific functions. In each of these embodiments, the nodes are desirably provided as removable hot-pluggable modular units, such as PC boards or cards, that may be added or removed from the enclosure without powering off or otherwise making the network unavailable. This facilitates the interchange of hot spares which may remain ready and available within the enclosure for immediate use in the event of a node failure. In still another aspect, each Integrated Server System (or cluster) unit is cascadable so that multiple sets of nodes may be interconnected to provide the desired number and type of operation. 
In yet another aspect, any and all nodes are reconfigurable at any time based on such factors as load or quality of service (QOS) requirements. Furthermore, the change or reconfiguration may be communicated to other nodes and the effect of such reconfiguration may ripple through to the other nodes and to the network as a whole. This permits the entire system to be self balancing to the extent desired. In another aspect, each cluster is provided with sufficient intelligence so that at least some network administration operations that conventionally required some degree of supervision or intervention may be performed autonomously and dynamically in response to sensed conditions experienced on the network or within one or more nodes of the network.

[0066] In still another aspect the inventive structure and method provide for significant power consumption reduction and energy savings as compared to conventional network and server architectures, as only those power consuming resources that are actually needed to provide the quality of service required are in an active mode. Those node resources that are not needed may be powered off or placed in some power conserving standby mode until needed. In addition, operations performed by one or more nodes may be shifted to another node so that only the remaining active nodes consume power and the remaining nodes are in standby mode or powered off until needed. The intelligence within one of the nodes acting as a master node for the cluster or ISS may then wake up the inactive node and configure it for operation. A system may be woken up and placed in any of the available operating modes by any one of a plurality of events. Nodes may also be placed into an inactive or power conserving mode when no demands are made on their resources, independent of whether responsibility for their functionality has been shifted to another node or nodes. In one embodiment of the invention the power consumed is reduced by a factor of about 10 as compared to a standard 19-inch wide by 1.75-inch high (1U) rack mountable network node device. This power savings is accomplished at least in part by one or more of the following measures: the reduction in the number of power supplies, use of the mounting plate as a heat sink to assist in removing heat from the enclosure, providing power saving controls to circuits and devices within the ISS enclosure, and the above described ability to reconfigure and take off line unneeded capacity.

[0067] The architecture is referred to as the Integrated Server System (ISS) or the integrated server architecture, and each unit is referred to as an Integrated Server System Unit. One embodiment of the ISS Unit is being developed by Amphus under the proprietary name Virgo™.

[0068] Having now described a first embodiment of the Integrated Server System (ISS) (also referred to as the Integrated Server Architecture), attention is now directed to several further embodiments which are described in somewhat greater detail so that the advanced power consumption reduction features may be more readily understood.

[0069] An exemplary embodiment of an ISS based system is illustrated in FIG. 6. Each Integrated Server System (ISS) architecture comprises a number of functional components. A particular exemplary embodiment is now described, although it will be clear from the description provided that various changes to the configuration may be accomplished without departing from the spirit and scope of the invention. In this embodiment a 2U high chassis and/or enclosure 101 houses a backplane 103 mounting a plurality of connectors 105 adapted to receive a plurality of printed circuit boards 107. The nature, type, characteristics, and number of these printed circuit boards 107 may vary from installation to installation as will be described subsequently. It will also be appreciated that the physical form and/or connectivity of these components may be provided through other means.

[0070] In one embodiment of the invention, multiple ISS units may be coupled together or interconnected. In the embodiment illustrated in FIG. 6 two such ISS units 102 are shown. A first of these is referred to as the “A-unit” and the second unit is referred to as the “B-unit”. Additional units may also be provided. It is noted that although the configurations of the A-unit and B-unit are the same here, in any practical implementation they may be the same or different, depending upon the functional purpose of the overall system and/or of individual modules within the system. The manner in which configurations are chosen, physically altered such as through the addition or removal of modules, and/or altered through dynamic allocation of modules is in accordance with principles described hereinafter. With this in mind, components resident within the A-unit are typically designated with an “a” suffix to the reference numeral and components resident within the B-unit are typically designated with a “b” suffix to the reference numeral. However, where a general reference to a component of a particular type is made without specific reference to a diagram, the “a” and the “b” suffix may be dropped for convenience.

[0071] Each ISS unit also comprises at least one, and generally a plurality, of server modules 112 a - 1 , . . . , 112 a -N, where in a particular embodiment of the ISS the maximum number of server modules 112 is fixed at 16 due to current physical size constraints of the chassis 101 . Each ISS may also include one or a plurality of management modules 108 a - 1 , . . . , 108 a -M, where in a particular embodiment of the ISS the maximum number of management modules is two. It should be understood that although each ISS unit may include one or more management modules 108 , management functionality may alternatively be delegated to management modules physically residing within other ISS units so that the management module functionality of any particular ISS unit may reside elsewhere.

[0072] In one implementation, the integrated server system includes at least one primary switching fabric 104 a - 1 , also referred to as a primary switch module, and advantageously includes a secondary switching fabric or secondary switch module 104 a - 2 . The first (sometimes referred to as the primary) switch module 104 a - 1 operates to connect for communication each of (any and all of) the modules that are present in the ISS Unit, such as each of the Server Modules, Management Modules, power supplies, cooling units, and any other module or unit that might be present. The second (or secondary) switch module 104 a - 2 operates to provide the same function as the first module as well as providing a redundant communication path between and among the modules or other units that are present in the ISS. Therefore, while a second (or secondary) switch module is not required for any particular ISS, its presence provides significant benefits in high-end applications.

[0073] Each switch module provides a multi-connection switching fabric to link the modules with one another. In one embodiment, each switch has the equivalent of a switching matrix inside that establishes connections between different modules. For example, one or more of the server modules, management modules, power supplies, and fan modules may be coupled together for communication. More particularly, the switch module may connect management module 1 with any of the server modules (for example with server module 5 ) or with the other management module, power supply module, fan modules, or the like. In general, the switch module makes one or a plurality of direct connections and is not typically implemented as a bus architecture that would allow only dedicated use by a single device or module (or a pair of communicating devices or modules) at any particular time. The switch module permits multiple simultaneous communications without collision.
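
By way of illustration only, the difference between a switching matrix and a shared bus may be sketched as follows. This is a simplified model under stated assumptions and is not the switch module implementation: the class name SwitchFabric, the port labels, and the rule that each port takes part in at most one connection at a time are all hypothetical.

```python
from typing import Dict, Optional


class SwitchFabric:
    """Toy switching matrix (hypothetical): disjoint pairs of ports may be
    connected at the same time, unlike a bus that serves one pair at a time."""

    def __init__(self) -> None:
        # Maps each connected port to its current peer.
        self._peer: Dict[str, str] = {}

    def connect(self, a: str, b: str) -> bool:
        """Connect two ports; refuse if either port is already in use."""
        if a == b or a in self._peer or b in self._peer:
            return False
        self._peer[a] = b
        self._peer[b] = a
        return True

    def disconnect(self, a: str) -> None:
        """Tear down the connection that port `a` takes part in, if any."""
        b = self._peer.pop(a, None)
        if b is not None:
            self._peer.pop(b, None)

    def peer_of(self, a: str) -> Optional[str]:
        """Return the port currently connected to `a`, or None."""
        return self._peer.get(a)
```

In this sketch a connection between a management module and one server module does not prevent two other server modules from being connected concurrently, which is the property the paragraph above attributes to the switching fabric.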

[0074] One or a plurality of server modules (SM) 112 are also provided. Server modules are operative to serve data or other content in a manner that is well known in the art and not described in greater detail here. For example, a server module may be configured so as to enhance, improve, or optimize serving web pages, cached data or content, streaming video, or other data or content types as is known in the art. Server module hard disk drive configuration parameters may be adjusted or modified according to the type and quantity of data or other content to be served. Such configuration and configuration utilities are known in the art, and include but are not limited to the data organization on the server hard disk drive (such as a modified RAID data organization and the RAID level).

[0075] Each SM 112 is advantageously implemented as a printed circuit (PC) board or card having an edge connector (or electrical contacts) adapted for plug-in connection to a mating receiving connector associated with a chassis 101 backplane board 103 . An SM also includes a PC card mounted processor, such as a microprocessor, microcontroller, or CPU, and associated memory. At least one mass storage device, such as a rotatable magnetic hard disc drive, optical drive, solid state storage device, or the like is mounted to the PC card and coupled to the processor. The mass storage device provides storage for the data or content to be served, or information concerning a location or link at which the data or content may be found if it is not served directly from the particular SM 112 . While physical, functional, and operational aspects of the server modules are novel, especially in the areas of power consumption and power management, data or content throughput control (QoS throttling), heat dissipation and cooling, mass storage device characteristics, form factor and the like, the manner in which data or content is stored and served is generally conventional in nature, and not described in greater detail here.

[0076] A management module (MM) 108 is operable to provide overall ISSU monitoring and control. These management and control functions are described in greater detail in the context of the power management function. In general, each ISS unit will contain at least one MM 108 and, in high-performance implementations and where redundancy is desired, each ISSU will include multiple MMs. In one embodiment of the ISS, two MMs are provided. In such implementations, the two MMs may share responsibilities or, more typically, the second MM 108 a - 2 will provide redundant backup for the first MM 108 a - 1 . Management Modules 108 are described in greater detail elsewhere in this description.

[0077] At least one, and advantageously a plurality, of temperature sensors are disposed within the ISS enclosure. These temperature sensors are desirably located at diverse locations within the enclosure so that the temperature of heat sensitive components may be adequately monitored and corrective action taken as needed. These diverse locations may be selected from locations on the internal surface of the enclosure, locations on the chassis, locations on one, more than one, or all of the server modules, management modules, switch modules, power supply modules, fan modules, or back plane, and may be integrated within solid state devices such as within the CPU.

[0078] In one embodiment of the invention, a fully populated ISS Unit having sixteen server modules, two management modules, two switching modules, two power supplies, two fan modules, and the backplane that supports these components, includes about 30 temperature sensors. Here each server module includes one temperature sensor integrated in the CPU and one on the edge connect board that supports the CPU and other circuitry as well as the hard disk drive. There is also at least one temperature sensor on each management module. While some embodiments may provide temperature sensing of the chassis, enclosure, or backplane, in the preferred embodiment no such temperature sensors are provided in these locations for reasons of reliability. As described in detail elsewhere in this specification, the preferred embodiment of the ISS Unit backplane does not include any active components. It merely provides printed circuit traces that provide electrical operating power (voltages and current) and communication, as well as providing physical support and connectors that receive the edge connector (or other) plug in modules.

[0079] In one embodiment, the temperature sensors have a preset temperature at which an output signal changes state so that they effectively generate an over temperature signal. In another embodiment, the temperature sensors 150 generate a signal that indicates a temperature or temperature range. Sensors on different devices and/or at different locations may be of different types, and/or the circuitry (for hardware based sensing and control) and/or algorithm (for sensing and control involving software or a computation element as well as hardware) may provide for a different response to a particular temperature. Temperature awareness and control for an ISS Unit (ISSU) may even involve control based on multiple sensors, temperature differences, and/or a time rate of change of temperature.
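
By way of illustration only, sensing based on a preset threshold combined with a time rate of change of temperature may be sketched as follows. This is a minimal sketch and not the circuitry or algorithm of any embodiment: the class name SensorHistory, the function needs_attention, and all threshold values are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SensorHistory:
    """Readings from one hypothetical temperature sensor (illustrative names)."""
    limit_c: float            # assumed preset over-temperature threshold
    max_rise_c_per_s: float   # assumed limit on the rate of temperature rise
    samples: List[Tuple[float, float]] = field(default_factory=list)  # (time_s, temp_c)

    def add(self, time_s: float, temp_c: float) -> None:
        """Record one reading."""
        self.samples.append((time_s, temp_c))

    def over_temperature(self) -> bool:
        """True when the latest reading meets or exceeds the preset threshold."""
        return bool(self.samples) and self.samples[-1][1] >= self.limit_c

    def rising_too_fast(self) -> bool:
        """True when the last two readings rise faster than the allowed rate."""
        if len(self.samples) < 2:
            return False
        (t0, c0), (t1, c1) = self.samples[-2], self.samples[-1]
        if t1 <= t0:
            return False  # ignore out-of-order or duplicate timestamps
        return (c1 - c0) / (t1 - t0) > self.max_rise_c_per_s


def needs_attention(sensor: SensorHistory) -> bool:
    """Combine the threshold test with the rate-of-change test."""
    return sensor.over_temperature() or sensor.rising_too_fast()
```

The rate-of-change test allows a response before the absolute threshold is reached, which is one way the multiple criteria mentioned above might be combined.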

[0080] Different physical device types may be used as well. For example, temperature sensors 150 may include a thermistor, thermocouple, or other device known in the art that has an electrical characteristic that changes with temperature. Mechanical or electromechanical sensors, such as sensors that use bimetallic switches to open and close a connection, may be used. In one embodiment, temperature sensing circuitry is integrated into a PC board mounted component or provided as a surface mounted component on the PC board of the server modules, management modules, switch modules, or other components of the ISS.

[0081] Independent of the form of the temperature sensor, the sensor or circuitry associated with the temperature sensors provides signals (analog or digital) to a management module (or a server module adapted to provide some management function) so that the intelligence built into the management module may control the operational parameters for one or more heat generating elements (for example, the server, management, or switch modules) and the heat dissipating elements (for example, the fan modules or the individual fans within each fan module).
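
By way of illustration only, a management function that acts on both heat dissipating and heat generating elements may be sketched as follows. This is a sketch under stated assumptions rather than the control algorithm of the management module: the function name plan_thermal_response, the fan levels, and the two temperature limits are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_thermal_response(
    temps_c: Dict[str, float],
    warn_c: float = 60.0,      # assumed warning limit
    critical_c: float = 75.0,  # assumed critical limit
) -> Tuple[str, List[str]]:
    """Return (fan_level, modules_to_throttle) for the reported temperatures.

    Fans (heat dissipating elements) respond to the hottest reading, while
    power reduction is requested only for the modules (heat generating
    elements) that report a critical temperature.
    """
    hottest = max(temps_c.values(), default=float("-inf"))
    if hottest >= critical_c:
        fan_level = "high"
    elif hottest >= warn_c:
        fan_level = "medium"
    else:
        fan_level = "low"
    throttle = sorted(name for name, t in temps_c.items() if t >= critical_c)
    return fan_level, throttle
```

For example, a reading of 80 degrees on one server module and normal readings elsewhere would, in this sketch, raise the fan level and request reduced power only for the module that is over the critical limit.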

[0082] Each ISS also advantageously includes dual redundant fan modules 114 a, each of the modules including a plurality (typically two) of fans or other heat absorption or heat dissipation devices. Such cooling may be accomplished by conduction, convection, or radiation generally. Air or other fluid flow may be used. In one embodiment each fan module includes first 114 a - 1 and second 114 a - 2 electric motor driven fans.

[0083] Dual redundant fan modules 114 , each having one or a plurality of fans, are advantageously provided so as to accomplish the required cooling function at a reduced or minimized power consumption level, to provide cooling system redundancy, and to support hot-plug maintenance and/or replacement of the fans and fan modules. The manner in which ISS power consumption is reduced using this fan and fan module configuration is described elsewhere in this description.

[0084] Each ISS 102 includes at least one power supply, advantageously implemented as a hot-pluggable replaceable power supply module. Desirably, an ISS includes two such or dual-redundant power supply modules so as to provide sufficient power or energy for operating the switch module(s) 104 , management modules 108 , server module(s), and fan modules 114 within the ISS 102 as well as connected components that may draw power from the ISS. Power consumption and control aspects of the power supplies are described in greater detail elsewhere in this description.

[0085] A backplane providing operating power (for example, one or more of ±3 Volt, ±5 Volt, ±12 Volt depending upon the voltage and current requirements of the modules, and ground) and communication (such as in-band and out-of-band communication via ethernet, serial interface, and/or other interface) is mounted in chassis 101 . The backplane also provides circuit protection in the form of circuit breakers or other over current or over voltage protection devices to protect the backplane traces and the modules that are or may be connected at the time of an undesired electrical component failure or other hazardous or damaging event. Protection may also be provided either in conjunction with the backplane or the modules themselves for under current or under voltage conditions.

[0086] A plurality of appropriately sized and shaped electrical connectors (for receiving PC board based edge connectors) are disposed on the backplane PC board to connect to the management modules, server modules, and switch modules. The fan modules and power supply modules may couple directly to the backplane or communicate with backplane coupled modules (such as the management module) via separate couplings. In conventional manner, the chassis 101 includes guides or slots that assist in physically locating and guiding the different modules or other components physically in place to make secure electrical contact with the mating connectors.

[0087] In a preferred embodiment of the invention, each ISSU includes a backplane in the form of a multi-layer printed circuit board that is devoid of active electrical circuit components. This increases the reliability of each ISSU and the system as a whole. It is noted that a preferred configuration of an ISSU provides multiple redundant hot-swappable server modules, management modules, power supplies, switch modules, and fan (cooling) modules. In such a configuration, there is no single point of failure as redundancy is provided everywhere. As only one backplane can reasonably be provided within an ISSU, only electrical traces (or wires) are provided on the backplane. In a preferred embodiment, no electrical circuit components are present and only electrical traces (and connectors) are present. While an ISSU having conventional backplane technology may be used to achieve the power saving benefits described throughout this specification, the inherent redundancy and reliability of the ISSU would be compromised by conventional backplane technology that incorporates active failure-prone circuit elements. For example, if a backplane failed in such conventional implementation, the unit would need to be powered down and all modules removed so that the backplane could be replaced. There are no other daughter boards other than the ones described. There are only connectors and traces, because active components could not be replaced without downtime.

[0088] All components are hot swappable to the backplane. For a sixteen server module configuration, it is desirable that a failure of any one not negatively impact the operation or performance of any other. (Of course, control is provided for surviving server modules, management modules, switch modules, fan modules, and power supply modules to recognize a failure of another module or component and provide backup operation until the failure is corrected.) Even with respect to power delivery, there is a separate set of traces and a circuit breaker, fuse, or other circuit protection for every plug-in module (server, management, switch, and fan or cooling). For example, without such a separate power plane for each module, if one server or other module were to short-circuit it would take down all of the other modules in the ISS Unit or box. It is noted that even the failure of a capacitor within a circuit of a server module may act as a short circuit and that such capacitor failures may commonly occur. The power planes for the servers are separate and isolated from one another. The inventive backplane and module connectivity protects the integrity and operation of the system from even direct short circuits. Also, since there are no active components in the backplane, the failed module is merely replaced and operation continues without need to repair or replace the backplane.

[0089] A serial interface 142 is preferably but optionally provided to support an alternative communication channel to the back plane bus between and among each of the server modules 112 , management modules 108 , switch modules, or other modules or units, as well as to certain external elements or components such as to a local management node 138 when present.

[0090] The provision of the serial communication channel is advantageous as it provides out-of-band communication should the in-band link (for example the ethernet link) fail. It also permits multiple alternative redundant communication. Diagnostics, console operations, and other conventional communication may also be provided. Communication via the local management node or via a dial-in session is supported. The switch module(s) 104 may also be coupled to the management modules and the server modules as well as the external elements or components via the same serial bus or connection.

[0091] In one embodiment the serial bus provides an alternate communication channel. While this alternate communication channel is provided as a serial communication channel in one embodiment, it is understood that this represents a low cost and efficient implementation. Those workers having ordinary skill in the art will appreciate that various types of alternate communications channels or links may alternatively be provided, such as for example a Universal Serial Bus (USB), an IEEE 1394 (FireWire) link, or the like as are known in the art.

[0092] In a preferred embodiment, the serial interface architecture provides two serial ports for each of the sixteen server modules. Each management module picks off the two serial pairs from each of the sixteen server modules and multiplexes them into a single physical outlet or connector. This is referred to as the AMPC architecture, which includes the AMPC bus.

[0093] In one embodiment, now described relative to FIG. 7 , the AMPC Bus provides a communications channel for communicating serial data, and video, as well as keyboard and mouse inputs. Typically, the serial data and any video data flows from one of the plurality of Server Modules to the Management Module(s) and the keyboard and mouse input or commands flow from the Management Module(s) to the Server Modules. Ethernet and serial I/O (SIO) connections are also provided to and from the Management Module for redundancy and alternative access.

[0094] This time-domain or time-sliced multiplexing and selection eliminates the need for so many physical connectors. Each Management Module has a selector for one of the 32 (2×16) serial lines, and places the selected serial pair on the single Management Module connector. Of course, multiple connectors either with or without some level of multiplexing may be provided, but such configuration is not preferred as it would likely increase the physical size of a Management Module unit and decrease the effective density of the ISSU. Besides the serial interface, keyboard, video, and mouse (KVM) data or signals can be transferred to and/or from the Management Module using the same or a similar scheme.
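
By way of illustration only, selection of one of the 32 (2×16) serial lines for presentation on the single Management Module connector may be sketched as follows. This is a sketch and not the AMPC selector itself: the zero-based numbering of modules and ports, the ordering of the lines, and the names line_index and SerialSelector are assumptions introduced here.

```python
from typing import List, Optional, Sequence

SERVER_MODULES = 16    # server modules per ISSU, from the description
PORTS_PER_MODULE = 2   # serial ports per server module, from the description


def line_index(module: int, port: int) -> int:
    """Map (module, port), both zero-based by assumption, to a line in 0..31."""
    if not 0 <= module < SERVER_MODULES:
        raise ValueError("module out of range")
    if not 0 <= port < PORTS_PER_MODULE:
        raise ValueError("port out of range")
    return module * PORTS_PER_MODULE + port


class SerialSelector:
    """Hypothetical selector that places one of the 32 lines on one connector."""

    def __init__(self) -> None:
        self.selected: Optional[int] = None

    def select(self, module: int, port: int) -> int:
        """Choose which serial line is presented on the connector."""
        self.selected = line_index(module, port)
        return self.selected

    def route(self, lines: Sequence[str]) -> str:
        """Return the payload of the selected line."""
        if self.selected is None:
            raise RuntimeError("no line selected")
        return lines[self.selected]


def all_lines() -> List[int]:
    """Enumerate every line index; there are 2 x 16 = 32 of them."""
    return [line_index(m, p) for m in range(SERVER_MODULES) for p in range(PORTS_PER_MODULE)]
```

Because only the selected line reaches the connector at any time, a single physical connector suffices in this sketch regardless of the number of server modules.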

[0095] A load balancer 128 couples each ISS unit 102 via an uplink, such as via a gigabit uplink, to a router 130 . The load balancer 128 is of conventional design and includes intelligence to sense the load on each of the operating servers and task the servers according to some predetermined rules or policy to serve data or content. When used in connection with the inventive power conserving features, the intelligent load balancer and router are operative to sense which of the server modules are in an active mode and to route server tasking to those active server modules according to some policy. Policies concerning how many server modules should be maintained in an active mode, what CPU core voltage and clock frequency such active server modules operate at, and other server module operating characteristics are described elsewhere herein. Router 130 is interposed between the load balancer 128 and a network of interconnected computers or information appliances, such as for example the Internet 132 . Though advantageously provided, where appropriate, load balancers and/or routers may be eliminated. For example, they would not be required when only a single server module is provided. The structure and operation of load balancers 128 and routers 130 as well as the Internet 132 are well known and not described in further detail here.
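
By way of illustration only, routing of tasks solely to server modules that are in an active mode may be sketched as follows. The least-loaded policy, the function name pick_server, and the module labels are assumptions introduced here; the description leaves the actual rules or policy open.

```python
from typing import Dict, Optional


def pick_server(load_by_module: Dict[str, float], active: Dict[str, bool]) -> Optional[str]:
    """Return the least-loaded module among those marked active, or None.

    Modules in a power conserving mode are never selected, even when their
    reported load is the lowest.
    """
    candidates = [
        (load, name)
        for name, load in load_by_module.items()
        if active.get(name, False)
    ]
    if not candidates:
        return None
    # min() compares load first, then name, so ties resolve deterministically.
    return min(candidates)[1]
```

For example, with module sm1 suspended and modules sm2 and sm3 active at loads of 0.5 and 0.3, this sketch selects sm3.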

[0096] The bi-directional uplinks (and downlinks) 122 , 124 , 126 are communication links that provide high-capacity, high-throughput data communication between the ISS 102 (actually the switch module 104 of the ISS) and the external world, including the load balancer 128 and the Network Attached Storage (NAS) 120 . Gigabit uplinks for uploading (and downloading) data or content provide high data rate communications and are known in the art and therefore not described in greater detail here. Alternatively, an up and down link can be aggregated to provide two uplinks as illustrated in FIG. 8 , which shows a plurality of ISSU (ISS 1 , ISS 2 , . . . , ISS n ) coupled to first and second Gigabit switches GS 1 , GS 2 . Gigabit switch GS 1 is coupled to a router which is in turn coupled to a network, such as the Internet. Gigabit switch GS 2 may be similarly coupled.

[0097] Network Attached Storage NAS is optionally but desirably provided for several reasons. While the storage provided for each server module provides rapid access and response to requests, the size of the server module may necessarily limit the amount of data available on any particular server module. For example, 2.5-inch and 3.5-inch form factor hard disk drives may typically have capacities in the range of 32-Gigabyte to 100-Gigabyte of storage, though such capacity may be expected to increase as new recording media and head technology are developed. In any event, NAS in the form of one or more hard disk drives, RAID arrays, disk farms, or the like mass storage devices, arrays, or systems provide substantially greater storage.

[0098] Content that has been requested or that will be requested and served with high probability may be uploaded from NAS to one or more server modules and cached for later serving. Another benefit of the attached NAS is that a single copy of data is provided that is accessible to all the server modules and can be accessed either directly when only one is present, or through a switch when more than one is present. It is noted that the switch module coupling the ISSU to the load balancer is different than the switch module from the ISSU to the NAS.

[0099] Alternative access nodes and connectivity are provided for monitoring and managing operation and configuration of a particular ISS, component or module of an ISS, or ISS and/or components coupled to an ISS for which monitoring or management are desired. In one embodiment, this access is provided by a remote internet management node 136 coupled via an internet connection 134 to the internet 132 and hence via router 130 , optional load balancer 128 , and uplink/downlink 124 , 126 to the ISS 102 . Within each ISS 102 , monitoring and/or management operations will typically be carried out by a defined communication path (typically over the backplane) to one or more of the management modules 108 . It is noted that the backplane provides multiple sets of traces for multiple communication channels, including ethernet and serial channels, and that the backplane is not limited to a single bus. Monitoring and management access from remote Internet management node 136 over an Internet connection 134 is desirable as it provides additional redundancy and convenient monitoring and control using readily available protocols from virtually any remote location.

[0100] An alternate path is desirably provided to a local management node 138 over the serial communications channel 142 , and a second alternate path may desirably be provided from the local management node 138 to one or more of (and preferably to all of) the management modules over a second ethernet communication channel or link 140 that is different from the ethernet control channel. Monitoring and management access from local management node 138 over ethernet communication link 140 is desirable as it provides another alternative connection, communication, and possible control when desired, and advantageously permits connection using standard TCP/IP software and protocols. A further alternate communication path may desirably be provided via a remote dial-in management node 146 over a Plain Old Telephone Service (POTS), typically through the local management node 138 , and then either over the ethernet 140 or the serial connection 142 . While communication with the ISS over any of these communication channels may itself suffice, the provision of alternate links and communication schemes provides for considerable flexibility in access, management, and control. The alternate paths also provide considerable redundancy against single channel failure so that the ISS or ISS-based system may be diagnosed and serviced in the event of a failure. For example, should a problem occur that disables the switch modules 104 and access via the gigabit uplink/downlink paths 124 , 126 , communication with the management modules 108 and with the rest of the ISS will still be possible on site either over serial bus 142 or ethernet link 140 . When access from a remote location is desired, either dial-up (such as via a phone modem) or Internet based access is generally used; however, each serves as a redundant alternate path for the other in the event of failure.
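
By way of illustration only, falling back from a failed communication path to an alternate path may be sketched as follows. The path labels, the preference order, and the function name first_available are assumptions introduced here; the description does not prescribe an order of preference.

```python
from typing import Dict, List, Optional

# Hypothetical preference order; the labels echo reference numerals in the text.
PATHS: List[str] = ["uplink_124_126", "ethernet_140", "serial_142"]


def first_available(preferred: List[str], up: Dict[str, bool]) -> Optional[str]:
    """Return the first path in preference order that is reported as up."""
    for path in preferred:
        if up.get(path, False):
            return path
    return None  # every path is down or unknown
```

For example, when the uplink path is reported as down but the ethernet link and serial channel remain up, this sketch selects the ethernet link.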

[0101] It is particularly noted that the integrated structure of these ISS units provides a small form factor (2U high chassis/enclosure); high server module density (sixteen server modules per ISS in one embodiment); switch module, cooling/fan module, power supply module, management module, and server module hot plug-and-play and high availability via redundancy; lower energy or power consumption than conventional servers; and many other advantageous features as described in greater detail herein.

[0102] Many different types of server architectures are known in the art. Typically, such servers have at least one processor with associated fast random access memory (RAM), a mass storage device that stores the data or content to be served by the server, a power supply that receives electrical power (current and voltage) from either a battery or line voltage from an electrical utility, a network communication card or circuit for communicating the data to the outside world, and various other circuits that support the operation of the CPU, such as a memory (typically non-volatile ROM) storing a Basic Input-Output System (BIOS), a Real-Time Clock (RTC) circuit, voltage regulators to generate and maintain the required voltages in conjunction with the power supply, and core logic as well as optional micro-controller(s) that communicate with the CPU and with the external world to participate in the control and operation of the server. This core logic is sometimes referred to as the Northbridge and Southbridge circuits or chipsets.

[0103] From a somewhat different perspective, variations in server architecture reflect the variations in personal computers, mainframes, and computing systems generally. The vast structural, architectural, methodological, and procedural variations inherent in computer systems having chips, chipsets, and motherboards adapted for use by Intel Processors (such as the Intel x86, Intel Pentium™, Intel Pentium™ II, Intel Pentium™ III, Intel Pentium™ IV), Transmeta Crusoe™ with LongRun™, AMD, Motorola, and others preclude a detailed description of the manner in which the inventive structure and method will be applied in each situation. Therefore in the sections that follow, aspects of the inventive power management and ISS system architecture are described first in a general case to the extent possible, and second relative to a particular processor/system configuration (the Transmeta Crusoe Processor). Those having ordinary skill will appreciate in light of the description that the inventive structure and method apply to a broad set of different processor and computer/server architecture types and that minor variations within the ordinary skill of a practitioner in the field may be made to adapt the invention to other processor/system environments.

[0104] Before describing particular implementations that relate to more or less specific CPU designs and interfaces, attention is first directed to a simplified embodiment of the inventive system and method with respect to FIG. 9 . In this embodiment, at least two (and up to N) server modules 402 - 1 , . . . , 402 -N are provided, each including a CPU 404 and a memory 408 . CPU 404 includes an activity indicator generator 406 which generates activity indicators, and either (i) communicates the activity indicators to memory 408 for storage in an activity indicator(s) data structure 410 , or (ii), not shown, communicates them directly to a server module control unit and algorithm 432 within management module 430 . Different types of activity indicators, such as are described elsewhere in the specification (for example an idle thread based activity indicator), may be used. Whether stored in memory or communicated directly, the activity indicator(s) are used by the management module to determine the loading on each of the server modules individually and as a group. In one embodiment, activity information or indicators created on any one computer or device (such as a server module) are accessible to a manager or supervisor via a standard networking protocol.
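
By way of illustration only, determining the loading on each server module individually and as a group from an idle thread based activity indicator may be sketched as follows. The representation of the indicator as a pair of idle and total tick counts, and the names module_load and summarize, are assumptions introduced here rather than the data structure 410 of the embodiment.

```python
from typing import Dict, Tuple


def module_load(idle_ticks: int, total_ticks: int) -> float:
    """Load as the fraction of time NOT spent in the idle thread."""
    if total_ticks <= 0:
        raise ValueError("total_ticks must be positive")
    if not 0 <= idle_ticks <= total_ticks:
        raise ValueError("idle_ticks must lie between 0 and total_ticks")
    return 1.0 - idle_ticks / total_ticks


def summarize(indicators: Dict[str, Tuple[int, int]]) -> Tuple[Dict[str, float], float]:
    """Return (per-module load, average group load).

    `indicators` maps a module name to (idle_ticks, total_ticks).
    """
    per_module = {name: module_load(idle, total) for name, (idle, total) in indicators.items()}
    group = sum(per_module.values()) / len(per_module) if per_module else 0.0
    return per_module, group
```

A manager could use the per-module values to identify lightly loaded modules and the group value to judge whether the active capacity as a whole is sufficient.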

[0105] Although not illustrated in FIG. 9 , analogous structure and signals generated and received may be used to control the operation of core logic circuits to thereby control core logic voltage and core logic clock signals in a manner to reduce power consumption where such core logic power management is provided.

[0106] Voltage and frequency are regulated locally by the CPU using an activity monitoring scheme, such as for example one of the activity monitoring schemes illustrated in Table I.

TABLE I
Exemplary Activity Monitoring Schemes carried out in CPU or PMU

Layer               Carried out by CPU               Carried out by PMU
Application Layer   Port Address                     NA
Network Layer       TCP/IP                           NA
Physical Layer      Idle Threads, Activity Counter   I/O Activities

[0107] This power management scheme may be interpreted in one aspect as providing a Mode 1-to-Mode 2 and Mode 2-to-Mode 1 power management scheme, where both Mode 1 and Mode 2 are active modes and the state of the CPU in either Mode 1 or Mode 2 is controlled locally by the CPU, and in another aspect as providing a Mode 3 (inactive mode or maintenance of memory contents only). Mode 3 control may also be performed locally by the CPU, but in one of the preferred embodiments of the invention, entry into a Mode 3 state is desirably controlled globally in a multi-CPU system. Where the multiple CPUs are operative with a plurality of servers for multi-server power management, the Management Module (or a Server Module acting as a manager on behalf of a plurality of server modules) determines which Server Module should enter a Mode 3 state using the Server Module control algorithm and unit 432 . Activity monitoring of individual Server Modules 402 is desirably based on a standard network protocol, such as for example SNMP. Therefore the activity indicators may be retrieved from the CPU 404 or memory 408 via NIC 440 as is known in the art. A communication link couples the microcontrollers (μC) 442 together, and in particular couples the microcontroller of the Management Module with the microcontrollers of the several Server Modules. This permits the management module to communicate commands or signals to the server modules which are received by the microcontrollers even when the CPUs are in a suspended state (Mode 3). In so providing for monitoring over the first link (the Ethernet) and control over the second link (the AMPC bus), the server modules may be monitored for activity and controlled globally to reduce power consumption while providing sufficient on-line capacity. It is noted that the power management may be effected by altering either or both of the CPU clock frequency 420 or the CPU voltage 416 .
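
By way of illustration only, the split between local control of the two active modes and global control of entry into Mode 3 may be sketched as follows. This is not the server module control algorithm 432: the assumption that Mode 1 is the higher performance active mode, the load thresholds, the capacity rule, and the names local_mode and choose_mode3 are all hypothetical.

```python
import math
from typing import Dict, List


def local_mode(load: float, high_water: float = 0.6) -> int:
    """Local decision made by a CPU: Mode 1 when busy, otherwise Mode 2.

    Treating Mode 1 as the higher performance mode is an assumption here.
    """
    return 1 if load >= high_water else 2


def choose_mode3(
    loads: Dict[str, float],
    target_utilization: float = 0.7,  # assumed ceiling for each active module
    min_active: int = 1,              # assumed minimum number of active modules
) -> List[str]:
    """Global decision made by a manager: which modules may enter Mode 3.

    Keeps enough modules active to carry the total load at or below the
    target utilization, and suspends the least loaded of the remainder.
    """
    total = sum(loads.values())
    needed = max(min_active, math.ceil(total / target_utilization))
    needed = min(needed, len(loads))
    # Least loaded first; ties are broken by name for determinism.
    by_load = sorted(loads, key=lambda name: (loads[name], name))
    return by_load[: len(loads) - needed]
```

In this sketch, four modules with loads of 0.1, 0.2, 0.5, and 0.05 carry a total load of 0.85, so two modules remain active and the two least loaded modules are selected for Mode 3.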

[0108] Although a separate management module 430 is illustrated in FIG. 9 , it should be understood that the management functionality generally, and the server module control algorithm in particular may be implemented by one of the operating server modules. For example, the control algorithm would be implemented as a software or firmware procedure executing in the CPU and processor of a server module designated according to predetermined rules, policies, or procedures to be the master.

[0109] It is noted that although several of the modes described conserve power, they do not compromise performance, as the cumulative combination of server modules is always maintained at or above minimum targeted performance.

[0110] In FIG. 10 there is illustrated an exemplary system 301 including a server (such as for example, an ISSU server module) 302 - 1 , coupled to a switch (such as for example, an ISSU switch module) 304 , and through the switch 304 and optionally via a micro-controller (IC) 314 within server 302 over a separate (optional) direct bus connection 312 (such as for example, the AMPC bus made by Amphus of San Jose, Calif.) to a power management supervisor (such as for example, ISSU management module) 316 . As described elsewhere herein, switch 304 is responsible for connecting the various server module(s) 302 , management module(s) 316 , and other components that are o