This application is related to the following Non-Provisional U.S. patent applications:
| Ser. No. (Docket No.) | Filing Date | Title |
| --- | --- | --- |
| 11/051997 (MIPS.0199-00-US) | Feb. 4, 2005 | BIFURCATED THREAD SCHEDULER IN A MULTITHREADING MICROPROCESSOR |
| 11/051980 (MIPS.0200-00-US) | Feb. 4, 2005 | LEAKY-BUCKET THREAD SCHEDULER IN A MULTITHREADING MICROPROCESSOR |
| 11/051979 (MIPS.0201-00-US) | Feb. 4, 2005 | MULTITHREADING MICROPROCESSOR WITH OPTIMIZED THREAD SCHEDULER FOR INCREASING PIPELINE UTILIZATION EFFICIENCY |
| 11/051998 (MIPS.0201-01-US) | Feb. 4, 2005 | MULTITHREADING PROCESSOR INCLUDING THREAD SCHEDULER BASED ON INSTRUCTION STALL LIKELIHOOD PREDICTION |
| 11/051978 (MIPS.0202-00-US) | Feb. 4, 2005 | INSTRUCTION/SKID BUFFERS IN A MULTITHREADING MICROPROCESSOR |
| 11/087064 (MIPS.0204-00-US) | Mar. 22, 2005 | BARREL-INCREMENTER-BASED ROUND-ROBIN APPARATUS AND INSTRUCTION DISPATCH SCHEDULER EMPLOYING SAME FOR USE IN MULTITHREADING MICROPROCESSOR |
| 11/087070 (MIPS.0208-00-US) | Mar. 22, 2005 | INSTRUCTION DISPATCH SCHEDULER EMPLOYING ROUND-ROBIN APPARATUS SUPPORTING MULTIPLE THREAD PRIORITIES FOR USE IN MULTITHREADING MICROPROCESSOR |
| 11/086258 (MIPS.0209-00-US) | Mar. 22, 2005 | RETURN DATA SELECTOR EMPLOYING BARREL-INCREMENTER-BASED ROUND-ROBIN APPARATUS |
| 11/087063 (MIPS.0210-00-US) | Mar. 22, 2005 | FETCH DIRECTOR EMPLOYING BARREL-INCREMENTER-BASED ROUND-ROBIN APPARATUS FOR USE IN MULTITHREADING MICROPROCESSOR |
| 11/191258 (MIPS.0216-00-US) | Jul. 27, 2005 | MULTITHREADING INSTRUCTION SCHEDULER EMPLOYING THREAD GROUP PRIORITIES |
| (MIPS.0234-00-US) | concurrently herewith | TRANSACTION SELECTOR EMPLOYING BARREL-INCREMENTER-BASED ROUND-ROBIN APPARATUS SUPPORTING DYNAMIC PRIORITIES IN MULTI-PORT SWITCH |
| (MIPS.0234-01-US) | concurrently herewith | TRANSACTION SELECTOR EMPLOYING ROUND-ROBIN APPARATUS SUPPORTING DYNAMIC PRIORITIES IN MULTI-PORT SWITCH |
| (MIPS.0235-01-US) | concurrently herewith | TRANSACTION SELECTOR EMPLOYING TRANSACTION QUEUE GROUP PRIORITIES IN MULTI-PORT SWITCH |
The present invention relates in general to switches, and particularly to the fair and efficient arbitration for switch port bandwidth from multiple competing requestors thereof.
In a multi-port switch, each of the ports of the switch receives transactions from the device coupled to the port. The switch routes the transactions from the source port to a destination port of the switch specified by the transaction so that the destination port can output the transactions to the device coupled to the destination port. The destination port may receive transactions from all the other ports of the switch. If the destination port is receiving requests to output transactions from multiple source ports, the destination port must select the order in which to output the transactions received from the various source ports. Thus, each of the source ports competes for output bandwidth of the destination port, and the destination port must implement a policy for arbitrating, or scheduling, the transmission of the transactions from the various competing source ports out the destination port.
As may be observed from the foregoing, the extent to which a switch helps or hinders the overall performance of a system that incorporates the switch to connect various devices may be highly dependent upon the policy for scheduling the transmission of transactions out of the ports of the switch. Furthermore, the appropriate transaction scheduling policy may be highly dependent upon the particular application in which the switch is used. Still further, it may be desirable to vary the transaction scheduling policy from port to port within the switch depending upon the type of device that is coupled to a given port. In particular, it may be desirable to accommodate varying quality-of-service requirements for the various combinations of paths between the different ports of the switch depending upon the types of devices connected to the ports. That is, it may be desirable for each destination port to guarantee different transaction bandwidth requirements for each of the source ports of the switch, and particularly, the avoidance of transaction bandwidth starvation for any of the source ports. Consequently, it is highly desirable to provide customers with various applications the ability to customize the transaction scheduling policy to meet their particular requirements. A customizable transaction scheduling policy is particularly desirable when attempting to design a switch core that may be part of a system that is customizable to meet the needs of various customer applications. This makes the switch core reusable for various designs, which is highly desirable because it avoids having to redesign an entire switch for each application.
However, making the entire transaction scheduling policy circuitry of the switch customizable is problematic since the transaction scheduling policy circuitry is typically closely tied to the internal operation of the switch, which may have undesirable side effects. For example, it may be difficult for the customer to understand the internal workings of the switch, and therefore difficult for the customer to customize the transaction scheduling policy circuitry. Furthermore, timing critical signal paths of the internal switch would necessarily be exposed to the customer, which might potentially lower the overall clock speed of the switch if the customer's custom logic is too slow. Finally, the customer may introduce bugs into the transaction scheduling policy circuitry potentially seriously impacting the overall operation and functionality of the switch core. Therefore, what is needed is a switch with an architecture that enables its transaction scheduling policy circuitry to be customizable without undesirable side effects, such as those mentioned above.
Furthermore, because there are multiple ports in a switch competing for the limited output bandwidth of a given port, there is a need to fairly arbitrate among the requesting ports for the limited output bandwidth. One fair arbitration scheme used in other contexts is a round-robin arbitration scheme. In a round-robin arbitration scheme, an order of the requestors is maintained and each requestor gets a turn to use the requested resource in the maintained order. The circuitry to implement a round-robin arbitration scheme in which each of the requestors requests the resource each time the resource becomes available is not complex. A conventional round-robin circuit may be implemented as a simple N-bit barrel shifter, wherein N is the number of requestors and one bit corresponds to each of the N requestors. One bit of the barrel shifter is initially true, and the single true bit is rotated around the barrel shifter each time a new requestor is selected. One characteristic of such a round-robin circuit is that its complexity is on the order of N. In particular, the integrated circuit area and power consumed by the barrel shifter grow linearly with the number of requestors N.
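By way of illustration only, the conventional barrel-shifter round-robin circuit described above may be modeled behaviorally as follows. This is a minimal software sketch, not a circuit description; the function names, the choice of bit 0 as the initially true bit, and the left rotation direction are assumptions made for the example.

```python
def rotate_one_hot(select, n):
    """Rotate an n-bit one-hot vector left by one bit, wrapping the top bit to bit 0."""
    mask = (1 << n) - 1
    return ((select << 1) | (select >> (n - 1))) & mask


def round_robin_order(n, turns):
    """Return the requestor indices selected over `turns` turns when all n always request."""
    select = 1  # assumption: bit 0 is the single bit that is initially true
    order = []
    for _ in range(turns):
        order.append(select.bit_length() - 1)  # index of the single true bit
        select = rotate_one_hot(select, n)
    return order
```

With four requestors, `round_robin_order(4, 6)` visits requestors 0, 1, 2, 3 and then wraps back to 0 and 1. The state is a single N-bit vector, which is consistent with the linear growth noted above.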
However, the circuitry to implement a round-robin arbitration scheme in which only a variable subset of the requestors may be requesting the resource each time the resource becomes available is more complex. A conventional round-robin circuit accommodating a variable subset of requestors may be implemented by a storage element storing an N-bit vector, denoted L, having one bit set corresponding to the previously selected requestor, and combinational logic receiving the L vector and outputting a new N-bit selection vector, denoted N, according to the following equation, where E.i indicates whether the i-th requestor is currently requesting:
N.i =
    ; This requestor is enabled, i.e., is requesting.
    E.i AND
    ; The requestor to the right was selected last time.
    (L.i−1 OR
    ; A requestor further to the right was selected last
    ; time AND the requestors in between are disabled.
    (~E.i−1 AND L.i−2) OR
    (~E.i−1 AND ~E.i−2 AND L.i−3) OR
    ...
    ; This requestor was selected last time,
    ; but no other requestors are enabled.
    (~E.i−1 AND ~E.i−2 AND ~E.i−3 AND ... AND ~E.i+1 AND L.i))
As may be observed from the equation above, the conventional round-robin circuit accommodating a variable subset of disabled requestors has complexity on the order of N². Thus, as the number of requestors (such as the number of ports in a switch requesting a port to transmit out transactions) becomes relatively large, the conventional circuit may become burdensome on the switch in terms of size and power consumption, particularly if more than one such circuit is needed in the switch.
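The N.i equation above may be evaluated in software as sketched below, purely for illustration. The sketch assumes that list index 0 is the rightmost bit, so that index i−1 (modulo N) is "to the right" of index i; the function name `next_selection` is hypothetical. The nested loops mirror the N product terms evaluated for each of the N output bits, which is where the N² growth arises.

```python
def next_selection(last, enabled):
    """Evaluate N.i for every i, given one-hot `last` (L) and `enabled` (E) as bool lists."""
    n = len(last)
    selection = [False] * n
    for i in range(n):
        if not enabled[i]:
            continue  # E.i is false, so N.i is false
        # Walk rightward from i-1, wrapping around, until reaching i itself.
        for distance in range(1, n + 1):
            j = (i - distance) % n
            if last[j]:
                # Every requestor strictly between j and i was disabled.
                selection[i] = True
                break
            if enabled[j]:
                break  # an enabled requestor closer to the last selection wins instead
    return selection
```

For example, if requestor 1 was selected last and only requestors 0 and 3 are requesting, the result selects requestor 3, because disabled requestor 2 is skipped.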
Furthermore, in some applications, the requesters may have multiple priority levels; i.e., some requesting ports may have higher priority than others. It is desirable to select requesting ports fairly within each of the priority levels. That is, it is desirable for requesting ports to be chosen in a round-robin manner within each priority level independent of the order the requesting ports are chosen within the other priority levels. Furthermore, the priority levels of the various requesting ports may change dynamically over time. Therefore, what is needed is a transaction scheduler for the ports in a switch that incorporates a simple and fast round-robin apparatus and method that accommodates a variable subset of all requesting ports at a time, and which does so independent of the priority level, among multiple priority levels, at which the requesting ports are requesting transmission.
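The desired behavior, namely round-robin order maintained independently within each priority level, may be sketched behaviorally as follows. This is an illustrative software model only and is not the circuit of any embodiment described herein; the class name, the dictionary-based request format, and the convention that a larger number denotes a higher priority are assumptions.

```python
class PriorityRoundRobin:
    """Keeps a separate round-robin position for each priority level."""

    def __init__(self, num_requestors, num_levels):
        self.n = num_requestors
        # last[p] is the requestor most recently selected at priority level p.
        self.last = [num_requestors - 1] * num_levels

    def select(self, requests):
        """`requests` maps requestor index -> current priority; absent means not requesting."""
        if not requests:
            return None
        top = max(requests.values())  # only the highest requesting level competes
        for distance in range(1, self.n + 1):
            candidate = (self.last[top] + distance) % self.n
            if requests.get(candidate) == top:
                self.last[top] = candidate  # other levels' positions are untouched
                return candidate
        return None
```

Because each level keeps its own position, a selection made at a low priority level does not disturb the order at a high priority level, and a requestor whose priority changes simply joins the rotation of its new level.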
Still further, a problem that may be introduced by allowing different priorities among the requesting ports is that it may be difficult to accomplish the desired quality-of-service in terms of transaction output bandwidth. In particular, low priority requesting ports may be starved for bandwidth in favor of high priority requesting ports.
Therefore, what is needed is a switch with a customizable transaction scheduling policy architecture that allows prioritization among requestors and yet still accomplishes desired quality-of-service requirements by fairly distributing the transaction transmission bandwidth of a switch port.
In one aspect, the present invention provides a bifurcated selector for transmitting transactions from a plurality of transaction queues out a port of a switch. The selector includes a transaction scheduler, configured to select transactions of the plurality of transaction queues for transmission to a device coupled to the port. The selector also includes a policy manager, for enforcing a scheduling policy of the plurality of transaction queues. The selector also includes an interface, coupling the policy manager to the transaction scheduler. The interface includes first signals, for the transaction scheduler to receive from the policy manager a priority for each of the plurality of transaction queues. The transaction scheduler selects the transactions for transmission to the device based on the priorities. The interface also includes second signals, for the policy manager to receive from the transaction scheduler transaction transmission information for each of the plurality of transaction queues. The policy manager updates the priorities based on the transaction transmission information. The transaction transmission information comprises an indication of which of the plurality of transaction queues a transaction was selected from for transmission.
In another aspect, the present invention provides a port in a switch. The port includes a policy manager, configured to prescribe a policy for scheduling transmitting of transactions out the port from a plurality of transaction queues. The port also includes a switch core, coupled to the policy manager. The switch core includes the plurality of transaction queues and a transaction scheduler, coupled to select transactions of the plurality of transaction queues for transmitting out the port based on the policy received from the policy manager. The port also includes an interface, for coupling the policy manager and the switch core. The interface includes priority indicators, for the policy manager to communicate to the transaction scheduler a transmit priority for each of the plurality of transaction queues, to prescribe the policy. The interface also includes transmission indicators, for the transaction scheduler to communicate to the policy manager for each of the plurality of transaction queues an indication of whether a transaction was transmitted out the port from the transaction queue.
In another aspect, the present invention provides a policy manager for enforcing a transaction scheduling policy of a switch port that transmits transactions from a plurality of transaction queues. The policy manager includes inputs, for receiving, each switch clock cycle, an indication of whether a transaction was transmitted for each of the plurality of transaction queues. The policy manager also includes outputs, for transmitting, each switch clock cycle, a transaction transmit priority for each of the plurality of transaction queues. The policy manager also includes logic, coupled to the inputs, for generating the outputs based on the inputs to prescribe the transaction scheduling policy.
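As a hedged illustration of the inputs, outputs, and logic recited above, a policy manager may be modeled as an object that is clocked once per switch clock cycle, receiving a per-queue transmitted indication and returning a per-queue priority. The particular policy shown (priority rises while a queue goes unserved and resets when it is served) is hypothetical and chosen only to make the example concrete; the class name and the `max_priority` parameter are likewise assumptions.

```python
class AgingPolicyManager:
    """Hypothetical policy: a queue's priority rises each cycle it is not served."""

    def __init__(self, num_queues, max_priority=3):
        self.max_priority = max_priority
        self.priorities = [0] * num_queues

    def clock(self, transmitted):
        """One switch clock cycle.

        `transmitted[q]` is True if a transaction was transmitted from queue q.
        Returns the transmit priority for each queue for the next cycle.
        """
        for q, sent in enumerate(transmitted):
            if sent:
                self.priorities[q] = 0  # served: drop back to the lowest priority
            else:
                self.priorities[q] = min(self.priorities[q] + 1, self.max_priority)
        return list(self.priorities)
```

In this sketch the policy manager needs no knowledge of the scheduler's internals: it observes only the transmitted indications and drives only the priorities, which is the division of labor the interface is intended to provide.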
In another aspect, the present invention provides a switch core including a plurality of ports each configured to transmit transactions received from a plurality of transaction queues. The switch core includes for each port an interface, for coupling the switch core to a policy manager external to the switch core. The interface includes first signals, for the policy manager to communicate to the switch core a priority of each of the plurality of transaction queues. The interface also includes second signals, for the policy manager to receive from the switch core information for each of the plurality of transaction queues indicating whether the switch core has transmitted a transaction of the transaction queue, for use by the policy manager to update the priorities. The switch core also includes for each port a transaction scheduler, coupled to receive the first signals and to select one of the plurality of transaction queues from which to transmit a transaction out the port, based on the priorities received on the first signals.
In another aspect, the present invention provides a method for scheduling a port of a switch to transmit transactions from a plurality of transaction queues. The method includes signaling by a policy manager to a transaction scheduler a transaction scheduling priority for each of the plurality of transaction queues, during a first clock cycle. The method also includes transmitting by the transaction scheduler one transaction from the plurality of transaction queues, during a second clock cycle, in response to the signaling the priorities. The method also includes signaling by the transaction scheduler to the policy manager an indication whether the transaction scheduler transmitted a transaction for each of the plurality of transaction queues, during a third clock cycle subsequent to the first clock cycle.
In another aspect, the present invention provides a computer program product for use with a computing device, the computer program product including a computer usable storage medium, having computer readable program code embodied in the medium, for providing a switch core including a plurality of ports each configured to transmit transactions received from a plurality of transaction queues. The computer readable program code includes first program code for providing an interface for each port, for coupling the switch core to a policy manager external to the switch core. The interface includes first signals, for the policy manager to communicate to the switch core a priority of each of the plurality of transaction queues. The interface also includes second signals, for the policy manager to receive from the switch core information for each of the plurality of transaction queues indicating whether the switch core has transmitted a transaction of the transaction queue, for use by the policy manager to update the priorities. The computer readable program code also includes second program code for providing a transaction scheduler for each port, coupled to receive the first signals and to select one of the plurality of transaction queues from which to transmit a transaction out the port, based on the priorities received on the first signals.
In another aspect, the present invention provides an interface between a switch core and policy manager for enabling the policy manager to prescribe a policy for scheduling the transmission of transactions from a plurality of transaction queues by the switch core. The interface includes priority indicators, for the policy manager to communicate a transmit priority for each of the plurality of transaction queues to a transaction scheduler of the switch core. The interface also includes feedback indicators, for the switch core to communicate to the policy manager for each of the plurality of transaction queues an indication of whether the switch core transmitted a transaction for the transaction queue.
In another aspect, the present invention provides a method for providing a switch core including a plurality of ports each configured to transmit transactions received from a plurality of transaction queues. The method includes providing computer-readable program code describing the switch core. The program code includes first program code for providing an interface for each port, for coupling the switch core to a policy manager external to the switch core. The interface includes first signals, for the policy manager to communicate to the switch core a priority of each of the plurality of transaction queues. The interface also includes second signals, for the policy manager to receive from the switch core information for each of the plurality of transaction queues indicating whether the switch core has transmitted a transaction of the transaction queue, for use by the policy manager to update the priorities. The program code also includes second program code for providing a transaction scheduler for each port, coupled to receive the first signals and to select one of the plurality of transaction queues from which to transmit a transaction out the port, based on the priorities received on the first signals. The method also includes transmitting the computer-readable program code as a computer data signal on a network.
In another aspect, the present invention provides a switch. The switch includes a network and a plurality of ports, each coupled to receive transactions from other of the plurality of ports via the network. Each port includes a port interface, configured to transmit the transactions to a device coupled to the port. Each port also includes a policy manager, configured to prescribe a policy for scheduling transmitting of transactions out the port interface from a plurality of transaction queues. Each port also includes a switch core, coupled to the policy manager and to the port interface. The switch core includes the plurality of transaction queues, configured to receive the transactions from the network. The switch core also includes a transaction scheduler, coupled to select transactions of the plurality of transaction queues for transmitting out the port based on the policy received from the policy manager. Each port includes an interface, for coupling the policy manager and the switch core. The interface includes priority indicators, for the policy manager to communicate to the transaction scheduler a transmit priority for each of the plurality of transaction queues, to prescribe the policy. The interface also includes transmission indicators, for the transaction scheduler to communicate to the policy manager for each of the plurality of transaction queues an indication of whether a transaction was transmitted out the port from the transaction queue.
FIG. 1 is a block diagram illustrating a switch according to the present invention.
FIG. 2 is a block diagram illustrating a representative port of the switch of FIG. 1 according to the present invention.
FIG. 3 is a block diagram illustrating the transaction selector within the switch of FIG. 1 according to one embodiment of the present invention in which the transaction selector is bifurcated.
FIG. 4 is a block diagram illustrating in more detail the transaction scheduler of FIG. 3 and the transaction selection logic of FIG. 2 according to the present invention.
FIG. 5 is a flowchart illustrating operation of the transaction scheduler of FIG. 4 according to the present invention.
FIGS. 6A and 6B are a block diagram illustrating the transaction scheduler of FIG. 3 including round-robin logic of FIG. 4 according to one embodiment of the present invention.
FIG. 7 is a block diagram illustrating a round-robin generator of FIG. 6 according to one embodiment of the present invention.
FIGS. 8A through 8D are block diagrams illustrating the barrel-incrementer of FIG. 7 according to one embodiment of the present invention.
FIGS. 9A and 9B are block diagrams illustrating examples of operation of the transaction scheduler employing the round-robin generators of FIG. 6 according to the present invention.
FIG. 10 is a block diagram illustrating the transaction scheduler of FIG. 3 including round-robin logic of FIG. 4 according to an alternate embodiment of the present invention.
FIG. 11 is a block diagram illustrating the round-robin generator of FIG. 10 according to one embodiment of the present invention.
FIGS. 12A through 12D are block diagrams illustrating examples of operation of the transaction scheduler having round-robin generators of FIG. 10 according to the present invention.
FIG. 13 is a block diagram of an example application system for use of the switch of FIG. 1 according to the present invention.
FIG. 14 is a block diagram illustrating the policy manager of FIG. 3 and a QSchedule register according to the present invention.
FIG. 15 is a flowchart illustrating operation of the policy manager of FIG. 14 according to the present invention.
FIG. 16 is a block diagram illustrating the transaction selector within the switch of FIG. 1 according to an alternate embodiment of the present invention in which the transaction selector 108 is bifurcated.
FIG. 17A is a block diagram illustrating in more detail the transaction scheduler of FIG. 16 according to one embodiment of the present invention.
FIG. 17B is a flowchart illustrating operation of the transaction scheduler 602 of FIG. 17A according to the present invention.
FIGS. 18A and 18B are a block diagram illustrating the transaction scheduler of FIG. 16 including round-robin logic of FIG. 17 according to one embodiment of the present invention.
FIG. 19 is a block diagram illustrating a round-robin generator of FIG. 18 according to one embodiment of the present invention.
FIG. 20 is a block diagram illustrating an example of logic for generating the PM_group_priority signals within a policy manager of FIG. 16 according to the present invention.
FIG. 21 is a block diagram illustrating the transaction scheduler of FIG. 16 including round-robin logic of FIG. 17 according to an alternate embodiment of the present invention.
FIG. 22 is a block diagram illustrating the round-robin generator of FIG. 21 according to an alternate embodiment of the present invention.
FIG. 23 is a block diagram illustrating a second example of logic for generating the PM_group_priority signals within a policy manager of FIG. 16 according to the present invention.
FIG. 24 is a table illustrating operation of the logic of FIG. 23 in an example transaction queue configuration of the switch of FIG. 1 according to the present invention.
FIGS. 25 through 27 are flowcharts illustrating a method for providing software embodying the apparatus of the present invention and subsequently transmitting the software as a computer data signal over a communication network.
Referring now to FIG. 1, a block diagram illustrating a switch 100 according to the present invention is shown. The switch 100 includes a plurality of ports 102 each coupled to a network 104. Each port 102 includes a plurality of transaction queues 106 (also referred to herein as a “Q”) that receive transactions 116 from the network 104. Each port 102 also includes a transaction selector 108 coupled to receive transactions from the transaction queues 106. Each port 102 also includes a port interface 114 that receives a transaction 122 from the transaction selector 108. The transaction selector 108 periodically selects a transaction from one of the transaction queues 106 to provide to the port interface 114 according to one of various embodiments as described herein. The port interface 114 transmits the received transaction 122 on a bus 112 to a device coupled by the bus 112 to the port 102. The port interface 114 also receives transactions on the bus 112 from the device coupled to the port 102 and forwards the transactions 118 to the network 104.
The network 104 switches the transactions 118 from the source ports 102 of the switch 100 to the appropriate transaction queue 106 of the appropriate destination port 102 based on the destination of the transaction 118. The network 104 includes connection paths for connecting the port interface 114 of the source ports 102 to the transaction queue 106 of the destination ports 102. A source port 102 denotes a port 102 that transmits a transaction to the network 104, and a destination port 102 denotes a port 102 that receives a transaction from the network 104. Hence, each port 102 may be both a source and a destination port 102. Thus, each transaction queue 106 in a given destination port 102 stores transactions transmitted through the network 104 by only one of the other source ports 102 in the switch 100. That is, each source port 102 for which the network 104 includes a connection path to a destination port 102 has a corresponding transaction queue 106 in the destination port 102 for storing the transactions transmitted by the source port 102 through the network 104. In one embodiment, there is a one-to-one relationship between the transaction queues 106 of a destination port 102 and the source ports 102 of the switch 100. In one embodiment, there may be some source ports 102 of the switch 100 that do not transmit transactions to all of the other ports 102 in the switch 100. In one embodiment, the network 104 may include multiple connection paths between a source port 102 and a destination port 102, in which case the destination port 102 includes multiple transaction queues 106 associated with the multiple connection paths for storing the transactions received from the source port 102. In one embodiment, the network 104 comprises a cross-bar type network. However, other types of networks 104 for switching transactions between the various ports 102 are contemplated.
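The one-to-one embodiment, in which each destination port holds one transaction queue per source port, may be pictured with the following minimal software model. The class and method names are hypothetical, and the sketch assumes that every port has a connection path to every other port and none to itself.

```python
from collections import deque


class DestinationPort:
    """A port viewed as a destination: one transaction queue per source port."""

    def __init__(self, port_id, source_ids):
        self.port_id = port_id
        self.queues = {src: deque() for src in source_ids}


class Network:
    """Routes each transaction to the queue of its source within its destination port."""

    def __init__(self, num_ports):
        self.ports = [
            DestinationPort(p, [s for s in range(num_ports) if s != p])
            for p in range(num_ports)
        ]

    def switch(self, source, destination, transaction):
        # The queue is chosen by the source port, so each queue holds
        # transactions from only one source.
        self.ports[destination].queues[source].append(transaction)
```

In a three-port model, transactions sent from ports 0 and 1 to port 2 land in two separate queues of port 2, each preserving the order in which its own source sent them.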
Each transaction queue 106 in a port 102 has an associated priority for being selected to have its transactions transmitted to the port interface 114. Advantageously, the transaction selector 108 may dynamically vary the priorities of the transaction queues 106 as described herein as needed by a given application in which the switch 100 is employed. In particular, the transaction selector 108 may vary the priorities to prevent a given source port 102 from being starved of the opportunity to have its transactions transmitted to the port interface 114. Furthermore, the transaction selector 108 may vary the priorities to guarantee a specified minimum amount of bandwidth, or quality-of-service, to each of the source ports 102, as described herein.
Advantageously, the transaction selector 108 for each port 102 of the switch 100 may be uniquely tailored to accommodate the particular characteristics of the port 102, such as particular quality-of-service requirements of the port 102. Advantageously, embodiments of the transaction selector 108 are described that not only provide a high degree of control of the arbitration between the various transaction queues 106, but do so in a low latency manner. Furthermore, the transaction selectors 108 are relatively small, and grow in size on the order of N, where N is the number of transaction queues 106 that must be selected from. This is important in applications in which the number of ports 102 on the switch 100 becomes relatively large. For example, the switch 100 may be employed in a system-on-chip (SOC) embodiment that includes a processor core, one or more memories, and multiple application blocks.
A transaction may include a command, or data, or both a command and data. For example, a transaction may include a command to write a specified amount of data from a source port 102 to a destination port 102. In the case of a write command, the transaction may include all or part of the data to be written. If the transaction including the write command does not include all the data to be written, then subsequent transactions from the source port 102 to the destination port 102 may include the remaining data. For another example, a transaction may include a command to read a specified amount of data from the destination port 102 to the source port 102. In the case of a read command, subsequent transactions sent from the port 102 that received the read command to the port that sent the read command will include the requested data.
Using the system 1300 of FIG. 13 as an example, the CPU 1302 may send a transaction to its port 102 that includes a write command to write data to the memory 1304. The write command transaction includes the data to be written. The port interface 114 of the CPU 1302 port 102 provides the transaction to the network 104, which switches the transaction to a transaction queue 106 in the memory 1304 port 102 associated with the CPU 1302 port 102. Eventually, the transaction selector 108 in the memory 1304 port 102 selects the transaction queue 106 in the memory 1304 port 102 associated with the CPU 1302 port 102 and transmits the transaction via the port interface 114 to the memory 1304, which writes the data to the location in the memory 1304 specified in the transaction. Similarly, the CPU 1302 may send a write transaction to the PCI bus bridge 1306 that includes data, for example, to perform a programmed-I/O or memory-mapped I/O operation to read or write control and status registers of an I/O device coupled to the PCI bus bridge 1306. Still further, the PCI bus bridge 1306 may send a write transaction on behalf of the I/O device to the CPU 1302 to perform a DMA operation, for example.
Using the system 1300 of FIG. 13 again as an example, the CPU 1302 may send a transaction to its port 102 that includes a read command to read data from the memory 1304. The transaction is switched through the network 104 to the transaction queue 106 in the memory 1304 port 102 associated with the CPU 1302. When the memory 1304 receives the transaction, it fetches the data from the location specified in the transaction and then sends a transaction to its port 102 that includes the requested data. The transaction including the read data is switched through the network 104 to the CPU 1302 port 102 and eventually transmitted to the CPU 1302. Similarly, the AGP bus bridge 1308 may send a read transaction to the memory 1304, for example, to read video data from the memory 1304 for provision to a display adapter coupled to the AGP bus bridge 1308.
Referring now to FIG. 2, a block diagram illustrating a representative port 102 of the switch 100 of FIG. 1 according to the present invention is shown. FIG. 2 illustrates the plurality of transaction queues 106 of FIG. 1 into which the network 104 writes transactions 116. Each transaction queue 106 provides a transaction 206 at the bottom of the transaction queue 106 to transaction selection logic 202 of the transaction selector 108 of FIG. 1 of the port 102. The transaction selection logic 202 selects one of the transactions 206 as selected transaction 204 for provision to the port interface 114 to be transmitted out of the port 102. The transaction selection logic 202 selects the selected transaction 204 in response to a TS_Q_priority signal 208 provided by logic 212 of the transaction selector 108 of FIG. 1 for each transaction queue 106. The logic 212 and operation of the TS_Q_priority signal 208 are described in more detail below with respect to FIGS. 4 and 5. Each of the transaction queues 106 provides an empty signal 218 to the logic 212 to indicate whether the transaction queue 106 is empty so that the transaction selector 108 will not attempt to read another transaction from the transaction queue 106 until the transaction queue 106 is no longer empty. In one embodiment, each transaction queue 106 also provides a full signal to the network 104 to indicate that it is full of transactions.
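A simplified behavioral model of the selection just described is sketched below for illustration. It assumes that a larger TS_Q_priority value wins and that ties are broken in favor of the lowest queue index; the tie-break is a placeholder for the example only, and the function name is hypothetical. The detailed generation of the priority values is described with respect to FIGS. 4 and 5 and is not modeled here.

```python
from collections import deque


def select_transaction(queues, ts_q_priority):
    """Pick the non-empty queue with the highest priority and remove its oldest transaction.

    Returns (queue_index, transaction), or None if every queue is empty.
    """
    best = None
    for q, queue in enumerate(queues):
        empty = len(queue) == 0  # models the per-queue empty signal
        if empty:
            continue  # never read from an empty queue
        if best is None or ts_q_priority[q] > ts_q_priority[best]:
            best = q
    if best is None:
        return None
    return best, queues[best].popleft()  # the oldest entry is the one presented
```

Note that an empty queue is skipped even if its priority is the highest, which corresponds to the role of the empty signal 218.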
Referring now to FIG. 3, a block diagram illustrating the transaction selector 108 within the switch 100 of FIG. 1 according to one embodiment of the present invention in which the transaction selector 108 is bifurcated is shown. The bifurcated transaction selector 108 comprises a transaction scheduler (TS) 602 portion and a policy manager (PM) 604 portion. The transaction scheduler 602 portion is comprised within a switch core 606 of switch 100 ; whereas, the policy manager 604 portion is comprised outside of the switch core 606 . The switch core 606 is the portion of the switch 100 that is not customizable by the customer; whereas, the policy manager 604 is customizable by the customer. In one embodiment, the switch core 606 is a synthesizable core, also referred to as a soft core. The design of a synthesizable core is capable of being reduced to a manufacturable representation quickly and easily using automated tools, commonly referred to as synthesis tools.
The switch core 606 provides an interface 628 to the policy manager 604 comprising a plurality of signals. In one embodiment, the inputs to the transaction scheduler 602 and output signals from the transaction scheduler 602 are registered, to advantageously enable the non-core policy manager 604 logic to interface with the switch core 606 in a manner that alleviates certain timing problems that might be otherwise introduced by a bifurcated scheduler. Furthermore, the interface 628 is easy for the customer to understand, which eases the design of the policy manager 604 scheduling policy.
In Table 1 below, the various signals comprising the policy manager interface 628 according to one embodiment are shown. Table 1 specifies the signal name, the direction of the signal relative to the policy manager 604 , and a brief description of each signal. Table 1 describes an embodiment in which the switch 100 includes nine transaction queues 106 . Several of the signals described in Table 1 may be used by a device external to the policy manager 604 , such as a CPU, to read and write control registers that may be present in the policy manager 604 . For example, FIGS. 14 and 15 describe an embodiment in which the policy manager 604 includes a QSchedule Register 902 that may be read and written to accomplish an exemplary transaction transmission, or scheduling, policy by a port 102 . However, it should be understood that a policy manager 604 for a given port 102 may or may not comprise control registers, depending upon the transaction scheduling policy required for the particular port 102 .
TABLE 1
| Signal Name | Direction | Description |
| PM_gclk | Input | Switch clock |
| PM_gfclk | Input | Free running switch clock |
| PM_greset | Input | Global Reset |
| PM_scanenable | Input | Global Scan Enable. |
| PM_rd_reg | Input | Register number for reads |
| PM_rd | Input | Read strobe |
| PM_rdata | Output | Read data |
| PM_wr_reg | Input | Register number for writes |
| PM_wr | Input | Write strobe |
| PM_wdata | Input | Write data |
| PM_Q_transaction_transmitted[8:0] | Input | A transaction was transmitted for the specified transaction queue. |
| PM_Q_priority_0[1:0] | Output | Priority of transaction queue 0. |
| PM_Q_priority_1[1:0] | Output | Priority of transaction queue 1. |
| PM_Q_priority_2[1:0] | Output | Priority of transaction queue 2. |
| PM_Q_priority_3[1:0] | Output | Priority of transaction queue 3. |
| PM_Q_priority_4[1:0] | Output | Priority of transaction queue 4. |
| PM_Q_priority_5[1:0] | Output | Priority of transaction queue 5. |
| PM_Q_priority_6[1:0] | Output | Priority of transaction queue 6. |
| PM_Q_priority_7[1:0] | Output | Priority of transaction queue 7. |
| PM_Q_priority_8[1:0] | Output | Priority of transaction queue 8. |
| PM_Q_block[8:0] | Output | Prevent the transaction scheduler from transmitting transactions for specified transaction queues. |
Some of the particular signals of the policy manager interface 628 specified in Table 1 will now be described in more detail. The policy manager 604 specifies to the transaction scheduler 602 the priority of the respective transaction queue 106 via the PM_Q_priority 652 output. In one embodiment, the PM_Q_priority 652 comprises two bits and the transaction scheduler 602 allows the policy manager 604 to specify one of four different priorities for a transaction queue 106 . The policy manager 604 instructs the transaction scheduler 602 to stop transmitting transactions for a transaction queue 106 by generating a true value on the respective PM_Q_block 654 output. Thus, the policy manager 604 may affect how the transaction scheduler 602 transmits transactions for the various transaction queues 106 via the PM_Q_priority 652 and PM_Q_block 654 outputs, as described in more detail below, particularly with respect to FIGS. 4 and 5.
The switch core 606 provides the PM_gclk 658 to the policy manager 604 , which enables the policy manager 604 to adjust the PM_Q_priority 652 periodically based on the PM_gclk 658 , as described below.
The transaction scheduler 602 communicates to the policy manager 604 that it has transmitted a transaction for a transaction queue 106 via a respective PM_Q_transaction_transmitted 644 input. Thus, the switch core 606 provides feedback about the transmission of transactions for the various transaction queues 106 via the PM_Q_transaction_transmitted 644 inputs, as described in more detail below, particularly with respect to FIGS. 4 and 5. In one embodiment, the transaction scheduler 602 is capable of removing a transaction from a transaction queue 106 in a single clock cycle. In one embodiment, the port interface 114 may take multiple clock cycles to transmit the transaction to the device coupled to the port 102 , depending upon the type of bus interface between the port 102 and the device. In one embodiment, if the transaction is transmitted in a burst as N sets of data over N clock cycles, the transaction scheduler 602 communicates to the policy manager 604 that it has transmitted N transactions for a transaction queue 106 via the respective PM_Q_transaction_transmitted 644 input.
Referring now to FIG. 4, a block diagram illustrating in more detail the transaction scheduler 602 of FIG. 3 and the transaction selection logic 202 of FIG. 2 according to the present invention is shown. The transaction selection logic 202 includes a tree of muxes 724 controlled by comparators 714 . In some of the embodiments discussed herein the comparators 714 are greater-than-or-equal (GTE) comparators. Each mux 724 receives a transaction 206 of FIG. 2 from two different transaction queues 106 . Each mux 724 also receives the transaction's 206 associated TS_Q_priority 208 of FIG. 2. The comparator 714 associated with each mux 724 also receives the pair of TS_Q_priority 208 signals for the two transaction queues 106 and controls its associated mux 724 to select the transaction 206 and TS_Q_priority 208 with the highest TS_Q_priority 208 value. The selected transactions 206 and TS_Q_priorities 208 propagate down the tree until the final mux 724 selects the selected transaction 204 of FIG. 2 with the highest TS_Q_priority 208 for provision to the transmission pipeline.
FIG. 4 shows logic 212 of the transaction scheduler 602 , namely transmittable transaction logic 708 and round-robin logic 712 . In one embodiment, the transmittable transaction logic 708 is replicated within the transaction scheduler 602 for each transaction queue 106 of the port 102 to generate a TS_Q_priority 208 for each transaction queue 106 . In contrast, the round-robin logic 712 is instantiated once for each possible PM_Q_priority 652 and generates a round-robin indicator for each PM_Q_priority 652 . For example, FIG. 4 illustrates an embodiment in which the policy manager 604 may specify one of four possible PM_Q_priorities 652 ; hence, the round-robin logic 712 is instantiated four times in the transaction scheduler 602 and generates four respective round-robin indicators.
In one embodiment, the round-robin indicator includes one bit per transaction queue 106 of the switch 100 . The bit of the round-robin indicator associated with its respective transaction queue 106 is provided as round-robin bit 748 as shown in FIG. 4. If the round-robin bit 748 is true, then it is the transaction queue's 106 turn in the round-robin scheme to be transmitted among the other transaction queues 106 that are currently at the same PM_Q_priority 652 .
The transmittable transaction logic 708 receives the PM_Q_block 654 signal from the policy manager 604 of FIG. 3 and the empty signal 218 of FIG. 2 from the transaction queue 106 . The transmittable transaction logic 708 generates a transmittable 746 signal in response to its inputs. The transmittable 746 signal is true if the transaction 206 at the bottom of the corresponding transaction queue 106 is transmittable. In one embodiment, a transaction is transmittable if both the PM_Q_block 654 and empty 218 signals are false.
The transmittable 746 bit, the PM_Q_priority 652 bits, and the round-robin bit 748 are combined to create the TS_Q_priority 208 . In the embodiment of FIG. 4, the transmittable 746 bit is the most significant bit, the round-robin bit 748 is the least significant bit, and the PM_Q_priority 652 is the two middle significant bits. As may be observed, because the transmittable bit 746 is the most significant bit of the TS_Q_priority 208 , a non-transmittable transaction will be lower priority than all transmittable transactions. Conversely, the round-robin bit 748 is only used to select a transaction queue 106 if more than one transaction queue 106 has a transmittable transaction and has the same highest PM_Q_priority 652 .
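The bit layout just described can be sketched in a few lines of Python. This is only an illustration of the 4-bit packing for the FIG. 4 embodiment; the function names `transmittable` and `ts_q_priority` are hypothetical and do not appear in the specification.

```python
def transmittable(pm_q_block: bool, empty: bool) -> bool:
    """Transmittable 746: true only if the queue is neither blocked nor empty."""
    return not pm_q_block and not empty


def ts_q_priority(is_transmittable: bool, pm_q_priority: int, round_robin: bool) -> int:
    """Pack TS_Q_priority 208 as [transmittable | PM_Q_priority (2 bits) | round-robin]."""
    if not 0 <= pm_q_priority <= 3:
        raise ValueError("PM_Q_priority is a 2-bit value (0..3)")
    return (int(is_transmittable) << 3) | (pm_q_priority << 1) | int(round_robin)


# A blocked queue at the top PM_Q_priority still ranks below any transmittable
# queue, because the transmittable bit is the most significant bit.
blocked_high = ts_q_priority(transmittable(True, False), 3, True)   # 0b0111
ready_low = ts_q_priority(transmittable(False, False), 0, False)    # 0b1000
```

Because the round-robin bit is the least significant bit, it can only break ties between queues whose transmittable and PM_Q_priority bits are already equal.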
Referring now to FIG. 5, a flowchart illustrating operation of the transaction scheduler 602 of FIG. 4 according to the present invention is shown. Flow begins at block 802 .
At block 802 , the transaction scheduler 602 initializes each round-robin indicator for each PM_Q_priority 652 . Flow proceeds to block 804 .
At block 804 , the transaction scheduler 602 determines, for each transaction queue 106 , whether the transaction queue 106 has a transmittable transaction 206 . That is, the transmittable transaction logic 708 for each transaction queue 106 generates a value on the transmittable 746 signal. In one embodiment, the transmittable transaction logic 708 generates a true signal on the transmittable 746 signal only if the PM_Q_block 654 and empty 218 signals are false. Flow proceeds to decision block 806 .
At decision block 806 , the transaction scheduler 602 determines, by examining the transmittable 746 signal for each of the transaction queues 106 , whether there are any transaction queues 106 that have a transmittable transaction 206 . If not, flow returns to block 804 until at least one transaction queue 106 has a transmittable transaction 206 ; otherwise, flow proceeds to block 808 .
At block 808 , the transaction scheduler 602 generates the TS_Q_priority 208 for the transaction 206 of each transaction queue 106 based on the transmittable 746 bit of the transaction queue 106 , the PM_Q_priority 652 of the transaction queue 106 , and the round-robin bit 748 of the PM_Q_priority 652 of the transaction queue 106 . Flow proceeds to block 812 .
At block 812 , the transaction scheduler 602 transmits the transaction 206 with the highest TS_Q_priority 208 . In other words, the transaction scheduler 602 transmits the transaction from the transaction queue 106 that has a transmittable transaction and has the highest PM_Q_priority 652 ; if multiple transaction queues 106 have a transmittable transaction and have the highest PM_Q_priority 652 , the transaction scheduler 602 transmits the transaction from the transaction queue 106 whose turn it is to transmit as indicated by the round-robin bit 748 for the PM_Q_priority 652 of the transaction queues 106 . Flow proceeds to block 814 .
At block 814 , the round-robin logic 712 updates the round-robin indicator for the PM_Q_priority 652 based on which of the transaction queues 106 was selected to have its transaction transmitted. Flow returns to block 804 .
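The flow of blocks 802 through 814 can be approximated by a small behavioral model. The sketch below is not the hardware: it assumes four PM_Q_priority levels, assumes each round-robin indicator is initialized so that transaction queue 0 is first in line (the specification does not state an initial value), and stops when nothing is transmittable instead of polling. The names `n_vector` and `schedule` are hypothetical.

```python
from collections import deque


def n_vector(l_vec, e_vec, n):
    """One-hot vector of the first enabled queue rotatively left of the last one."""
    last = l_vec.bit_length() - 1
    for step in range(1, n + 1):
        q = (last + step) % n
        if (e_vec >> q) & 1:
            return 1 << q
    return 0


def schedule(queues, pm_q_priority, pm_q_block, max_cycles=32):
    n = len(queues)
    l_vecs = [1 << (n - 1)] * 4          # block 802 (initial value is an assumption)
    order = []
    for _ in range(max_cycles):
        # Block 804: a queue is transmittable if it is neither empty nor blocked.
        e_vec = 0
        for q in range(n):
            if queues[q] and not pm_q_block[q]:
                e_vec |= 1 << q
        # Block 806: nothing to send (hardware would keep checking).
        if e_vec == 0:
            break
        # Block 808: build TS_Q_priority for every queue.
        n_vecs = [n_vector(l, e_vec, n) for l in l_vecs]
        ts = []
        for q in range(n):
            rr = (n_vecs[pm_q_priority[q]] >> q) & 1
            ts.append((((e_vec >> q) & 1) << 3) | (pm_q_priority[q] << 1) | rr)
        # Block 812: highest TS_Q_priority wins; ties go to the lower-numbered queue.
        chosen = max(range(n), key=lambda q: (ts[q], -q))
        order.append((chosen, queues[chosen].popleft()))
        # Block 814: update the round-robin indicator of the chosen queue's priority.
        l_vecs[pm_q_priority[chosen]] = 1 << chosen
    return order


demo = schedule(
    [deque(["a0", "a1"]), deque(["b0", "b1"]), deque(["c0"])],
    pm_q_priority=[2, 2, 1],
    pm_q_block=[False, False, False],
)
```

In this run the two queues at PM_Q_priority level 2 are drained before the level 1 queue is served.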
Referring now to FIG. 6, a block diagram illustrating the transaction scheduler 602 of FIG. 3 including round-robin logic 712 of FIG. 4 according to one embodiment of the present invention is shown. FIG. 6 comprises FIGS. 6A and 6B.
FIG. 6A illustrates the round-robin logic 712 of FIG. 4 according to one embodiment of the present invention. The round-robin logic 712 includes four round-robin generators 1606 : one for each of the four PM_Q_priority levels 652 . Each of the round-robin generators 1606 receives an E vector 1646 . The E vector 1646 is an n-bit vector, where n is the number of transaction queues 106 and each of the transaction queues 106 has a corresponding bit in the E vector 1646 . A set bit in the E vector 1646 indicates that the corresponding transaction queue 106 is enabled for transaction transmitting. In one embodiment, the E vector 1646 bits are the transmittable bits 746 of FIG. 4.
Each of the round-robin generators 1606 also receives an L vector 1602 that is unique to the corresponding PM_Q_priority 652 . That is, there is an L vector 1602 for each of the four PM_Q_priority 652 levels. The L vectors 1602 are also n-bit vectors, where n is the number of transaction queues 106 and each of the transaction queues 106 has a corresponding bit in each of the four L vectors 1602 . A set bit in an L vector 1602 indicates that the corresponding transaction queue 106 was the last transaction queue 106 at the corresponding PM_Q_priority 652 actually selected for transaction transmitting by the transaction scheduler 602 . Thus, for example, if the number of transaction queues 106 is eight, an L vector 1602 value of 00000100 for PM_Q_priority 652 level 1 indicates transaction queue 2 106 was the last transaction queue 106 transmitted at PM_Q_priority 652 level 1 . In one embodiment, the L vector 1602 is generated by the transaction selection logic 202 and stored for provision to the round-robin logic 712 . In one embodiment, each L vector 1602 is updated only when the transaction scheduler 602 selects for transmission a transaction from a transaction queue 106 at the corresponding PM_Q_priority 652 . Thus, advantageously, the L vector 1602 is maintained for each PM_Q_priority 652 level so that round-robin fairness is accomplished at each PM_Q_priority 652 level independent of the other PM_Q_priority 652 levels.
Each of the round-robin generators 1606 generates an N vector 1604 that is unique to the corresponding PM_Q_priority 652 . The N vectors 1604 are also n-bit vectors, where n is the number of transaction queues 106 and each of the transaction queues 106 has a corresponding bit in each of the four N vectors 1604 . A set bit in an N vector 1604 indicates that the corresponding transaction queue 106 is the next transaction queue 106 in round-robin order to be selected at the corresponding PM_Q_priority 652 .
The round-robin logic 712 includes n four-input muxes 1608 : one for each of the n transaction queues 106 . Each mux 1608 receives its corresponding bit from each of the four N vectors 1604 . That is, the mux 1608 for transaction queue 0 106 receives bit 0 from each of the N vectors 1604 ; the mux 1608 for transaction queue 1 106 receives bit 1 from each of the N vectors 1604 ; and so forth, to the mux 1608 for transaction queue 106 n−1 that receives bit n−1 from each of the N vectors 1604 . Each mux 1608 also receives as a select control input the PM_Q_priority 652 value for its respective transaction queue 106 . Each of the muxes 1608 selects the input specified by the PM_Q_priority 652 value. The output of each of the muxes 1608 is the corresponding round-robin bit 748 of FIG. 4. The round-robin bits 748 are provided to the selection logic 202 of FIG. 6B.
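The per-queue muxes 1608 amount to indexing the N vectors by each queue's PM_Q_priority. A minimal sketch follows; the function name `round_robin_bits` and the sample values are illustrative assumptions, with `n_vectors[p]` holding the N vector for priority level p.

```python
def round_robin_bits(n_vectors, pm_q_priority):
    """For queue q, select bit q of the N vector chosen by that queue's PM_Q_priority."""
    return [(n_vectors[pm_q_priority[q]] >> q) & 1 for q in range(len(pm_q_priority))]


# Four queues; N vectors for priority levels 0..3 (illustrative values).
sample_n_vectors = [0b0001, 0b0100, 0b0010, 0b1000]
sample_bits = round_robin_bits(sample_n_vectors, pm_q_priority=[3, 2, 2, 0])
```

Only queue 1 receives a set round-robin bit here, because it is the only queue whose own bit is set in the N vector of its own priority level.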
Referring now to FIG. 6B, the round-robin bit 748 of each transaction queue 106 is combined with its corresponding PM_Q_priority 652 bits and transmittable bit 746 to form its corresponding TS_Q_priority 208 of FIG. 4. FIG. 6B also includes the selection logic 202 of FIG. 4. In one embodiment, the comparators 714 of FIG. 4 are greater-than-or-equal (GTE) comparators. That is, the GTE comparators 714 compare the two TS_Q_priority 208 input values and if the top value is greater-than-or-equal to the bottom value, the GTE comparator 714 outputs a control signal to cause its respective mux 724 to select the top value. The selection logic 202 is configured such that the top value always corresponds to a lower enumerated transaction queue 106 , i.e., a transaction queue 106 which has a bit in the L vectors 1602 , N vectors 1604 , and E vector 1646 that is more to the right, i.e., a less significant bit, than the bottom value. Thus, for example, in FIG. 6B, one of the comparators 714 receives the TS_Q_priority 208 for transaction queue 0 106 and transaction queue 1 106 ; if the TS_Q_priority 208 for transaction queue 0 106 is greater than or equal to the TS_Q_priority 208 for transaction queue 1 106 , then the comparator 714 will control its mux 724 to select the transaction 206 and TS_Q_priority 208 for transaction queue 0 106 ; otherwise (i.e., only if the TS_Q_priority 208 for transaction queue 0 106 is less than the TS_Q_priority 208 for transaction queue 1 106 ), the comparator 714 will control its mux 724 to select the transaction 206 and TS_Q_priority 208 for transaction queue 1 106 .
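The comparator-controlled mux tree can be modeled as a pairwise tournament. The sketch below assumes that when a tree level has an odd number of entries the last entry passes through unchanged; the specification does not describe that case, and the name `select_transaction` is hypothetical.

```python
def select_transaction(entries):
    """entries[q] = (ts_q_priority, transaction); returns (queue, ts_q_priority, transaction)."""
    level = [(ts, q, txn) for q, (ts, txn) in enumerate(entries)]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            top, bottom = level[i], level[i + 1]
            # GTE comparator: the top (lower-numbered) input wins ties.
            nxt.append(top if top[0] >= bottom[0] else bottom)
        if len(level) % 2:
            nxt.append(level[-1])        # assumed pass-through for an unpaired entry
        level = nxt
    ts, q, txn = level[0]
    return q, ts, txn


winner = select_transaction(
    [(0b1110, "t0"), (0b1111, "t1"), (0b1110, "t2"), (0b1110, "t3"), (0b1110, "t4")]
)
tie = select_transaction([(5, "a"), (5, "b"), (5, "c")])
```

Because entries keep their queue order at every level, the greater-than-or-equal comparison resolves equal priorities in favor of the lower-numbered queue.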
Referring now to FIG. 7, a block diagram illustrating a round-robin generator 1606 of FIG. 6 according to one embodiment of the present invention is shown. Although only one round-robin generator 1606 is shown in FIG. 7, the transaction scheduler 602 comprises one round-robin generator 1606 for each PM_Q_priority 652 , as shown in FIG. 6A.
The round-robin generator 1606 includes a first set of inverters 1718 that receive the L vector 1602 of FIG. 6 and generate an n-bit ˜L vector 1792 . The round-robin generator 1606 also includes a second set of inverters 1716 that receive the E vector 1646 of FIG. 6 and generate an n-bit ˜E vector 1796 .
The round-robin generator 1606 also includes a barrel-incrementer 1712 that receives the L vector 1602 , the ˜L vector 1792 , and the ˜E vector 1796 . The barrel-incrementer 1712 generates an S vector 1704 , which is the sum of the L vector 1602 rotated left 1-bit and the Boolean AND of the ˜E vector 1796 and the ˜L vector 1792 , according to two embodiments, as described in more detail below with respect to FIGS. 8A and 8B. In two other embodiments, the barrel-incrementer 1712 generates an S vector 1704 , which is the sum of the L vector 1602 rotated left 1-bit and the ˜E vector 1796 , as described in more detail below with respect to FIGS. 8C and 8D.
The round-robin generator 1606 also includes a set of AND gates 1714 that perform the Boolean AND of the S vector 1704 and the E vector 1646 to generate the N vector 1604 of FIG. 6.
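The data path of FIG. 7 can be sketched with ordinary integer arithmetic, treating each vector as an n-bit integer whose bit i belongs to transaction queue i. This models the FIG. 8A/8B form of the sum; the end-around carry is emulated by adding one when the sum overflows n bits. The function name is hypothetical.

```python
def round_robin_generator(l_vec, e_vec, n):
    """Return the N vector for a one-hot L vector and an enable vector E."""
    mask = (1 << n) - 1
    not_l = ~l_vec & mask                                   # inverters 1718
    not_e = ~e_vec & mask                                   # inverters 1716
    rotated_l = ((l_vec << 1) | (l_vec >> (n - 1))) & mask  # L rotated left 1 bit
    s_vec = (not_e & not_l) + rotated_l                     # barrel-incrementer 1712
    if s_vec > mask:                                        # carry out of bit n-1 wraps to bit 0
        s_vec = (s_vec & mask) + 1
    s_vec &= mask
    return s_vec & e_vec                                    # AND gates 1714


# Four queues, queue 1 transmitted last, queues 0 and 3 enabled: queue 3 is next.
next_queue = round_robin_generator(0b0010, 0b1001, 4)
```

When no transaction queue is enabled, the final AND with the E vector leaves the N vector all zeros.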
Referring now to FIG. 8A, a block diagram illustrating the barrel-incrementer 1712 of FIG. 7 according to one embodiment of the present invention is shown. The barrel-incrementer 1712 includes a plurality of full-adders 1802 coupled in series. In the embodiment illustrated in FIG. 8A, the full-adders 1802 are 1-bit full-adders, and the number of 1-bit full-adders 1802 is n, where n is the number of transaction queues 106 . However, the barrel-incrementer 1712 may be implemented with fewer full-adders capable of adding larger addends, depending upon the number of transaction queues 106 and speed and power requirements.
In the barrel-incrementer 1712 of FIG. 8A, each full-adder 1802 receives two addend bits and a carry-in bit and generates a corresponding sum bit of the S vector 1704 and a carry-out bit. Each full-adder 1802 receives as its carry-in the carry-out of the full-adder 1802 rotatively to its right. Thus, the right-most full-adder 1802 receives as its carry-in the carry-out of the left-most full-adder 1802 . The first addend input to each of the full-adders 1802 is the Boolean AND of the corresponding ˜E vector 1796 and ˜L vector 1792 bits. The second addend input to each of the full-adders 1802 is the 1-bit left rotated version of the corresponding L vector 1602 bit. In the embodiment of FIG. 8A, the ˜E vector 1796 is Boolean ANDed with the ˜L vector 1792 to guarantee that at least one bit of the first addend to the full adders 1802 is clear. This prevents the single set increment bit of the second addend (the 1-bit left rotated L vector 1602 ) from infinitely rippling around the ring of full-adders 1802 of the barrel-incrementer 1712 . As may be observed from FIG. 8A, the apparatus is aptly referred to as a “barrel-incrementer” because it increments one addend, namely the ˜E vector 1796 (modified to guarantee at least one clear bit), by a single set bit in a left-rotative manner; furthermore, the single increment bit may increment the addend at any position in the addend.
By rotating left 1-bit the single set bit L vector 1602 , the single set bit will be in the bit position with respect to the full-adders 1802 corresponding to the next transaction queue 106 1-bit rotatively left of the last transaction queue 106 at the corresponding PM_Q_priority 652 for which the transaction scheduler 602 transmitted a transaction. By using the ˜E vector 1796 as the first addend input, the first addend has a set bit in each transaction queue 106 position that is not enabled and a clear bit in each transaction queue 106 position that is enabled. Consequently, the single set bit of the 1-bit left-rotated L vector 1602 addend will rotatively ripple left from its bit position until it reaches a clear bit position, i.e., a bit position of a transaction queue 106 that is enabled. This is illustrated by the example here, in which only transaction queues 1 and 3 are enabled, and transaction queue 3 106 was the last transmitted transaction queue 106 at the PM_Q_priority 652 :
However, if no transaction queues 106 are enabled, the single set bit of the 1-bit left-rotated L vector 1602 addend will ripple left from its bit position until it returns where it started and stop there, as shown here:
Further, if the single set bit of the 1-bit left-rotated L vector 1602 addend is clear in the ˜E vector 1796 , such as bit 4 here below, then bit 4 of the S vector 1704 will be set and the rotated L vector 1602 set bit will not ripple any further:
Furthermore, the AND gate 1714 of FIG. 7 functions to guarantee that only one bit of the N vector 1604 is set. A bit vector in which only one bit is set is commonly referred to as a 1-hot, or one-hot, vector. For example, in the last example above, even though the S vector 1704 has multiple bits set, the AND gate 1714 generates a resulting N vector 1604 with a single set bit, as here:
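The three situations described above can be recomputed with a short sketch. The vector width of eight bits and the exact enable patterns are assumptions chosen to match the prose (transaction queue 3 transmitted last; then queues 1 and 3 enabled, no queues enabled, and queues 1 and 4 enabled); they are not values taken from the specification. The function name `s_and_n` is hypothetical.

```python
def s_and_n(l_vec, e_vec, n=8):
    """Return (S vector, N vector) for the FIG. 8A form of the barrel-incrementer."""
    mask = (1 << n) - 1
    addend = ~e_vec & ~l_vec & mask
    increment = ((l_vec << 1) | (l_vec >> (n - 1))) & mask
    s_vec = addend + increment
    if s_vec > mask:                     # end-around carry
        s_vec = (s_vec & mask) + 1
    s_vec &= mask
    return s_vec, s_vec & e_vec


LAST = 0b00001000                                # queue 3 transmitted last
case_some = s_and_n(LAST, 0b00001010)            # queues 1 and 3 enabled
case_none = s_and_n(LAST, 0b00000000)            # no queue enabled
case_next = s_and_n(LAST, 0b00010010)            # queue 4 (the next one) enabled

for name, (s_vec, n_vec) in [("some", case_some), ("none", case_none), ("next", case_next)]:
    print(f"{name}: S={s_vec:08b} N={n_vec:08b}")
```

In the third case the S vector has several set bits left over from the untouched ˜E addend, yet the AND with the E vector leaves a single set bit in the N vector.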
Generally, the barrel-incrementer 1712 of FIG. 8A may be described by the following equation:
{Cout.i, Sum.i} = A.i + B.i + Cin.i,
where A.i is one of the n bits of the ˜E vector 1796 Boolean ANDed with the corresponding bit of the ˜L vector 1792 , B.i is a 1-bit left rotated corresponding one of the n bits of the L vector 1602 , Sum.i is a binary sum of (A.i + B.i + Cin.i), Cout.i is the carry out of (A.i + B.i + Cin.i), Cin.i = Cout.(i−1), and Cin.0 = Cout.(n−1).
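The equation can be exercised bit by bit with a small simulation of the adder ring. Because the ring is combinational, the sketch evaluates it by walking around twice starting from an assumed carry of zero; the guaranteed clear bit of the first addend forces a zero carry-out at one position, so the second trip sees settled carries. The function names are hypothetical.

```python
def full_adder(a, b, cin):
    """{Cout, Sum} = A + B + Cin for single bits."""
    total = a + b + cin
    return total >> 1, total & 1


def barrel_incrementer_8a(l_vec, e_vec, n):
    """Return the S vector produced by a ring of n 1-bit full-adders."""
    a = [((~e_vec & ~l_vec) >> i) & 1 for i in range(n)]     # A.i = ~E.i AND ~L.i
    b = [(l_vec >> ((i - 1) % n)) & 1 for i in range(n)]     # B.i = L.(i-1), rotated
    carry, sums = 0, [0] * n
    for _ in range(2):                   # second trip uses the settled Cin.0 = Cout.(n-1)
        for i in range(n):
            carry, sums[i] = full_adder(a[i], b[i], carry)
    return sum(bit << i for i, bit in enumerate(sums))


s_vec = barrel_incrementer_8a(0b01000, 0b01011, 5)
n_vec = s_vec & 0b01011
```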
As may be observed from the foregoing, an advantage of the round-robin generator 1606 of FIG. 7 employing the barrel-incrementer 1712 of FIG. 8A is that its complexity is n, where n is the number of transaction queues 106 , rather than n², as in a conventional round-robin circuit. That is, the round-robin generator 1606 built around the barrel-incrementer 1712 of FIG. 8A scales linearly with the number of transaction queues 106 . The same is true of the barrel-incrementer 1712 of FIGS. 8B-8D below.
Referring now to FIG. 8B, a block diagram illustrating the barrel-incrementer 1712 of FIG. 7 according to an alternate embodiment of the present invention is shown. The barrel-incrementer 1712 of FIG. 8B is an optimized version of the barrel-incrementer 1712 of FIG. 8A in which the full-adders 1802 are replaced with the combination of a half-adder 1812 and an OR gate 1814 . The half-adder 1812 receives as its carry-in the output of the OR gate 1814 . The OR gate 1814 receives as its two inputs the carry-out of the half-adder 1812 to its right and the corresponding 1-bit left-rotated L vector 1602 bit. Thus, collectively, the half-adder 1812 and OR gate 1814 combination performs the same function as the full-adder 1802 of the barrel-incrementer 1712 of FIG. 8A. The optimization of replacing the full-adder 1802 with a half-adder 1812 and OR gate 1814 is possible because it is known that at most one of the inputs to the OR gate 1814 will be true. That is, at most one of the L vector 1602 input bit and the carry-out of the half-adder 1812 to the right will be true. An advantage of the barrel-incrementer 1712 of FIG. 8B is that it may be smaller and consume less power than the barrel-incrementer 1712 of FIG. 8A since it is optimized to take advantage of the fact that at most one of the inputs to the OR gate 1814 will be true.
Generally, the barrel-incrementer 1712 of FIG. 8B may be described by the following equation:
{Cout.i, Sum.i} = A.i + (B.i OR Cin.i),
where A.i is one of the n bits of the ˜E vector 1796 Boolean ANDed with the corresponding bit of the ˜L vector 1792 , B.i is a 1-bit left rotated corresponding one of the n bits of the L vector 1602 , Sum.i is a binary sum of A.i + (B.i OR Cin.i), Cout.i is the carry out of A.i + (B.i OR Cin.i), Cin.i = Cout.(i−1), and Cin.0 = Cout.(n−1).
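The claim that the half-adder and OR gate combination behaves like the full-adder in this circuit can be checked exhaustively for small widths. The sketch below simulates both cell types in the same ring and compares the resulting S vectors for every one-hot L vector and every E vector; all names are hypothetical, and the two-trip evaluation of the ring is a modeling shortcut rather than a description of the hardware.

```python
def full_adder_cell(a, b, cin):
    """FIG. 8A cell: {Cout, Sum} = A + B + Cin."""
    total = a + b + cin
    return total >> 1, total & 1


def half_adder_or_cell(a, b, cin):
    """FIG. 8B cell: {Cout, Sum} = A + (B OR Cin)."""
    x = b | cin
    return a & x, a ^ x


def ring_sum(l_vec, e_vec, n, cell):
    """S vector of a ring of n cells; two trips around the ring settle the carries."""
    a = [((~e_vec & ~l_vec) >> i) & 1 for i in range(n)]
    b = [(l_vec >> ((i - 1) % n)) & 1 for i in range(n)]
    carry, sums = 0, [0] * n
    for _ in range(2):
        for i in range(n):
            carry, sums[i] = cell(a[i], b[i], carry)
    return sum(bit << i for i, bit in enumerate(sums))


def cells_agree(max_n=6):
    """True if both cell types give the same S vector for all inputs up to max_n bits."""
    for n in range(1, max_n + 1):
        for k in range(n):
            for e_vec in range(1 << n):
                l_vec = 1 << k
                if ring_sum(l_vec, e_vec, n, full_adder_cell) != ring_sum(
                    l_vec, e_vec, n, half_adder_or_cell
                ):
                    return False
    return True
```

The two cells differ only when B.i and Cin.i are both true, which cannot occur here because the position to the right of the rotated L bit always has a clear first addend and therefore never produces a carry.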
Because the embodiments of the barrel-incrementers 1712 of FIGS. 8A and 8B comprise a ring of adders in series, some automated logic synthesis tools may have difficulty synthesizing the circuit. In particular, they may generate a timing loop. To alleviate this problem, the embodiments of FIGS. 8C and 8D break the ring of adders by employing two rows of adders, as will now be described.
Referring now to FIG. 8C, a block diagram illustrating the barrel-incrementer 1712 of FIG. 7 according to an alternate embodiment of the present invention is shown. The embodiment of FIG. 8C employs a first row of full-adders 1822 and a second row of full-adders 1824 coupled in series, but not in a ring. That is, the carry-out of the left-most full-adder 1824 of the second row is not provided to the carry-in of the right-most full-adder 1822 of the first row. Rather, the first row of full-adders 1822 is coupled in series, and receives the same inputs as the full-adders 1802 of FIG. 8A; however, a binary zero value is provided to the carry-in of the right-most full-adder 1822 of the first row, the carry-out of the left-most full-adder 1822 of the first row is provided as the carry-in of the right-most full-adder 1824 of the second row, and the carry-out of the left-most full-adder 1824 of the second row is discarded. Furthermore, the sum output of the first row full-adders 1822 , referred to as intermediate n-bit sum S′ in FIG. 8C, is provided as the first addend input to the second row full-adders 1824 . Still further, the second addend input to the second row full-adders 1824 is a binary zero, except for the right-most second row full-adder 1824 , which receives the left-most bit of the L vector 1602 . The second row of full-adders 1824 generates the S vector 1704 . As may be observed, advantageously, the barrel-incrementer 1712 of FIG. 8C does not include a ring and therefore may be synthesized more successfully by some synthesis software tools than the embodiments of FIGS. 8A and 8B. However, a disadvantage of the barrel-incrementer 1712 of FIG. 8C is that it is larger than the embodiments of FIGS. 8A and 8B, and consumes more power, although its complexity is advantageously still n, rather than n². It is also noted that the embodiments of FIGS. 8C and 8D do not need the ˜L vector 1792 input since there is not a ring of adders for the single increment bit of the second addend (i.e., the L vector 1602 ) to infinitely ripple around.
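A two-row arrangement of this kind can be sketched as two ripple-carry passes. The sketch makes two assumptions that should be checked against FIG. 8C: the first addend is the ˜E vector alone (no ˜L input), and the first row receives the L vector shifted left with a zero in the right-most position, since the wrapped left-most L bit is injected in the second row. The function name is hypothetical.

```python
def two_row_incrementer(l_vec, e_vec, n):
    """Return the S vector of a ring-free, two-row barrel-incrementer model."""
    mask = (1 << n) - 1
    not_e = ~e_vec & mask
    shifted_l = (l_vec << 1) & mask          # assumed first-row second addend
    wrap_bit = (l_vec >> (n - 1)) & 1        # left-most L bit, used in row two

    # Row one: carry-in of the right-most adder is a binary zero.
    carry, s_prime = 0, 0
    for i in range(n):
        total = ((not_e >> i) & 1) + ((shifted_l >> i) & 1) + carry
        s_prime |= (total & 1) << i
        carry = total >> 1

    # Row two: carry-in comes from row one; the second addend is zero except bit 0.
    s_vec = 0
    for i in range(n):
        total = ((s_prime >> i) & 1) + (wrap_bit if i == 0 else 0) + carry
        s_vec |= (total & 1) << i
        carry = total >> 1
    return s_vec                             # final carry-out is discarded


n_after_3 = two_row_incrementer(0b01000, 0b01011, 5) & 0b01011
n_after_4 = two_row_incrementer(0b10000, 0b01010, 5) & 0b01010
```

After the AND with the E vector, both calls yield the same next queue that the ring form would select for those inputs.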
Referring now to FIG. 8D, a block diagram illustrating the barrel-incrementer 1712 of FIG. 7 according to an alternate embodiment of the present invention is shown. The barrel-incrementer 1712 of FIG. 8D is an optimized version of the barrel-incrementer 1712 of FIG. 8C in which each of the first row of full-adders 1822 is replaced with the combination of a half-adder 1832 and an OR gate 1834 , similar to the embodiment of FIG. 8B; and, each of the second row full-adders 1824 is replaced with a half-adder 1836 . Additionally, the second row includes a single OR gate 1838 that receives the left-most bit of the L vector 1602 and the carry-out of the left-most half-adder 1832 of the first row; the OR gate 1838 provides its output to the carry-in of the right-most half-adder 1836 of the second row. Thus, the barrel-incrementer 1712 of FIG. 8D enjoys the optimization benefits of the barrel-incrementer 1712 of FIG. 8B and the synthesis tool benefits of the barrel-incrementer 1712 of FIG. 8C.
Referring now to FIG. 9A, a block diagram illustrating an example of operation of the transaction scheduler 602 employing the round-robin generators 1606 of FIG. 6 according to the present invention is shown. FIG. 9A includes collectively the round-robin generators 1606 and muxes 1608 of FIG. 6A. In the example, the number of transaction queues 106 (denoted n) is 5, and the transaction queues 106 are denoted 0 through 4 . In the example, the number of PM_Q_priority 652 levels is 4, denoted 0 through 3 .
In the example of FIG. 9A, all bits of the E vector 1646 are set, i.e., all transaction queues 106 are enabled for transmitting a transaction; all of the transaction queues 106 are at PM_Q_priority 652 level 3 ; the L vector 1602 for PM_Q_priority 652 level 3 is 00001, indicating the last transaction queue 106 from which the transaction scheduler 602 transmitted a transaction at PM_Q_priority 652 level 3 was transaction queue 0 106 . The L vector 1602 for PM_Q_priority 652 levels 2 , 1 , and 0 , are 00100, 10000, and 00001, respectively.
Given the inputs just described, the round-robin generators 1606 generate an N vector 1604 for PM_Q_priority 652 level 3 with a value of 00010, indicating that transaction queue 1 106 is selected as the next transaction queue 106 in round-robin order for transmission at PM_Q_priority 652 level 3 . Transaction queue 1 106 is selected since it is the first transaction queue 106 rotatively left of transaction queue 0 106 that is enabled, as indicated by a set bit in the E vector 1646 . The round-robin generators 1606 generate an N vector 1604 value of 01000, 00001, and 00010 for PM_Q_priority 652 levels 2 , 1 , and 0 , respectively.
Because all of the transaction queues 106 are at PM_Q_priority 652 level 3 , the corresponding mux 1608 for each transaction queue 106 selects the corresponding bit of the N vector 1604 of PM_Q_priority 652 level 3 . Consequently, the round-robin bit 748 for transaction queue 0 106 (denoted R[ 0 ] in FIG. 9A) is 0; the round-robin bit 748 for transaction queue 1 106 is 1; the round-robin bit 748 for transaction queue 2 106 is 0; the round-robin bit 748 for transaction queue 3 106 is 0; and the round-robin bit 748 for transaction queue 4 106 is 0. Therefore, the resulting TS_Q_priority 208 values for transaction queues 106 0 through 4 are: 1110, 1111, 1110, 1110, and 1110, respectively. Consequently, the selection logic 202 selects transaction queue 1 106 for transaction transmission because it has the greatest TS_Q_priority 208 . It is noted that although all the transaction queues 106 are enabled and all are at the same PM_Q_priority 652 , transaction queue 1 106 is selected because it is the next transaction queue 106 in left-rotative round-robin order from the last selected transaction queue 106 (which was transaction queue 0 106 ) at the highest enabled PM_Q_priority 652 level.
Referring now to FIG. 9B, a block diagram illustrating a second example of operation of the transaction scheduler 602 employing the round-robin generators 1606 of FIG. 6 according to the present invention is shown. FIG. 9B is similar to FIG. 9A; however, the input conditions are different. In the example of FIG. 9B, the E vector 1646 value is 01011, i.e., only transaction queues 0, 1, and 3 are enabled for transmitting a transaction; transaction queues 2 and 4 are at PM_Q_priority 652 level 3, transaction queues 1 and 3 are at PM_Q_priority 652 level 2, and transaction queue 0 106 is at PM_Q_priority 652 level 1; the L vectors 1602 for PM_Q_priority 652 levels 3 through 0 are 01000, 00010, 10000, and 00010, indicating that the last transaction queues 106 from which the transaction scheduler 602 transmitted a transaction at PM_Q_priority 652 levels 3 through 0 are 3, 1, 4, and 1, respectively.
Given the inputs just described, the round-robin generators 1606 generate N vectors 1604 for PM_Q_priority 652 levels 3 through 0 with values of 00001, 01000, 00001, and 01000, respectively, indicating that transaction queues 0, 3, 0, and 3, respectively, are selected as the next transaction queue 106 in round-robin order for transmission within PM_Q_priority 652 levels 3 through 0, respectively. It is noted that transaction queue 4 106 is skipped over in the PM_Q_priority 652 level 3 N vector 1604 since transaction queue 4 106 is not enabled, even though transaction queue 4 106 is the next transaction queue 106 rotatively left of transaction queue 3 106, which was the last selected transaction queue 106 at PM_Q_priority 652 level 3; similarly, transaction queue 2 106 is skipped over in PM_Q_priority 652 levels 2 and 0 since transaction queue 2 106 is not enabled.
Because transaction queues 2 and 4 are at PM_Q_priority 652 level 3, the corresponding muxes 1608 select the corresponding bit of the N vector 1604 of PM_Q_priority 652 level 3; because transaction queues 1 and 3 are at PM_Q_priority 652 level 2, the corresponding muxes 1608 select the corresponding bit of the N vector 1604 of PM_Q_priority 652 level 2; because transaction queue 0 is at PM_Q_priority 652 level 1, the corresponding mux 1608 selects the corresponding bit of the N vector 1604 of PM_Q_priority 652 level 1. Consequently, the round-robin bits 748 for transaction queues 0 through 4 are 1, 0, 0, 1, and 0, respectively. Therefore, the resulting TS_Q_priority 208 values for transaction queues 0 through 4 are 1011, 1100, 0110, 1101, and 0110, respectively. Consequently, the selection logic 202 selects transaction queue 3 106 for transaction transmission because it has the greatest TS_Q_priority 208. It is noted that although transaction queue 1 106 is also enabled and at the highest PM_Q_priority 652 that is enabled (PM_Q_priority 652 level 2), transaction queue 3 106 is selected because the bit corresponding to transaction queue 3 106 in the N vector 1604 for PM_Q_priority 652 level 2 is set (hence the round-robin bit 748 for transaction queue 3 106 is set) and the bit corresponding to transaction queue 1 106 is clear (hence the round-robin bit 748 for transaction queue 1 106 is clear).
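The arithmetic of the FIG. 9B example can be reproduced with a short sketch. The 4-bit layout used here for the TS_Q_priority 208 (enable bit, then the two PM_Q_priority 652 bits, then the round-robin bit 748) is an assumption inferred from the example values above, and the helper name ts_q_priority is hypothetical.

```python
def ts_q_priority(enabled: bool, pm_level: int, rr_bit: int) -> int:
    # Assumed layout, inferred from the example values:
    # bit 3 = enable, bits 2..1 = PM_Q_priority level, bit 0 = round-robin bit.
    return (int(enabled) << 3) | ((pm_level & 0b11) << 1) | (rr_bit & 1)


# FIG. 9B inputs (bit q of each vector corresponds to transaction queue q).
E = 0b01011
PM_LEVEL = [1, 2, 3, 2, 3]  # priority level of queues 0..4
N_BY_LEVEL = {3: 0b00001, 2: 0b01000, 1: 0b00001, 0: 0b01000}

priorities = []
for q, level in enumerate(PM_LEVEL):
    # Per-queue mux: pick this queue's bit from the N vector of its own level.
    rr_bit = (N_BY_LEVEL[level] >> q) & 1
    enabled = bool((E >> q) & 1)
    priorities.append(ts_q_priority(enabled, level, rr_bit))

# The N vectors are 1-hot, so the maximum is unique among enabled queues here.
selected = max(range(len(priorities)), key=priorities.__getitem__)
print([format(p, "04b") for p in priorities], selected)
```

Placing the enable bit in the most significant position means a disabled queue can never outrank an enabled one, which is why queues 2 and 4 lose despite being at level 3.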
Referring now to FIG. 10, a block diagram illustrating the transaction scheduler 602 of FIG. 3 including round-robin logic 712 of FIG. 4 according to an alternate embodiment of the present invention is shown. The transaction scheduler 602 of FIG. 10 is similar to the transaction scheduler 602 of FIG. 6, except that the round-robin generators 2006 of FIG. 10 are different from the round-robin generators 1606 of FIG. 6, as described below with respect to FIGS. 11 and 12. The portion of the transaction scheduler 602 shown in FIG. 6B is similar to a like portion of the alternate embodiment of FIG. 10, and is therefore not duplicated in the Figures.
In one aspect, the round-robin generators 2006 of FIG. 10 are different from the round-robin generators 1606 of FIG. 6 because they do not receive the E vector 1646. In another aspect, the round-robin generators 2006 each generate a corresponding NSE vector 2004, rather than the N vector 1604 generated by the round-robin generators 1606 of FIG. 6. The NSE vectors 2004 are similar to the N vectors 1604; however, the NSE vectors 2004 are sign-extended, and thus are not 1-hot. Consequently, by design, two or more transaction queues 106 may have an equal highest TS_Q_priority 208. The greater-than-or-equal (GTE) comparators 714 of FIG. 6B work in conjunction with the round-robin bits 748 selected from the NSE vectors 2004 to select the desired round-robin transaction queue 106 in the highest enabled PM_Q_priority 652, as described below. For example, assume the NSE vector 2004 at one of the PM_Q_priority 652 levels is 11100. This value indicates that transaction queues 4, 3, and 2 have priority over transaction queues 1 and 0 with respect to round-robin order selection. If, for example, all of the transaction queues 106 are at this PM_Q_priority 652 level, the GTE comparators 714 of the transaction scheduler 602 will search for a transmittable transaction queue 106 in the order 2, 3, 4, 0, 1.
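The search order implied by an NSE vector 2004 can be illustrated with a small sketch. It assumes that all transaction queues are at the same PM_Q_priority level and that ties resolve toward the lowest-numbered queue; the function name search_order is hypothetical, and the code models only the resulting order, not the comparator hardware.

```python
def search_order(nse_vec: int, n: int) -> list:
    """Order in which queues are considered for a given NSE vector.

    Queues whose NSE bit is set come first, lowest-numbered first, followed
    by the queues whose bit is clear. (Assumes all queues share one level.)
    """
    set_bits = [q for q in range(n) if ((nse_vec >> q) & 1) == 1]
    clear_bits = [q for q in range(n) if ((nse_vec >> q) & 1) == 0]
    return set_bits + clear_bits


print(search_order(0b11100, 5))
```

For the NSE vector 11100, the sketch produces the order 2, 3, 4, 0, 1 given in the example above.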
Referring now to FIG. 11, a block diagram illustrating the round-robin generator 2006 of FIG. 10 according to one embodiment of the present invention is shown. Although only one round-robin generator 2006 is shown in FIG. 11, the transaction scheduler 602 comprises one round-robin generator 2006 for each PM_Q_priority 652, as shown in FIG. 10. An advantage of the alternate embodiment of the round-robin generator 2006 of FIG. 11 that employs the sign-extended character of the NSE vector 2004 is that the NSE vectors 2004 may be calculated independent of the E vector 1646, i.e., independent of the transaction transmittability of the transaction queues 106, unlike the round-robin generator 1606 embodiment of FIG. 7.
The round-robin generator 2006 includes a mux 2102 that receives as its two inputs the L vector 1602 and the output of a register 2124 . The register 2124 receives and stores the output of the mux 2102 . The mux 2102 also receives a transaction_transmitted control signal 2158 that is true if a transaction is transmitted from the corresponding PM_Q_priority 652 during the current transmission cycle; otherwise, the transaction_transmitted control signal 2158 is false. In one embodiment, the transaction_transmitted signal 2158 may be false for all PM_Q_priority 652 levels, such as if no transaction queues 106 have a transmittable transaction or if the external device connected to the port 102 is currently unable to receive transactions. The mux 2102 selects the L vector 1602 input if the transaction_transmitted control signal 2158 is true; otherwise, the mux 2102 selects the register 2124 output. Thus, mux 2102 and register 2124 work in combination to retain the old L vector 1602 value until a transaction is transmitted by the transaction scheduler 602 at the corresponding PM_Q_priority 652 level. Thus, advantageously, round-robin order is retained within the PM_Q_priority 652 level independent of the other PM_Q_priority 652 levels.
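The hold behavior of the mux 2102 and register 2124 can be modeled in a few lines. This is a minimal sketch; the class name LastVectorHold, the clock method, and the reset value are assumptions introduced for illustration, since the text does not describe how the register 2124 is initialized.

```python
class LastVectorHold:
    """Behavioral model of a 2-input mux feeding a register (hypothetical names)."""

    def __init__(self, reset_value: int):
        # Assumed initial register contents; not specified in the description.
        self.register = reset_value

    def clock(self, l_vec: int, transaction_transmitted: bool) -> int:
        # Mux: take the new L vector only when a transaction was transmitted
        # at this priority level; otherwise recirculate the register output.
        self.register = l_vec if transaction_transmitted else self.register
        return self.register


hold = LastVectorHold(reset_value=0b00001)
held = hold.clock(0b00100, transaction_transmitted=False)   # old value retained
updated = hold.clock(0b00100, transaction_transmitted=True)  # new L vector captured
print(format(held, "05b"), format(updated, "05b"))
```

Because one such instance would exist per PM_Q_priority level, activity at one level leaves the stored value, and hence the round-robin position, of every other level untouched.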
The round-robin generator 2006 also includes a rotate left 1-bit function 2106 configured to receive the output of the register 2124 and rotate it left by one bit. Hence, the output of the rotate left 1-bit function 2106 is a 1-hot vector whose set bit points to the transaction queue 106 rotatively left of the last transaction queue 106 from which a transaction was transmitted. For example, if n is 8, and if the L vector 1602 value is 10000000, then the output of the rotate left 1-bit function 2106 is 00000001.
The round-robin generator 2006 also includes a sign-extender 2108 configured to receive the output of the rotate left 1-bit function 2106 and to sign-extend it to generate the NSE vector 2004 of FIG. 10. For example, if the L vector 1602 value is 00000100, then the output of the sign-extender 2108 is 11111000. In one embodiment, the rotate left 1-bit function 2106 does not include any active logic, but simply comprises signal wires routed appropriately from the register 2124 output to the sign-extender 2108 input to accomplish the 1-bit left rotation.
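The rotate and sign-extend steps can be checked with a short sketch. The function names are hypothetical, and the bit trick used for sign extension (complementing the 1-hot value minus one) is simply one convenient software formulation, not a statement about how the sign-extender 2108 is built.

```python
def rotate_left_1(vec: int, n: int) -> int:
    """Rotate an n-bit vector left by one bit, wrapping the top bit to bit 0."""
    mask = (1 << n) - 1
    return ((vec << 1) | (vec >> (n - 1))) & mask


def sign_extend_one_hot(one_hot: int, n: int) -> int:
    """Set the hot bit and every bit to its left within n bits.

    For a 1-hot input, one_hot - 1 has ones strictly below the hot bit,
    so its complement (masked to n bits) has ones at and above the hot bit.
    """
    mask = (1 << n) - 1
    return ~(one_hot - 1) & mask


def nse_vector(l_vec: int, n: int) -> int:
    """Rotate the L vector left one bit, then sign-extend the result."""
    return sign_extend_one_hot(rotate_left_1(l_vec, n), n)


print(format(rotate_left_1(0b10000000, 8), "08b"))
print(format(nse_vector(0b00000100, 8), "08b"))
```

Neither function takes an enable vector as input, which reflects the point that the NSE vector depends only on the stored L vector.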
Referring now to FIG. 12A, a block diagram illustrating a first example of operation of the transaction scheduler 602 having round-robin generators 2006 of FIG. 10 according to the present invention is shown. FIG. 12A is similar to FIG. 9A; however, FIG. 12A illustrates collectively the round-robin generators 2006 of FIG. 10, rather than the round-robin generators 1606 of FIG. 6. Additionally, the L vector 1602 input for PM_Q_priority 652 level 3 is 00010, rather than 00001. Finally, the round-robin generators 2006 do not receive the E vector 1646.
Given the inputs of FIG. 12A, the round-robin generators 2006 generate an NSE vector 2004 for PM_Q_priority 652 level 3 with a value of 11100, indicating that transaction queue 2 106 is selected as the next transaction queue 106 in round-robin order for transmission at PM_Q_priority 652 level 3. Transaction queue 2 106 is selected since it is the first transaction queue 106 rotatively left of transaction queue 1 106. The round-robin generators 2006 generate NSE vector 2004 values of 11000, 11111, and 11110 for PM_Q_priority 652 levels 2, 1, and 0, respectively.
Because each of the transaction queues 106 is at PM_Q_priority 652 level 3, the corresponding mux 1608 for each transaction queue 106 selects the corresponding bit of the NSE vector 2004 of PM_Q_priority 652 level 3. Consequently, the round-robin bit 748 for transaction queue 0 106 is 0; the round-robin bit 748 for transaction queue 1 106 is 0; the round-robin bit 748 for transaction queue 2 106 is 1; the round-robin bit 748 for transaction queue 3 106 is 1; and the round-robin bit 748 for transaction queue 4 106 is 1. Therefore, the resulting TS_Q_priority 208 values for transaction queues 106 0 through 4 are 1110, 1110, 1111, 1111, and 1111, respectively. Consequently, the selection logic 202 selects transaction queue 2 106 for transaction transmission because it has the greatest or equal TS_Q_priority 208. More specifically, transaction queue 2 106 is the highest transaction queue 106 in the transaction selection logic 202 mux tree (i.e., it has the right-most bit in the NSE vector 2004) that has the greatest or equal TS_Q_priority 208. It is noted that although all transaction queues 106 are enabled and all are at the same PM_Q_priority 652, transaction queue 2 106 is selected because it is the next transaction queue 106 in left-rotative round-robin order from the last selected transaction queue 106 (which was transaction queue 1 106) at the highest enabled PM_Q_priority 652 level.
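The tie-breaking effect of a greater-than-or-equal comparison can be sketched as follows. The linear scan from the highest-numbered queue downward is an assumption that stands in for the mux tree of the selection logic 202, and the 4-bit TS_Q_priority layout (enable, two level bits, round-robin bit) is inferred from the example values; the names ts_priorities and select_queue are hypothetical.

```python
def ts_priorities(e_vec: int, nse_vec: int, level: int, n: int) -> list:
    # Assumed layout: bit 3 = enable, bits 2..1 = level, bit 0 = round-robin bit.
    return [
        (((e_vec >> q) & 1) << 3) | ((level & 0b11) << 1) | ((nse_vec >> q) & 1)
        for q in range(n)
    ]


def select_queue(priorities: list) -> int:
    """Scan from the highest-numbered queue down to queue 0.

    Using >= (rather than >) lets a lower-numbered queue displace an equal
    higher-numbered one, so the right-most set NSE bit wins any tie.
    """
    best = len(priorities) - 1
    for q in range(len(priorities) - 2, -1, -1):
        if priorities[q] >= priorities[best]:
            best = q
    return best


# FIG. 12A inputs: all queues enabled, all at level 3, NSE vector 11100.
priorities = ts_priorities(0b11111, 0b11100, level=3, n=5)
selected = select_queue(priorities)
print([format(p, "04b") for p in priorities], selected)
```

With the FIG. 12A inputs, three queues tie at 1111 and the scan settles on transaction queue 2. A strict greater-than comparison in the same scan would instead leave the choice at queue 4.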
Referring now to FIG. 12B, a block diagram illustrating a second example of operation of the transaction scheduler 602 employing the round-robin generators 2006 of FIG. 10 according to the present invention is shown. FIG. 12B is similar to FIG. 12A; however, the input conditions are different. In the example of FIG. 12B, the E vector 1646 value is 11011, i.e., transaction queue 2 106 is disabled for transmitting a transaction.
Given the inputs just described, the round-robin generators 2006 generate NSE vectors 2004 for PM_Q_priority 652 levels 3 through 0 with values of 11100, 11000, 11111, and 11110, respectively, indicating that transaction queues 2, 3, 0, and 1, respectively, are the next transaction queue 106 in round-robin order for transmission within PM_Q_priority 652 levels 3 through 0, respectively.
Because all the transaction queues 106 are at PM_Q_priority 652 level 3, the corresponding muxes 1608 select the corresponding bit of the NSE vector 2004 of PM_Q_priority 652 level 3. Consequently, the round-robin bits 748 for transaction queues 0 through 4 are 0, 0, 1, 1, and 1, respectively. Therefore, the resulting TS_Q_priority 208 values for transaction queues 0 through 4 are 1110, 1110, 0111, 1111, and 1111, respectively. Consequently, the selection logic 202 selects transaction queue 3 106 for transaction transmission because it is the highest transaction queue 106 in the transaction selection logic 202 mu