| Patent No. | Title | Publication date | Inventor | U.S. class |
| 4984192 | Programmable state machines connectable in a reconfiguration switching network for performing real-time data processing | January, 1991 | Flynn | 711/104 |
| 5359536 | Programmable gate array with improved interconnect structure, input/output structure and configurable logic block | October, 1994 | Agrawal et al. | 716/16 |
| 5457410 | Architecture and interconnect scheme for programmable logic circuits | October, 1995 | Ting | 326/41 |
| 5477475 | Method for emulating a circuit design using an electrically reconfigurable hardware emulation apparatus | December, 1995 | Sample et al. | 716/17 |
| 5526278 | Method and apparatus for converting field-programmable gate array implementations into mask-programmable logic cell implementations | June, 1996 | Powell | 716/16 |
| 5543640 | Logical three dimensional interconnections between integrated circuit chips using a two dimensional multi-chip module | August, 1996 | Sutherland et al. | 257/203 |
| 5621650 | Programmable logic device with internal time-constant multiplexing of signals from external interconnect buses | April, 1997 | Agrawal et al. | 716/16 |
| 5778439 | Programmable logic device with hierarchical confiquration and state storage | July, 1998 | Trimberger et al. | 711/117 |
| 5815715 | Method for designing a product having hardware and software components and product therefor | September, 1998 | Kucukcakar | 717/100 |
| 5815726 | Coarse-grained look-up table architecture | September, 1998 | Cliff | 712/1 |
| 5821773 | Look-up table based logic element with complete permutability of the inputs to the secondary signals | October, 1998 | Norman et al. | 326/16 |
| 5894228 | Tristate structures for programmable logic devices | April, 1999 | Reddy et al. | 326/41 |
| 5909450 | Tool to reconfigure pin connections between a dut and a tester | June, 1999 | Wright | 714/724 |
| 5911059 | Method and apparatus for testing software | June, 1999 | Profit, Jr. | 703/20 |
| 5991907 | Method for testing field programmable gate arrays | November, 1999 | Stroud et al. | 716/16 |
| 6018490 | Programmable logic array integrated circuits | January, 2000 | Cliff et al. | 365/189.02 |
| 6020759 | Programmable logic array device with random access memory configurable as product terms | February, 2000 | Heile | 326/37 |
| 6026230 | Memory simulation system and method | February, 2000 | Lin et al. | 703/13 |
| 6028809 | Programmable logic device incorporating a tristateable logic array block | February, 2000 | Schleicher et al. | 365/230.08 |
| 6031391 | Configuration memory integrated circuit | February, 2000 | Couts-Martin et al. | 326/40 |
| 6034536 | Redundancy circuitry for logic circuits | March, 2000 | McClintock et al. | 326/38 |
| 6058452 | Memory cells configurable as CAM or RAM in programmable logic devices | May, 2000 | Rangasayee et al. | 365/49 |
| 6058492 | Method and apparatus for design verification using emulation and simulation | May, 2000 | Sample et al. | 716/5 |
| 6069489 | FPGA having fast configuration memory data readback | May, 2000 | Iwanczuk et al. | 326/38 |
| 6078736 | Method of designing FPGAs for dynamically reconfigurable computing | June, 2000 | Guccione | 716/17 |
| 6085317 | Reconfigurable computer architecture using programmable logic devices | July, 2000 | Smith | 713/100 |
| 6091258 | Redundancy circuitry for logic circuits | July, 2000 | McClintock et al. | 326/38 |
| 6097211 | Configuration memory integrated circuit | August, 2000 | Couts-Martin et al. | 326/46 |
| GB2315583 | | February, 1998 | | |
| GB2321989 | | August, 1998 | | |
| WO/1999/052049 | METHOD OF DESIGNING A CONSTRAINT-DRIVEN INTEGRATED CIRCUIT LAYOUT | October, 1999 | | |
This application claims the benefit of the U.S. provisional application No. 60/167,684 filed Nov. 29, 1999 and the international application PCT/IL00/00797 filed Nov. 28, 2000.
This invention relates to circuit design and testing and device architecture.
Known design and manufacturing processes of integrated circuits (ICs) and modules containing ICs, including implementation using DSPs or Application Specific Integrated Circuits (ASICs), require lengthy development cycles and are expensive. In particular, the time required to market integrated circuits is long owing to the length of the development period requiring protracted design, verification and testing of the application. During verification and testing of the product, failures in the design may be detected requiring debugging and repair. This greatly adds to the development cost and time, particularly when design failures are detected at the end of the process, for example after product delivery.
Hardware products are expensive since the time to market influences engineering costs, market loss, and so on. Costs must also bear the overhead of testing equipment used during development, the expense of testing equipment used during manufacturing, inventory size and the costs relating to employment of professional engineers.
Simulation accuracy is poor. One simulation is required to simulate the desired functionality of the application, while another is required to simulate the run-time operation of the implementation. If simulation of high-resolution delays is required, simulation becomes cumbersome. Moreover, simulations are very slow, often hundreds of thousands of times slower than real-time.
Yet another drawback of known design and development processes is that the size of prototypes and non-ASIC electronic cards may exceed practical dimensions. The ability to test high-end System on a Chip products is limited owing to the very high density of integrated circuits, making it very difficult to probe points of interest in the circuit. It is likewise difficult to test finished products containing high-density integrated circuits, and to verify the complete functionality of complex circuits.
The adaptation of already designed products to advanced chip production technologies, so as to ensure the compatibility of applications with newer technologies, is poor. This impacts not only the application developer but also makes it difficult for the chip manufacturer to employ new geometries whilst enabling developers to use their existing applications.
ASICs are used to implement an application using a smaller area, where the application will be sold in sufficient quantity to justify customization. Typically, the application is first developed using conventional design methods and, after establishing the integrity of the design, it is converted to an ASIC. This is a time-consuming and expensive process.
Typically, application speed is enhanced by implementing the application in hardware at the expense of functional flexibility, since most hardware implementations are dedicated to a specific application and are not amenable to extension or changes.
Yet another drawback associated with the industry is the relative scarcity of qualified personnel and the difficulty in breaking down the design and development so as to be amenable to sharing amongst several engineers in order that the development cycle may be reduced.
All these limitations combine to increase both the length and the cost of the development and manufacturing cycle. In order to demonstrate the complexity and effort associated with conventional design and manufacturing processes for implementing logic integrated circuits or modules including them, various implementation methods will now be described.
FIG. 1 is a flow diagram showing the conventional process for hardware implementation of Integrated Circuit (IC) chips on electronic cards. During the card development process 7, a card is developed for performing a needed application. After this process, the card is ready for the production activities. The idea and concept relating to the application are defined during the application definition 10, after which there follows the interface definition 12. Sometimes the interface is determined by the system environment; for example in a PC application card. Sometimes the firm that made the card can choose the interface; for example in a rack full of cards, most of the cards will have the same interface. Sometimes the interface is unique and has to be designed in the same process as the rest of the card.
Technology selection 16 is influenced by real-time needs, flexibility, size and the number of tasks. Hardware implementation is selected for an application owing to its superior real-time performance. Hardware implementation is the fastest (real-time) solution, but it can do only a limited number of tasks, is not flexible and is large. Time to market is lengthy. The process from development to manufacturing may take months to more than a year, nine months being considered a very good result. The development and manufacturing processes are expensive.
The space available for the application is an important parameter for the technology selection process. Normally, the smaller the better. If size is critical, the card can be converted into an ASIC in which case the development period is extended by several months owing to the need for ASIC conversion. Nine months is a normal extension time. In practice, ASIC conversion suffers from all the problems mentioned above.
In most cases, particularly for complex high speed, digital hardware design, simulation 18 of the application is required. Sometimes the simulation is considered part of the application definition process, and is not counted as part of the hardware development process time. The simulation process period may last from days to a few months, depending on the complexity of the application. The purpose of the simulation is to verify that the idea and the implementation considered are feasible. For example: if there is an idea to compress voice over communication links, the compression algorithm will be made using a high level language such as “C++” and other software tools such as “Matlab” (of Mathworks, Natick, Mass.). There is no simple association between the simulation and the implementation. Sometimes, more than one simulation is made and workstations may be needed to increase the simulation speed.
The interface must then be designed 20. If the interface is standard, for example a PCI interface in a PC card, in most cases there is no need to redesign it and an “off the shelf” ready-made chip set may be used in the implementation. If the interface is proprietary, the hardware developer might prefer to separate the development of the interface from the development of the application so as to be able to use the interface for other applications as well, such as the interface for a few cards in the same rack. In such a case, the time to market would not be influenced by the time needed to develop and implement the interface. Only when the interface is unique does it become part of the overall implementation development.
Electronic design 22 is the procedure that converts the solution into a set of electronic Integrated Circuit devices. The developer keeps in mind a library of such devices, which can perform certain functions and chooses the needed ICs to be connected together in order to implement the application. The electronic design process is complicated, as the engineer has to remember a lot of different components and the way to use them. As the technology improves, new components with complicated functions are added. Sometimes the electronics engineer is left behind. High-Level Design languages have been implemented, but still most of the designing has to be done in modular fashion. The more complicated the application, the more time is needed for the design.
Drawings are the interface between the application engineer and the computerized tools used to manufacture the card. This is the “language” the engineer uses to “write” his implementation ideas. The drawing process 24 converts the design into schematic drawings. The more complicated the application, the more time is needed for the drawings. If a programmable device is used, High-level Design Language (HDL) may be used to replace part of drawings.
When implementation is not trivial, simulation 25 of the design is done and timing is checked. The designer tries to correlate between the result of the application simulation 18 and the current simulation. If any mistake is found, the design is modified, requiring steps 22 and 24 to be repeated.
After the design has been finished, the components 26 are obtained. It sometimes takes quite a long time to purchase a specific device, thereby delaying prototype production. The more complex the application, the more components are used, the longer is the time to production and the bigger is the inventory of components.
Layout 28 is the process that converts the drawn design into a manufactured package. The more complex the application, the more components are used and the longer the layout time.
The board is manufactured 30. For each application a different board must be manufactured thus increasing the amount of human resources invested and the need for expensive equipment for card verification and manufacture.
The components are installed on to the manufactured card. Faults in the layout can cause problems in the installation. For example, the tools for installing the prototypes are normally different from those used for manufacturing. Specifically, installation in the development phase uses less automation and the probability of a faulty card is greater. To shorten the installation period, special purpose, expensive equipment is used, for example high pitch chip insertion equipment.
Test and debug 34 are the longest periods in the overall development process. Fifty to seventy-five percent of the time spent on a complex design goes into verification. Verification is quickly becoming the biggest technology barrier.
Errors can be made in each one of the above tasks. For example, narrow spacing between tracks of a printed circuit can cause a short circuit. It is worse if this kind of mistake is not discovered in the debugging process, because it may be found later when it is more expensive to repair. Errors can also be made in card definition, and so on. That means repetitive processes occur. It is normal to have three versions of the prototype before the first batch of production. Expensive test equipment is needed to test the electronic card, for example: signal generators, noise generators, logic analyzers, scopes, dB meters, adders, line simulators and others.
In the R&D to production process 36, the documentation with all the details needed to manufacture the card is created. Although this process can start before the last version of the prototype is ready, the process extends the time to market. Automatic tools for card verification (such as bed of nails), and function verification are created. Automatic component insertion machinery is programmed, and so on.
The above process causes the development and manufacturing cycle to be long. The later an error is found the harder and the more expensive is the repair. Therefore, if an ASIC is needed, considerable effort is made to assure error-free results. It is normal to manufacture a few batches before converting the electronics into an ASIC.
Once the card documentation is ready, all the components have been procured, chip insertion machines have been programmed, and so on, the card production process 37 can commence. There then follows production card verification 38 wherein the card is tested with or without its electronics. Function verification 40 is the process of testing the card for the designed application. It is quite complicated and time consuming to create automatic equipment for testing each function of the application. When the result is satisfactory, the card delivery 42 to the client may be performed.
If an error is found or an enhancement is needed at the end of the process, repairs and enhancements 44 are very difficult, expensive and time consuming. In the worst case, most of the process has to be repeated.
As mentioned above, real-time needs, flexibility, size and number of tasks influence the choice of technology. A circuit may be implemented as a Digital Signal Processor when flexibility is needed and/or a large number of tasks are to be performed, but not at the same time. This solution is slower in real-time than an equivalent hardware implementation. The DSP implementation is about the same size as the hardware implementation, but generally DSP implementations do not allow the option of conversion into ASIC.
Although the above-described hardware development process must also be implemented for the DSP card, it rarely influences the time to market period for several reasons. First, the hardware implementation is simple, as the DSP vendors propose solutions for the hardware design. Only a non-standard interface needs to be designed. Second, once a card is ready and the interface is fixed, a ready-made card can be used for the new application. Normally the process of developing the programming code takes more time than the process of developing the DSP card. Nevertheless, the card has to be developed at least once. Different problems than those relating to hardware implementation must also be addressed, such as which kind of DSP to choose, whether the DSP is going to satisfy the needs of next-generation applications and so on.
As the industry adheres to Gordon Moore's Law, doubling the number of transistors in a die every 18 months by shrinking the feature size (transistors and interconnects), products are becoming obsolete with each new semiconductor generation. Therefore new DSP/CPU development is needed frequently. Very commonly vendors do their best to enable software compatibility, but in practice conversion time is needed. This conversion process influences the cost as well as the time to market.
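By way of a rough illustration of the doubling rate cited above, the following sketch computes the growth factor in transistor count after a given number of months. It assumes an idealized, perfectly exponential doubling every 18 months; the function name and parameter names are hypothetical and are introduced here for illustration only.

```python
def transistor_growth_factor(months: float, doubling_period_months: float = 18.0) -> float:
    """Return the idealized growth factor in transistor count after `months`.

    Assumption: the count doubles exactly once per doubling period
    (18 months by default), which is a simplification of real scaling.
    """
    if doubling_period_months <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    # Growth after one, two and three idealized generations.
    for months in (18, 36, 54):
        print(months, transistor_growth_factor(months))
```

Under this idealized model, a product designed for one generation faces a doubled transistor budget after a single 18-month period, which is why frequent conversion to new devices is to be expected.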
DSP implementation has advantages over hardware implementation in the simulation stage, as both the simulation and the implementation can be written in a high level language such as “C”. In practice, there is no efficient means of conversion from the simulation to the DSP code. In other words, the simulation does not simulate the exact implementation, especially if assembly language is used in the DSP coding. Also, the cost of R&D is high. The cost calculation has to consider the development of the card and the development of the software. The market is short of DSP experts, so wage costs are high.
In production, automatic function verification is still hard to implement, but if an error or enhancement is discovered after delivery, it still can be fixed by changing the software at the customer's premises, although in practice this is far from trivial. The easy part is loading the revised code into the implementation.
Sometimes a combined implementation is preferred. For example, if a filter is needed in a DSP implementation, the filter may be implemented in hardware and the rest of the application will be implemented in the DSP. The advantages and disadvantages of each part remain.
Programmable Logic Devices (PLDs) allow for flexible implementation but are limited in the application capability for a given chip area. A simulation language has been converted into a High-level Design Language (HDL/VHDL) to enable the designer to create implementations. Nevertheless, these software languages enable the user to create the hardware only in modular form, so they are far behind languages like C++. Simulation is not accurate, debugging is complicated and the product is expensive. When size is critical, it is most common to implement the application using a few high-end PLDs as a “fast prototype” and then to convert the application into ASIC. In this case, it is common to have a few iterations for the ASIC development, which increases price and time to market.
The ASIC development process is described, for example, by Texas Instruments Incorporated (Dallas, Tex. USA, 75380-9066), whose web address is http://www.ti.com/sc/docs/asic/cad/cad.htm.
Reference is also made to http://www.verisity.com/html/spechased.html belonging to Verisity Design, Inc. Mountain View, Calif. USA. Likewise, further information relating to Electronic Design Automation may be found by reference to http://www.wsdmag.com/library/penton/archives/wsd/January1998/261.htm which acknowledges that electronic-design-automation (EDA) technology has lagged behind the rate of progress of semiconductor fabrication.
Some of the drawbacks associated with the design, manufacturing and verification process have been addressed in the patent literature. U.S. Pat. No. 5,815,726 (Cliff; Richard G.) entitled “Coarse-grained look-up table architecture” published Sep. 29, 1998 and assigned to Altera Corporation discloses a programmable logic device architecture. For interconnecting signals to and from the logic array blocks, the global interconnection resources include switch boxes, long lines, double lines, single lines, and half- and partially populated multiplexer regions. The logic array block includes two levels of function blocks. In a first level, there are eight four-input function blocks. In a second level, there are two four-input function blocks and four secondary two-input function blocks. In one embodiment, these function blocks are implemented using look-up tables (LUTs). The logic array block has combinatorial and registered outputs and also contains storage blocks for implementing sequential or registered logic functions. The logic array block has a carry chain for implementing logic functions requiring carry bits and may also be configured to implement a random access memory.
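The behaviour of a function block implemented as a look-up table may be sketched as follows. This is a minimal software model, not the circuit of the cited patent: the class name, the helper function and the choice of treating the first input as the most significant address bit are all assumptions made for illustration.

```python
class LookUpTable:
    """Software model of an n-input LUT: the inputs form an address
    into a table of 2**n stored configuration bits."""

    def __init__(self, num_inputs, truth_bits):
        if len(truth_bits) != 2 ** num_inputs:
            raise ValueError("a LUT with n inputs needs 2**n configuration bits")
        self.num_inputs = num_inputs
        self.truth_bits = list(truth_bits)

    def evaluate(self, *inputs):
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        index = 0
        for bit in inputs:
            # Assumption: the first input is the most significant address bit.
            index = (index << 1) | (1 if bit else 0)
        return self.truth_bits[index]


def program_from_function(num_inputs, func):
    """Fill the configuration bits by enumerating every input combination."""
    bits = []
    for index in range(2 ** num_inputs):
        args = [(index >> (num_inputs - 1 - i)) & 1 for i in range(num_inputs)]
        bits.append(int(bool(func(*args))))
    return LookUpTable(num_inputs, bits)


if __name__ == "__main__":
    and4 = program_from_function(4, lambda a, b, c, d: a and b and c and d)
    print(and4.evaluate(1, 1, 1, 1), and4.evaluate(1, 0, 1, 1))
```

In this model the same structure realizes any four-input logic function simply by changing the sixteen stored bits, which is the property that makes look-up tables suitable as programmable function blocks.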
U.S. Pat. No. 5,909,450 (Wright; Adam) entitled “Tool to reconfigure pin connections between a DUT and a tester” published Jun. 1, 1999 and assigned to Altera Corporation discloses a method of simulating the testing of integrated circuits. A database of desired connections between a tester unit and a device under test (DUT) for different downbonds is accessed by a multiplexer which sets up the desired connections. The system automatically makes the correct connection for each downbond without manual intervention from the user as was required in traditional simulator systems.
U.S. Pat. No. 5,821,773 (Norman; Kevin A. et al.) entitled “Look-up table based logic element with complete permutability of the inputs to the secondary signals” published Oct. 13, 1998 and assigned to Altera Corporation discloses a logic element for a programmable logic device. The logic element includes a look-up table for implementing logical functions, a programmable delay block, a storage block configurable as a latch or a flip-flop, and a diagnostic shadow latch. A plurality of inputs to the logic element and complements of these inputs are available to control the secondary functions of the storage block.
U.S. Pat. No. 6,018,490 (Cliff; Richard G. et al.) entitled “Programmable logic array integrated circuits” published Jan. 25, 2000 and assigned to Altera Corporation discloses a programmable logic array integrated circuit having a number of programmable logic modules which are grouped together in a plurality of logic array blocks. The logic array blocks are arranged on the circuit in a two dimensional array. A conductor network is provided for interconnecting any logic module with any other logic module. In addition, adjacent or nearby logic modules are connectable to one another for such special purposes as providing a carry chain between logic modules and/or for connecting two or more modules together to provide more complex logic functions without having to make use of the general interconnection network. Another network of so-called fast or universal conductors is provided for distributing widely used logic signals such as clock and clear signals throughout the circuit. Multiplexers can be used in various ways to reduce the number of programmable interconnections required between signal conductors.
U.S. Pat. No. 6,058,492 (Sample; Stephen P. et al.) entitled “Method and apparatus for design verification using emulation and simulation” published May 2, 2000 and assigned to Quickturn Design Systems, Inc. discloses a method and apparatus for combining emulation and simulation of a logic design. The method and apparatus can be used with a logic design that includes gate-level descriptions, behavioral representations, structural representations, or a combination thereof. The emulation and simulation portions are combined in a manner that minimizes the time for transferring data between the two portions. Simulation is performed by one or more microprocessors while emulation is performed in reconfigurable hardware such as field programmable gate arrays. When multiple microprocessors are employed, independent portions of the logic design are selected to be executed on the multiple synchronized microprocessors. Reconfigurable hardware also performs event detecting and scheduling operations to aid the simulation, and to reduce processing time.
U.S. Pat. No. 5,815,715 (Kucukcakar et al.) entitled “Method for designing a product having hardware and software components and product therefor” published Sep. 29, 1998 and assigned to Motorola, Inc. discloses a computing system and a method for designing the computing system using hardware and software components. The computing system includes programmable coprocessors having the same architectural style. Each coprocessor includes a sequencer and a programmable interconnect network and a varying number of functional units and storage elements. The computing system is designed by using a compiler to generate a host microprocessor code from a portion of an application software code and a coprocessor code from the portion of the application software code. The compiler uses the host microprocessor code to determine the execution speed of the host microprocessor and the coprocessor code to determine the execution speed of the coprocessor and selects one of the host microprocessor or the coprocessor for execution of the portion of the application software code. Then the compiler creates a code that serves as the software program.
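The selection step described above, in which each portion of the application is assigned to whichever target executes it faster, may be sketched as follows. The function name, the data layout and the tie-breaking rule in favour of the host are assumptions made for illustration; the cited patent does not specify them here, and the timing figures in the usage example are invented placeholders.

```python
def select_targets(portions):
    """Assign each code portion to the faster of two targets.

    `portions` maps a portion name to a pair
    (estimated_host_time, estimated_coprocessor_time) in any common unit.
    Assumption: ties are resolved in favour of the host microprocessor.
    """
    assignment = {}
    for name, (host_time, coprocessor_time) in portions.items():
        if host_time <= coprocessor_time:
            assignment[name] = "host"
        else:
            assignment[name] = "coprocessor"
    return assignment


if __name__ == "__main__":
    # Hypothetical estimates, for illustration only.
    estimates = {"filter": (120.0, 15.0), "control": (8.0, 30.0)}
    print(select_targets(estimates))
```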
U.S. Pat. No. 6,058,452 (Rangasayee; Krishna) entitled “Memory cells configurable as CAM or RAM in programmable logic devices” published May 2, 2000 and assigned to Altera Corporation discloses a programmable logic device having content addressable memory. The programmable logic device may include reconfigurable dual mode memory suitable for operating as a content addressable memory in a first mode and a random access memory in a second mode. Mode control switch circuitry may be provided to selectively enable a user to configure the dual mode memory as either content addressable memory or random access memory.
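The difference between the two modes may be illustrated by a small software model: in random access mode the memory is read by address, whereas in content addressable mode it is searched by value and returns the matching addresses. The class and method names below are hypothetical, and the model makes no claim about the circuit-level implementation of the cited patent.

```python
class DualModeMemory:
    """Software model of a memory usable either as RAM or as CAM."""

    def __init__(self, depth, mode="ram"):
        self.cells = [0] * depth
        self.mode = None
        self.set_mode(mode)

    def set_mode(self, mode):
        # Stands in for the mode control switch circuitry.
        if mode not in ("ram", "cam"):
            raise ValueError("mode must be 'ram' or 'cam'")
        self.mode = mode

    def write(self, address, value):
        self.cells[address] = value

    def read(self, address):
        """RAM mode: an address goes in, the stored data comes out."""
        if self.mode != "ram":
            raise RuntimeError("read by address requires RAM mode")
        return self.cells[address]

    def search(self, value):
        """CAM mode: data goes in, all matching addresses come out."""
        if self.mode != "cam":
            raise RuntimeError("search by content requires CAM mode")
        return [addr for addr, stored in enumerate(self.cells) if stored == value]
```

For example, after writing the value 0xAB at address 2 of a four-word memory, `read(2)` in RAM mode returns the value, and `search(0xAB)` in CAM mode returns the list of addresses holding it.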
U.S. Pat. No. 6,078,736 (Guccione; Steven A.) entitled “Method of designing FPGAs for dynamically reconfigurable computing” published Jun. 20, 2000 and assigned to Xilinx, Inc. discloses a method of designing FPGAs for reconfigurable computing comprising a software environment for reconfigurable coprocessor applications. This environment comprises a standard high-level language compiler (e.g. Java) and a set of libraries. The FPGA is configured directly from a host processor, configuration, reconfiguration and host run-time operation being supported in a single piece of code. Design compile times on the order of seconds and built-in support for parameterized cells are significant features of the inventive method.
U.S. Pat. Nos. 6,031,391 and 6,097,211 (Couts-Martin; Chris et al.) both entitled “Configuration memory integrated circuit” published Feb. 29, 2000 and Aug. 1, 2000 respectively and assigned to Altera Corporation disclose a configuration memory for storing information that is in-system programmable. The programming of the configuration memory may be performed using JTAG (IEEE Standard 1149.1) instructions. Furthermore, the configuration of a programmable logic device using the configuration data in the configuration memory may be initiated with a JTAG instruction. Pull-up resistors are incorporated within the configuration memory package.
U.S. Pat. No. 5,894,228 (Reddy; Srinivas et al.) entitled “Tristate structures for programmable logic devices” published Apr. 13, 1999 and assigned to Altera Corporation discloses a programmable logic device architecture including tristate structures. The programmable logic device architecture provides tristate structures which may be logically or programmably controlled, or both. Through these tristate structures, the logic elements may be coupled to the programmable interconnect, where they may be coupled with other logic elements of the programmable logic device. Using these tristate structures, the signal pathways of the architecture may be dynamically reconfigured.
U.S. Pat. No. 6,026,230 (Lin; Sharon Sheau-Pyng et al.) entitled “Memory simulation system and method” published Feb. 15, 2000 and assigned to Axis Systems, Inc. discloses a system having four modes of operation: (1) Software Simulation, (2) Simulation via Hardware Acceleration, (3) In-Circuit Emulation (ICE), and (4) Post-Simulation Analysis. At a high level, the system may be embodied in each of the above four modes or various combinations of these modes. At the core of these modes is a software kernel that controls the overall operation of this system. The main control loop of the kernel executes the following steps: initialize system, evaluate active test-bench processes/components, evaluate clock components, detect clock edge, update registers and memories, propagate combinational components, advance simulation time, and continue the loop as long as active test-bench processes are present. The Memory Mapping aspect of the invention provides a structure and scheme where the numerous memory blocks associated with the user's design are mapped into the SRAM memory devices in the Simulation system instead of inside the logic devices, which are used to configure and model the user's design. The Memory Mapping or Memory Simulation system includes a memory state machine, an evaluation state machine, and their associated logic to control and interface with: (1) the main computing system and its associated memory system, (2) the SRAM memory devices coupled to the FPGA buses in the Simulation system, and (3) the FPGA logic devices which contain the configured and programmed user design that is being debugged.
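The sequence of steps in the main control loop described above may be sketched for a toy design consisting of one register and one inverter. This is a minimal illustration of the ordering of the steps only: the function name, the toy design, the stimulus format and the assumption that the clock toggles once per time step are all hypothetical and are not taken from the cited patent.

```python
def run_kernel(stimulus):
    """Run the control loop over a toy design: register q samples input d
    on each rising clock edge, and output `out` is the inverse of q.

    `stimulus` is the list of values the test bench drives onto d,
    one value per time step (an assumption made for this sketch).
    """
    # Initialize system.
    time = 0
    clock = 0
    state = {"d": 0, "q": 0, "out": 1}
    pending = list(stimulus)
    trace = []

    # Continue the loop as long as active test-bench processes are present.
    while pending:
        # Evaluate active test-bench processes/components.
        state["d"] = pending.pop(0)
        # Evaluate clock components (assumed to toggle every time step).
        previous_clock = clock
        clock ^= 1
        # Detect clock edge.
        rising_edge = previous_clock == 0 and clock == 1
        # Update registers and memories.
        if rising_edge:
            state["q"] = state["d"]
        # Propagate combinational components.
        state["out"] = state["q"] ^ 1
        trace.append((time, clock, state["q"], state["out"]))
        # Advance simulation time.
        time += 1
    return trace
```

Each trace entry records the time step, the clock level, the register value and the combinational output, so the effect of updating registers only on a detected clock edge can be inspected directly.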
U.S. Pat. No. 6,020,759 (Heile; Francis B.) entitled “Programmable logic array device with random access memory configurable as product terms” published Feb. 1, 2000 and assigned to Altera Corporation discloses a look-up-table-based programmable logic device provided with memory circuitry which can be operated either as random access memory (“RAM”) or to perform product term (“p-term”) logic. Each individual row of the memory is separately addressable for writing data to the memory or, in RAM mode, for reading data from the memory. Alternatively, multiple rows of the memory are addressable in parallel to read p-terms from the memory. The memory circuitry of the invention is particularly useful as an addition to look-up-table-type programmable logic devices because the p-term capability of the memory circuitry provides an efficient way to perform wide fan-in logic functions which would otherwise require trees of multiple look-up tables.
U.S. Pat. No. 6,028,809 (Schleicher; James) entitled “Programmable logic device incorporating a tri-stateable logic array block” published Feb. 22, 2000 and assigned to Altera Corporation discloses a programmable logic device that incorporates a multi-function block having a plurality of integrally connected function units where at least one of the function units within the multi-function block is a tristate logic unit. The programmable logic device also includes a tristate bus operatively connected to the tristate logic unit that can supply tristate logic signals to the tristate bus as well as receive tristate logic signals from the tristate bus. The tristate bus carries tristate data signals and address select signals that operate to select a desired one of the tristate logic units within the programmable logic device.
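The remark that wide fan-in functions would otherwise require trees of multiple look-up tables can be quantified with a short calculation. The sketch below counts the look-up tables needed to build a tree computing a simple associative function, such as a wide AND, from k-input tables. It relies on the observation that each k-input table merges k signals into one and so removes k-1 signals from the total; the function name is hypothetical, and the count applies only to functions that decompose in this tree fashion.

```python
import math


def luts_for_wide_and(fan_in, lut_inputs=4):
    """Number of k-input LUTs in a tree computing a `fan_in`-input AND.

    Each LUT reduces the number of outstanding signals by (lut_inputs - 1),
    and the tree is complete when a single signal remains.
    """
    if lut_inputs < 2:
        raise ValueError("a LUT needs at least two inputs to reduce signals")
    if fan_in <= 1:
        return 0  # a single signal needs no logic
    return math.ceil((fan_in - 1) / (lut_inputs - 1))


if __name__ == "__main__":
    for width in (4, 16, 32):
        print(width, luts_for_wide_and(width))
```

With four-input tables, a 32-input AND needs eleven tables arranged in three levels, which illustrates the cost in area and delay that a single p-term avoids.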
U.S. Pat. No. 6,085,317 (Smith; Stephen J.) entitled “Reconfigurable computer architecture using programmable logic devices” published Jul. 4, 2000 and assigned to Altera Corporation discloses a method and system for computing using reconfigurable computer architecture utilizing logic devices. The computing may be accomplished by configuring a first programmable logic unit as a system controller. The system controller directs the implementation of an algorithm in a second one of the programmable logic units concurrently with reconfiguring a third one of the programmable logic units. In another aspect, the computing system may include a pair of independent, bi-directional busses each of which is arranged to electrically interconnect the system controller and the plurality of programmable logic devices. With this arrangement, a first bus may be used to reconfigure a selected one of the programmable logic devices as directed by the system controller while the second bus is used by an operational one of the programmable logic devices.
U.S. Pat. Nos. 6,034,536 and 6,091,258 (McClintock; Cameron et al.) both entitled “Redundancy circuitry for logic circuits” published Mar. 7, 2000 and Jul. 18, 2000 respectively and assigned to Altera Corporation disclose redundant circuitry for a logic circuit such as a programmable logic device. The redundant circuitry allows the logic circuit to be repaired by replacing a defective logic area on the circuit with a redundant logic circuit. Rows and columns of logic areas may be logically remapped by row and column swapping. The logic circuit contains dynamic control circuitry for directing programming data to various logic areas on the circuit in an order defined by redundancy configuration data. Redundancy may be implemented using either fully or partially redundant logic areas. Logic areas may be swapped to re-map a partially redundant logic area on to a logic area containing a defect. The defect may then be repaired using row or column swapping or shifting. A logic circuit containing folded rows of logic areas may be repaired by replacing a defective half-row with a redundant half-row.
U.S. Pat. No. 6,069,489 (Iwanczuk; Roman et al.) entitled “FPGA having fast configuration memory data readback” published May 30, 2000 and assigned to Xilinx, Inc. discloses an FPGA configuration memory divided into columnar frames each having a unique address. Configuration data is loaded into a configuration register, which transfers configuration data frame by frame in parallel. In a preferred embodiment, an input register, a shadow input register and a multiplexer array permit efficient configuration data transfer using a larger number of input bits than conventional FPGAs. A flexible external interface enables connection with bus sizes varying from a predetermined maximum width down to a selected fraction thereof. Configuration data transfer is made more efficient by using shadow registers to drive such data into memory cells on a frame-by-frame basis with a minimum of delay, and by employing a multiplexer array to exploit a wider configuration data transfer bus. The speed of configuration read-back is made substantially equal to the rate of configuration data input by employing configuration register logic that supports bidirectional data transfer. Using the proposed FPGA configuration memory, a bit stream designed for an old device can be used for a new device having additional configuration memory cells.
U.S. Pat. No. 5,477,475 (Sample; Stephen P.) entitled “Method for emulating a circuit design using an electrically reconfigurable hardware emulation apparatus” published Dec. 19, 1995 and assigned to Quickturn Design Systems, Inc. discloses a system for physical emulation of electronic circuits or systems including a data entry workstation where a user may input data representing the circuit or system configuration. This data is converted to a form suitable for programming an array of programmable gate elements provided with a richly interconnected architecture. Provision is made for externally connecting VLSI devices or other portions of a user's circuit or system. A network of internal probing interconnections is made available by utilization of unused circuit paths in the programmable gate arrays.
It is an object of the invention to provide an improved device architecture particularly suited for the design of digital circuits that allows high flexibility and reduces the time from design to finished product.
To this end, there is provided in accordance with a broad aspect of the invention a universal hardware device consisting essentially of:
at least one plurality of cells for storing data; and
at least one programmable matrix coupled to said at least one plurality of cells, whereby a plurality of hardware applications may be implemented by selectively storing data in said cells and selectively programming said matrix to connect at least one of said cells to at least one of said cells.
Such device architecture allows cells to be combined so as to form larger cells, which can themselves be combined to form larger cells, this process being repeated as required; and to configure the combined cell as a hardware application by downloading data to the constituent cells. Preferably, the cells are configurable as Look-Up Tables having addressable memory locations, in which the stored data defines a function implemented by the Look-Up Table. The function can itself be programmed using a high level programming language and may be formatted together with code for implementing a desired connectivity of the cells. The formatted data is then downloaded to the cells in the device. Once downloaded, the device carries out the pre-programmed functionality in a manner that is no longer dependent on the high-level program code used to implement the desired function. As a result, operation of the device is independent of the efficiency of the high-level program code. Identical code may be used to simulate the device thus greatly facilitating design and simulation of the device and greatly reducing the time from design to marketing.
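By way of non-limiting illustration only, the following minimal Python sketch shows how a function written in a high-level language might be tabulated into the data that is downloaded to a cell configured as a Look-Up Table. The names build_lut and add_halves, and the 4-bit address and 3-bit data widths, are assumptions chosen for the example and do not appear in the specification.

```python
def build_lut(func, address_bits, data_bits):
    """Tabulate func over every possible address so a RAM can replay it."""
    mask = (1 << data_bits) - 1
    return [func(addr) & mask for addr in range(1 << address_bits)]


# Hypothetical example function: add the two 2-bit halves of a 4-bit address.
def add_halves(addr):
    return (addr >> 2) + (addr & 0b11)


# "Formatting" step: the high-level code is executed once, at design time.
lut = build_lut(add_halves, address_bits=4, data_bits=3)

# After download, evaluating the function is a plain memory read;
# add_halves itself is no longer involved.
result = lut[0b1101]  # halves 3 and 1
```

In this sketch the efficiency of add_halves affects only the time taken to generate the table, not the time taken to read a result from it, which mirrors the independence from the high-level program code noted above.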
The invention also provides tools for designing, simulating and debugging the hardware device. These tools can also assist in converting all or part of the device to an ASIC after establishing that the finished device operates as required, although the value of such conversion diminishes as the life expectancy of the product falls.
In order to understand the invention and to see how it may be carried out in practice, a preferred embodiment will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:
FIG. 1 is a flow diagram showing a conventional process for hardware implementation of IC chips on electronic cards;
FIG. 2 shows schematically the device architecture according to the invention;
FIGS. 3 and 4 show schematically alternative configurations of cells for use in the device shown in FIG. 2;
FIG. 5 shows schematically a detail of the cell shown in FIG. 4 including auxiliary circuitry;
FIG. 6 shows schematically the connectivity required to create a cell formed from two cells so as to have a larger address bus;
FIG. 7 shows schematically a matrix connecting D cell outputs to A cell inputs;
FIG. 8 shows schematically a saturated matrix that may be used in the circuit of FIG. 7;
FIG. 9 shows schematically a non-saturated matrix that may be used in the circuit of FIG. 7;
FIG. 10 shows schematically a counter using one cell as shown in FIG. 3;
FIG. 11 shows schematically an Up-Down counter using one cell as shown in FIG. 3;
FIG. 12 shows schematically a shift register using three interconnected cells as shown in FIG. 3;
FIG. 13 shows schematically a possible topology of a one-cell shift register used in the shift register shown in FIG. 12;
FIG. 14 shows schematically how two cells of the kind shown in FIG. 5 may be connected in tri-state;
FIG. 15 shows schematically a RAM-server according to a first embodiment using two cells of the type shown in FIG. 5;
FIG. 16 shows a possible timing diagram for the RAM-server shown in FIG. 15;
FIG. 17 shows schematically a RAM-server according to a second embodiment using two cells of the type shown in FIG. 5;
FIG. 18 shows a possible timing diagram for the RAM-server shown in FIG. 17;
FIG. 19 shows schematically a shift register operating in time-sharing application mode;
FIG. 20 is a timing diagram showing the timing operation for the shift register shown in FIG. 19;
FIG. 21 shows schematically a RAM-Server combination operating in a time-sharing application environment;
FIG. 22 shows schematically the connectivity required to create a cell formed from two cells so as to have a larger data bus;
FIG. 23 shows schematically the connectivity required during loading of the cells with data;
FIG. 24 shows schematically a device configured to perform an 8-bit command;
FIG. 25 shows a standalone RAM cell without a latch;
FIG. 26 shows schematically a non-optimized adder using the cell shown in FIG. 25;
FIG. 27 shows schematically an improved adder using the cell shown in FIG. 25;
FIG. 28 shows schematically a latch that may be used independently of the RAM shown in FIG. 25;
FIG. 29 shows schematically a device where a clock enable signal is used to adjust the effective clock rate;
FIG. 30 shows schematically a multiple MCM architecture allowing fast switching between different states of the programmable matrix;
FIG. 31 is a flow diagram showing the principal operating steps used by a first method for deriving construction data for implementing the device according to the invention;
FIG. 32 is a flow diagram showing the principal operating steps used by a second method for using a library to extract and store construction data;
FIG. 33 is a flow diagram showing the principal operating steps used by a third method for deriving construction data for implementing the device according to the invention;
FIG. 34 is a flow diagram showing the principal operating steps used by a method for implementing the device according to the invention;
FIG. 35 is a flow diagram showing the principal operating steps used by a method for simulating an application using the device according to the invention;
FIG. 36 is a flow diagram showing the principal operating steps used by a method for emulating an application using the device according to the invention;
FIG. 37 is a flow diagram showing the principal operating steps used by a method for using the device according to the invention to facilitate ASIC design;
FIG. 38 is a flow diagram showing the principal operating steps used by a method for avoiding use of faulty cells in the device during implementation of an application using the device;
FIG. 39 is a flow diagram showing the principal operating steps used by a method for fault correction of faulty cells in the device during real-time operation of an application using the device; and
FIGS. 40 and 41 are flow diagrams showing processes according to the invention for hardware implementation of IC chips on electronic cards.
FIG. 2 shows schematically the basic architecture of a device 100 according to the invention. The device architecture is a collection of cells 101 interconnected via at least one programmable matrix 102. A cell 103 may be built out of smaller cells 101. Likewise, each of the cells 101 or 103 together with the associated matrix 102 may form part of a block such as 104 and 105. Any block has the same architecture as the whole device 100. Any block can be configured as a single cell. Although any connection can be made between the output of one block and the input of the same or another block, a particular interconnection between the internal cells of two blocks may not always be possible. This allows a block to be associated with a “level” being the number of blocks containing the block. Thus, for example, a block of level 0 is the device itself; blocks of level 1 form the device; and blocks of level 2 form blocks of level 1. In saying this, it should be noted that FIG. 2 is schematic and whether the programmable matrix 102 is shown within or outside the boundary of the block is immaterial, since in either case all cells within a block must be connected to at least one programmable matrix. Those connections of a cell internal to a block that enable connection to outside the block are defined as the port of the block.
It is further to be noted that a block could form a cell or a few cells. Likewise, the device 100 is itself a block containing multiple cells interconnected by a programmable matrix and any block thus has an architecture similar to that of the device 100 and may indeed be regarded as a device. The device 100 thus contains multiple like devices and may be regarded as a cell formed of multiple like cells.
The name “block” allows distinction to be made between the complete device 100 and a component thereof having similar architecture, although this distinction is made only in the description for the sake of clarity. So far as the claims are concerned, no distinction is made between the complete device and any component thereof having similar architecture. Indeed, an essential feature of the invention resides in the fact that the architecture of a component of the device may be similar to the architecture of the device as a whole. By the same token, since a block is itself a device it can be realized in different ways and thus a device can contain two or more blocks having different structures.
It should be noted that the matrix 102 does not have to be a single entity but can be split into sections. Likewise, it will be seen that the block 105 contains multiple groups of cells of which two are identified as 106 and 107, each containing a possibly different number of cells 101 and both being served by a single matrix 108. The block 105 together with its constituent cells, and any other constituent elements, is also served by a second matrix 109 shown external to the block 105. Each of the matrices 108 and 109 is typically of identical structure to the matrix 102 and whether it is shown inside the block or external thereto is merely a matter of convenience. Thus, the manner in which the matrix is depicted in the figures is schematic for illustration only. It should also be noted that part of the connections available in one matrix might be duplicated in another matrix. Any such duplication will be removed in the implementation. In practice, as will be explained below with reference to FIG. 8, the matrix is simply a collection of switches (such as CMOS switches) each controlled by a Flip-Flop, such that by writing logic “1” or “0” to the corresponding Flip-Flop, the switch may be closed or opened thereby allowing the cells to be connected according to any required topology. The Flip-Flops relating to all the switches of the matrix are arranged in groups and are associated with auxiliary circuitry enabling any one of the Flip-Flops to be selected for the purpose of writing data thereto. Thus, the Flip-Flops and associated auxiliary circuitry may be realized by a RAM and will be referred to as “Matrix Control Memory”. Optionally, data in the Matrix Control Memory can also be read.
The device architecture shown in FIG. 2 may bring to mind “fractal” structures used in mathematics to describe any of a class of complex geometric shapes that exhibit the property of self-similarity.
The input pins and the output pins of the device are connected to the programmable matrix: the input pins to the matrix input; the output pins from the matrix output. In order to access a lower level directly from the input or from the output, the port of the block should be used.
FIG. 3 shows schematically a cell 110 according to a first embodiment. The cell 110 comprises a random access memory (RAM) 111 having (n+m) address lines 112, which are shown as two separate buses although they function as a single address bus whose minimum number of address bits (m+n) can be one. A data bus 113 allows data stored in addressable memory locations of the RAM 111 to be read out and accommodates a number of data bits d whose minimum number is also one. Data appearing on the data bus is latched by a latch 114 whose output 115 constitutes an output of the cell 110. The RAM 111 can be loaded with the desired data.
FIG. 4 shows schematically a cell 120 according to a second embodiment. The cell 120 comprises a RAM 121 having (n+m) address lines 122, which are again shown as two separate buses although they function as a single address bus whose minimum number of address bits (m+n) can be one. A data bus 123 allows data stored in addressable memory locations of the RAM 121 to be read out and accommodates a number of data bits d whose minimum number is also one. In this case, the data appearing on the data bus 123 constitutes an output of the cell 120. The address appearing on the address buses 122 is latched by a respective latch 124. The RAM 121 can be loaded with the desired data in a manner described below. Although two latches 124 are shown at the input of the RAM 121, they are referred to as “the latch” of the cell 120, no distinction being made as to the actual number of latches used to latch the address.
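By way of non-limiting illustration only, the behavior of a cell of the kind shown in FIG. 3 may be sketched in Python as a memory whose data output is captured by a latch on each clock. The class name LatchedOutputCell, its method names and the example contents are assumptions made for the sketch; timing, the write path and the auxiliary signals are deliberately not modeled.

```python
class LatchedOutputCell:
    """Behavioral sketch: a RAM whose data output is captured by a latch."""

    def __init__(self, address_bits, data_bits):
        self.ram = [0] * (1 << address_bits)   # addressable memory locations
        self.data_mask = (1 << data_bits) - 1  # d-bit data bus
        self.output = 0                        # value currently held by the latch

    def load(self, contents):
        """Load the RAM with the desired data, one word per address."""
        if len(contents) != len(self.ram):
            raise ValueError("contents must fill every address")
        self.ram = [word & self.data_mask for word in contents]

    def clock(self, address):
        """On a clock edge the latch captures the word read at the address."""
        self.output = self.ram[address]
        return self.output


cell = LatchedOutputCell(address_bits=2, data_bits=4)
cell.load([5, 9, 12, 3])
held_before = cell.output      # latch still holds its initial value
held_after = cell.clock(2)     # latch now holds the word stored at address 2
```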
The RAMs 111 and 121 as well as the latches 114 and 124 shown in FIG. 3 and FIG. 4 are part of the device 100 shown in FIG. 2. In a particular embodiment reduced to practice, the RAM was modeled on IDT6116 of Integrated Device Technology, Santa Clara, Calif., USA 95054, and the latch was modeled on SN74HC374 of Texas Instruments Incorporated, Dallas, Tex. USA, 75380-9066. In both configurations of the cell, the values of n, m and d may be assigned as required according to an application to be implemented using the device.
FIG. 5 shows schematically the logical cell of FIG. 4 in more detail. The figure shows a cell 130 comprising a RAM 131 having an (m+n)-bit address bus 132 and a d-bit data bus 133. Latches 134a and 134b are used to latch the address on the address bus 132. Again, it is to be noted that the address bus and the latch 134 are shown split by way of illustration only. Functionally, there is only a single address bus and the latches may be considered as a single latch. Logic signals OE and {overscore (OE)} are latched by a latch 136, which can also be considered as part of, or an extension to, the latch 134, and are fed via auxiliary circuitry 137 to the output enable (OE) of the RAM 131, where they may cause the RAM to be in a tri-state condition. Logic signals CS and {overscore (CS)} are also latched by the latch 136 and fed via the auxiliary circuitry 137 so as to allow the RAM 131 to be selected or deselected. The number of pairs of the CS and {overscore (CS)} signals is sufficient to enable mapping the whole block (or device) into a single cell. A clock is routed to the latches 134 and 136 and can be enabled or disabled by a clock enable signal (CE) that may also be routed via a matrix. The signals OE, {overscore (OE)}, CS1, {overscore (CS1)}, CS2, {overscore (CS2)}, CS3, {overscore (CS3)}, CS4, {overscore (CS4)} and so on, and CE are such that, when not connected, they are set to default values, such that an active low signal is set to
There will now be described a possible timing implementation for the device based on the cell of FIG. 5.
For devices suited for applications where no write operation is needed at all, the clock cycle can be shorter to create faster (real-time) applications and the master clock low-to-high transition can occur when the cell output is ready. This will become clearer from the description of the RAM-Server combinations shown in FIG. 15 and FIG. 17 and the timing diagrams shown in FIG. 16 and FIG. 18.
FIG. 6 shows an example for connecting two cells each having an n-bit address so as to form a composite cell 140 having double the size, i.e. twice the number of addressable locations addressed by an (n+1)-bit address bus. To the extent that the components of each constituent cell are identical to those described above with reference to FIG. 5, similar reference numerals are used in FIG. 6. Thus, the composite cell 140 contains two RAMs identified as 131a and 131b both having an n-bit address bus 132a and 132b, respectively. Thus, the n least significant bits of the combined address are fed via respective latches 134a and 134b to the corresponding RAMs 131a and 131b. By way of illustration, the (n+1)-bit address fed to the combined cell is derived from a RAM 142 having an m-bit address bus and an (n+1)-bit data bus, an m-bit address being fed thereto via an m-bit latch 143. The data buses 133 of the two RAMs 131a and 131b are connected via the matrix, each data output being tri-state so that only the data on a selected one of the RAMs is output. The MSB of the (n+1)-bit address bus is used to control which of the two RAMs 131a and 131b feeds data to the data bus 133. To this end, it is connected to the {overscore (CS1)} input of the latch 136a controlling the RAM 131a and to the CS1 input of the latch 136b controlling the RAM 131b.
Operation of the circuit is as follows. If the MSB of the data of RAM 142 that is routed to the MSB of the combined (n+1)-bit address is 0, then the RAM 131a is enabled and the RAM 131b is disabled. Conversely, if the MSB of the data of RAM 142 that is routed to the MSB of the combined (n+1)-bit address is 1, then the RAM 131a is disabled and the RAM 131b is enabled. Referring back to the auxiliary circuitry 137 shown in FIG. 5, the CS1 input is fed to a first logical AND-gate 145 whose output is ACTIVE only if all its inputs are enabled. As noted above, any inputs not connected by the matrix are automatically enabled so that the output of the logical AND-gate 145 is ACTIVE if CS1 is enabled and is INACTIVE if CS1 is disabled. Likewise, the {overscore (CS1)} input is fed to a second logical active Low AND-gate 146 whose output is ACTIVE only if all its inputs are enabled (active LOW). Again, since any inputs not connected by the matrix are automatically enabled, the output of the logical AND-gate 146 is ACTIVE if {overscore (CS1)} is enabled and is INACTIVE if {overscore (CS1)} is disabled. Thus, if the MSB is LOW, then {overscore (CS1)} of the RAM 131a is enabled and the RAM 131a is operative and if the MSB is HIGH, then CS1 of the RAM 131b is enabled and the RAM 131b is operative. So when the RAM 131a is ACTIVE, the RAM 131b is INACTIVE and conversely when the RAM 131a is INACTIVE, the RAM 131b is ACTIVE.
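By way of non-limiting illustration only, the selection between the two RAMs by the MSB of the combined address may be sketched in Python as follows. The function names ram_output and composite_read, the use of None to stand for a tri-stated output, and the example contents are assumptions made for the sketch; the latches and the gates 145 and 146 are reduced to simple comparisons.

```python
def ram_output(contents, address, selected):
    """Return the addressed word, or None to model a tri-stated output."""
    return contents[address] if selected else None


def composite_read(ram_a, ram_b, address, n):
    """Read an (n+1)-bit address from two RAMs that each have n address bits."""
    msb = (address >> n) & 1
    low_bits = address & ((1 << n) - 1)  # the n least significant bits go to both RAMs
    # The MSB drives the active-low select of RAM a and the active-high select of RAM b.
    out_a = ram_output(ram_a, low_bits, selected=(msb == 0))
    out_b = ram_output(ram_b, low_bits, selected=(msb == 1))
    driven = [word for word in (out_a, out_b) if word is not None]
    if len(driven) != 1:
        raise RuntimeError("exactly one RAM should drive the shared data bus")
    return driven[0]


ram_a = [10, 11, 12, 13]
ram_b = [20, 21, 22, 23]
from_a = composite_read(ram_a, ram_b, 0b001, n=2)  # MSB 0 selects RAM a
from_b = composite_read(ram_a, ram_b, 0b110, n=2)  # MSB 1 selects RAM b
```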
In exactly the same way, two RAMs 140 can be combined, in which case the CS2 and {overscore (CS2)} signals are also used for accommodating the two most significant bits of the address. Such extension can be repeated at will to produce a RAM having as many addressable memory locations as required by a specific application. It should also be noted that the two RAMs 131a and 131b are shown in FIG. 6 as having address buses of equal size. However, this need not be the case and an application may, and not uncommonly will, dictate a topology where RAMs having different size address buses are combined.
The “Clock Enable” signal may be considered as an input to the cell, although it is used mainly during design for debugging purposes.
FIG. 7 shows schematically a matrix 150 connecting D cell outputs 151 to A cell inputs 152.
FIG. 8 shows schematically a saturated matrix 155 having four input lines and three output lines that may be used in the circuit of FIG. 7. The matrix 155 has to be able to connect each of the cell output lines 151 to each of the cell input lines 152 in the block. Each cell output 151 is connected to a respective input of the matrix 155 designated by alphabetic characters a, b, c, d. The matrix 155 serves to allow connection of each cell output 151 to one or more cell inputs 152 designated by numeric characters 1, 2, 3. There is in practice no reason to have the ability to connect each cell output 151 to all the possible cell inputs 152, thus permitting use of a matrix that is not saturated as shown in FIG. 9. However, it makes the automation simpler to use the saturated matrix 155 as in FIG. 8, and operation is faster in real-time. Although the cell could be made out of smaller cells, and in some applications, the bigger cell would not be constructed, the output and the input of the bigger cell are pre-determined.
Operation of the saturated matrix 155 is as follows. Each of the inputs a, b, c, d is connected to each of the outputs 1, 2, 3 via corresponding switches. Thus, the inputs a, b, c, d are connected to the output 1 via switches a1, b1, c1, d1. Likewise, the inputs a, b, c, d are connected to the output 2 via switches a2, b2, c2, d2; and they are connected to the output 3 via switches a3, b3, c3, d3. In order to connect input a to output 1, the switch a1 is closed. In order to connect c to 3, the switch c3 is closed. In order to connect b to both 1 and 3, the switches b1 and b3 are both closed. In order to connect both b and d to both 2 and 3 the switches b2, b3, d2, and d3 are closed, and so on.
Each switch has a control line (not shown) that sets the switch to “closed” or “open” and is connected to a 1-bit memory that stores the state of the switch. As in practice there are a great many switches, all the bits that store each switch state are arranged in a memory structure. In other words, there is a memory unit that stores the switches' states. Each bit in the memory is connected to one control line, there being the same number of bits in the memory as the number of switches in the matrix. By such means, the memory functions as a Matrix Control Memory for controlling whether the state of each switch is closed or open. In the above example, the matrix 155 connects a 4-bit data bus to a 3-bit address bus. However, it will be appreciated that the matrix 155 can equally well be connected with the lines a, b, c, d forming the output and the lines 1, 2, 3 forming the input so as to connect a 3-bit data bus to a 4-bit address bus.
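By way of non-limiting illustration only, the relationship between the switches of the four-by-three saturated matrix and the bits of the Matrix Control Memory may be sketched in Python as follows. The ordering of the bits (a1, a2, a3, b1, and so on) and the function names switch_index, program and connected_inputs are assumptions made for the sketch; the specification does not prescribe a particular bit ordering.

```python
INPUTS = "abcd"       # matrix input lines, fed by the cell outputs
OUTPUTS = (1, 2, 3)   # matrix output lines, feeding the cell inputs


def switch_index(inp, out):
    """Position in the control memory of the bit for switch <inp><out> (assumed ordering)."""
    return INPUTS.index(inp) * len(OUTPUTS) + OUTPUTS.index(out)


def program(connections):
    """Build the control memory: one bit per switch, 1 = closed, 0 = open."""
    memory = [0] * (len(INPUTS) * len(OUTPUTS))
    for inp, out in connections:
        memory[switch_index(inp, out)] = 1
    return memory


def connected_inputs(memory, out):
    """List the inputs whose switch to the given output is closed."""
    return [inp for inp in INPUTS if memory[switch_index(inp, out)]]


# Connect b to both 1 and 3 by closing the switches b1 and b3.
memory = program([("b", 1), ("b", 3)])
```

In this sketch the control memory holds twelve bits, one for each of the twelve switches, and reprogramming the topology amounts to rewriting those bits.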
FIG. 9 shows schematically an example of a non-saturated matrix 156 comprising a plurality of interconnected saturated matrices 155 as shown in FIG. 8, each having its own memory to control each switch thereof. All the memories are organized as one big memory that functions as the Matrix Control Memory. Programming the matrix is achieved by loading the matrix control memory with the appropriate data, as described below, and sets the desired topology of the device.
Such a matrix 156 constructed so as to have a limited but sufficient number of connections is preferred over an equivalent saturated matrix having the same number of switching connections as it saves die space, though the code for choosing the links (routings) is slightly more complicated. Thus, assuming that each matrix 155 is saturated and denoting:
D=the number of input lines to the matrix,
A=the number of the output lines of the matrix,
X=the number of the input matrices 155,
Y=the number of the output matrices 155, and
Z=the number of the middle column matrices 155,
X, Y and Z are calculated as follows:
where “ceiling” denotes that a non-integer number is rounded up to the next highest integer.
Each of the input matrices is connected to each of the middle column matrices. Each of the output matrices is connected to each of the middle column matrices. To prevent cross-connect limitations, it is possible to increase Z. Even so, the number of switches and the associated memory will be much smaller than the number of switches and the associated memory in a saturated matrix with the same number of input pins and output pins. This is particularly important when a single matrix is used to connect all the cells in all levels. It will be noted that the likelihood that the end-user will connect two cells in the same block is greater than the likelihood that he will connect two cells in different blocks, owing to the tendency to attempt to combine cells to form larger cells. Therefore, when such a single matrix is used, it is advisable, during design of the device, to take into account to which input and output matrices the pins of the cells are connected, since from these cells the end-user may choose to form larger cells. It should also be noted that if, instead, separate matrices are provided within each block, the cumulative delay for some connections is likely to be greater than if a single matrix were used. Account must also be taken of the need to provide connections in the matrix to the input and output of the device in addition to the interconnections between the outputs of the cells to the inputs of the cells.
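By way of non-limiting illustration only, the saving in switches may be estimated with the following Python sketch, which takes X, Y and Z as given values rather than calculating them. The sketch assumes that the D input lines are divided evenly (rounded up) among the X input matrices, that the A output lines are divided evenly among the Y output matrices, and that exactly one line links each pair of matrices in adjacent columns; these assumptions, the function names and the example figures are not taken from the specification.

```python
import math


def saturated_switches(d, a):
    """A saturated matrix needs one switch per input/output pair."""
    return d * a


def three_column_switches(d, a, x, y, z):
    """Switch count for X input, Z middle and Y output saturated matrices (assumed layout)."""
    inputs_per_matrix = math.ceil(d / x)
    outputs_per_matrix = math.ceil(a / y)
    input_column = x * inputs_per_matrix * z     # each input matrix has Z outgoing links
    middle_column = z * x * y                    # each middle matrix joins X links to Y links
    output_column = y * z * outputs_per_matrix   # each output matrix has Z incoming links
    return input_column + middle_column + output_column


# Hypothetical figures: 64 input lines, 64 output lines, 8 matrices per column.
full = saturated_switches(64, 64)
reduced = three_column_switches(64, 64, x=8, y=8, z=8)
```

Under these assumed figures the three-column arrangement needs 1536 switches against 4096 for the saturated matrix, and increasing Z raises the count in proportion.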
In order to understand how the device may be used to implement different hardware applications merely by selecting a required topology and downloading data into the storage elements of each of the cells, various examples will now be described. For ease of explanation, some examples are based on the cell 110 shown in FIG. 3, although the device works in the same manner using the cell 120 of FIG. 4. In the following examples, components that are common to the cell shown in FIG. 3 and the matrix shown in FIG. 8 will be referred to by identical reference numerals.
FIG. 10 shows schematically a counter 160 using one cell 110 comprising a RAM 111 having an n-bit data output bus 113 fed to an n-input latch 114, whose output 115, constituting the cell's output, is connected to the input of the matrix 155. Each of the cell's n output lines is connected via the matrix 155 to a respective address line of the RAM's address bus 112. The RAM is loaded with the following data:
| Address | Data | |
| 0 |