# COMPARISON OF FPGA ARCHITECTURES AND DEVICE INDEPENDENT SOFTWARE DESIGN TOOLS

## R. Sernec

KEY WORDS: FPGA, PAL, EPROM, EEPROM, SRAM, macrocells, antifuse, software design tools, programmable circuits, comparison of parameters

ABSTRACT: The article is divided into two sections. The first introduces FPGAs and presents their inherent benefits over gate arrays. Four architectures are described: XC4000, ACT2, MAX7000 and MACH1/2, and compared according to various parameters. The second section deals with device independent software design tools. It describes the important issues that have to be considered when selecting among the myriad of SW design tools. The Abel-FPGA, CUPL, PGADesigner and FPGA Foundry SW tools are briefly described.

## 1.0 INTRODUCTION

For years the system designer's only choices besides standard logic ICs were custom designed ICs, gate arrays or ASICs, all coupled with high non-recurring engineering (NRE) charges and as such applicable only to high volumes. The first steps toward cheaper, lower volume designs were made with the introduction of PALs and their derivatives in the mid seventies. But those suffered from a serious drawback: lack of integration.
How to combine programmability, reusability, high integration and no NRE charges with low prices at low volumes was the main problem, first solved by Xilinx in 1985 with their unique field programmable gate array (FPGA) architecture. The situation on the semiconductor market is shown in Fig. 1 and explains why so many companies are involved in this business today<sup>/11/</sup>.

Fig. 1: Logic market growth rate (Source: Dataquest)

Fig. 2: FPGA market niche

## 1.1 FPGA WHY AND WHEN?<sup>/2/</sup>

Comparing FPGAs to gate arrays reveals the following benefits:

- user programmable
- no NRE
- fast time to market
- standard product
- 100% factory tested
- applicable to lower volumes
- can be used for system prototyping

One can argue, however, that they are generally slower and less integrated than today's gate arrays.

According to some studies 80% of all gate array designs fall into the 2000-10000 gates/chip category, and that is where FPGAs play their major role. Cost analysis shows that production costs for gate arrays are 0.15-0.2 cents/gate vs. 0.5-0.6 cents/gate for FPGAs (10000-30000 unit volumes). Because of the high NRE and simulation costs of gate arrays, and the profit lost through a possibly late time to market, a break-even point exists below which FPGAs have a clear advantage. It is estimated to lie between 15000 and 70000 units/design, depending on design density, as shown in Fig. 2.

## 1.2 VENDORS AND THEIR ARCHITECTURES

Many vendors manufacture different architectures and implementations of FPGAs today, as can be seen from Fig. 3. All the different PAL, PLD and FPGA architectures are presented there as one large family.

## 2.0 REVIEW OF THE FOUR ARCHITECTURES

The four architectures reviewed in this article are implemented with different techniques. XC4000 and ACT2 are designed as configurable logic blocks connected by channels of programmable interconnects, whereas MAX7000 and MACH can be thought of as PLA blocks interconnected by a switch matrix.
All are manufactured in 0.8 $\mu$m CMOS technology (except ACT2 in 1.2 $\mu$m). They offer gate densities from 900 /3/ to 40000 /4/ available gates and up to 20000 usable gates /13/. Each architecture employs a different mechanism for implementing the desired function: SRAM, antifuse, EPROM or EEPROM.

Xilinx's older families are the XC2000 and XC3000 series, for designs of up to 9000 gates/chip; the XC3000 is nevertheless slightly faster than the XC4000.

Fig. 3: FPGA vendors according to implementation

Altera's previous family, MAX5000, is slower and offers just a quarter of the density attainable by the new family. Actel also produces the older, slower and less integrated ACT1 series.

The design of a circuit and its porting to the selected FPGA are done with special vendor supplied or third party tools. They free the designer from worrying about the internal structure of FPGAs, and everything from schematic capture to final device specification can be done automatically.

## 2.1 Xilinx XC4000 family /5/

The family consists of 10 devices with densities of 2000 - 20000 gates/chip and 64 - 240 I/O pins. The highest density part available today is the XC4008 with 8000 gates and 144 user I/O pins. All devices are reprogrammable on every power-up, so the functional description of the device must be stored in a nonvolatile memory.

The chip comprises configurable logic blocks (CLB), I/O blocks (IOB) and interconnections. All user defined functions are performed by CLBs, each of which can implement logic functions of up to nine variables in various forms. A CLB contains two D-type flip-flops, positive or negative edge triggered, and has two registered and two combinatorial outputs, as depicted in Fig. 4. Logic functions are implemented using look-up tables stored in each CLB; for this purpose each CLB contains 32 bits of SRAM. This gives the CLB the unique ability to use that SRAM as RAM storage instead of implementing logic functions.
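The look-up-table principle can be illustrated with a short sketch. It is a simplified model, not Xilinx's actual circuit: it assumes a function generator is a 16-entry table addressed by four inputs (two such tables would account for the 32 bits quoted above), and the names `make_lut` and `lut_read` are hypothetical, introduced only for this example.

```python
def make_lut(func, n_inputs=4):
    """Precompute the truth table of `func` as a list of 2**n_inputs bits."""
    table = []
    for address in range(2 ** n_inputs):
        # Bit i of the address is the value of input i.
        inputs = [(address >> i) & 1 for i in range(n_inputs)]
        table.append(func(*inputs) & 1)
    return table


def lut_read(table, *inputs):
    """Evaluate the stored function: the inputs simply form a memory address."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]


# Used as logic: f = (a AND b) XOR (c OR d), stored in 16 memory bits.
f_table = make_lut(lambda a, b, c, d: (a & b) ^ (c | d))

# Used as RAM: the same kind of 16-bit table holds arbitrary user data.
ram = [0] * 16
ram[5] = 1  # write a 1 to address 5 (inputs a=1, b=0, c=1, d=0)
```

In this model there is no difference between "computing" a function and reading a memory: `lut_read(f_table, 1, 1, 0, 0)` returns 1 because address 3 of the table was filled with the value of f, which is why the same cells can serve either as logic or as user RAM.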
The maximum number of RAM bits available ranges from 2048 to 28800, depending on the device, and they can be used as FIFOs, register files, etc.

Each CLB has two important features. First, when a combinatorial output drives a flip-flop, the propagation delay through the function generator is completely overlapped with the set-up time of the flip-flop, which results in higher speeds. Secondly, there is dedicated carry circuitry for efficiently speeding up adder designs of up to 32 bits in length; beyond that other methods must be used (carry propagate/generate). On each chip edge there are four decoders, each up to 40 inputs wide.

Fig. 4: Configurable logic block

Fig. 5: Input/Output block

Each IOB has two registers, each usable as a rising or falling edge triggered D-type flip-flop, with separate connections to the input and output register, or as a level sensitive transparent latch. This allows registered I/Os and bidirectional or three-state outputs. The IOB is shown in Fig. 5. Input data to the register can be delayed by several ns to compensate for the clock delay, eliminating the hold time. Each output can sink up to 12 mA, and any two can be connected together for a 24 mA sink. Special logic in each IOB enables boundary scan testing conforming to IEEE Standard 1149.1 (JTAG).

The chip function is programmed by loading the RAM bits in each CLB through either a parallel 8-bit or a serial interface. The values to be loaded can be stored in suitable EPROMs. The chip itself takes care of all the sequencing operations needed for loading. When more than one device is needed, all can be connected in a serial fashion and configured from a single EPROM.

Fig. 6: MAX EPM7256 architecture

## 2.2 Actel ACT2 family /6/

Gate densities offered are in the range of 2500 - 8000 gates/chip, with 82 - 140 I/O, in three devices. They are based on antifuse nonvolatile one-time programmable technology and can be programmed with common PAL programmers.
I/Os are TTL or CMOS compatible. Two diagnostic pins are available which, with the proper software, enable observation of any internal signal. The devices also contain a security fuse to prevent unauthorised copying of the design.

Logic modules (LM) perform the programmed function, which can have a maximum of eight input variables and one output. An LM can also be configured as a transparent latch or as a D-type, JK-type or T-type flip-flop, each positive or negative edge triggered. Each LM can be configured as either combinatorial or registered, but not both simultaneously.

Every I/O pin has an I/O module (IOM) which can be configured as I/O, bidirectional, three-state, or with a transparent latch for both input and output. An output can sink 8 mA at TTL levels or 6 mA with HCT. The slew rate is user programmable to limit switching noise.

Fig. 7: Macrocell

## 2.3 Altera MAX7000 family /7/

The family was introduced less than a year ago, with nine devices planned and integration of 4000 - 40000 gates/chip. The largest device available today is the EPM7256 with 5000 gates and up to 164 input or 160 output pins. Its architecture resembles that of PALs and is reprogrammable, because the fuses are implemented as EPROM cells; it is programmed in standard PAL programmers. Erasing is done under UV light. A programmable security bit controls access to the programmed data and prevents unauthorised copying of designs.

The MAX architecture is depicted in Fig. 6. It consists of logic array blocks (LAB) connected by a programmable interconnect array (PIA). Individual LABs can operate either at full speed or at half the power consumption, with an 8 ns speed penalty added to the calculated timings.

Each LAB has two tasks: to implement logic functions and to drive I/O pins. Each LAB contains up to 16 macrocells (MC). Logic functions are implemented within the MCs, each of which supports functions of up to 5 product terms.
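To make the product-term limit concrete, the following sketch evaluates a sum-of-products function and checks whether it fits the 5 product terms of a single macrocell. It is only an illustration under stated assumptions: the representation of a product term as a dictionary of literals and the names `eval_sop` and `fits_in_macrocell` are hypothetical and do not describe Altera's tools.

```python
MAX_NATIVE_PRODUCT_TERMS = 5  # per-macrocell limit quoted in the text


def eval_sop(product_terms, inputs):
    """Evaluate a sum of products.

    Each product term maps an input name to the value it requires:
    1 for the true literal, 0 for the complemented literal.
    The function is the OR of all product terms (each an AND of literals).
    """
    return int(any(
        all(inputs[name] == value for name, value in term.items())
        for term in product_terms
    ))


def fits_in_macrocell(product_terms):
    """True if the function needs no more than the native product terms."""
    return len(product_terms) <= MAX_NATIVE_PRODUCT_TERMS


# f = a.b + /a.c + b./c  (three product terms)
f = [{"a": 1, "b": 1}, {"a": 0, "c": 1}, {"b": 1, "c": 0}]
```

Here `fits_in_macrocell(f)` is true, while a function written with six product terms would not fit a single macrocell without borrowing terms from elsewhere.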
If larger functions are needed, each MC can obtain additional product terms in two ways: either through shared logic expanders, which constitute a pool of 16 uncommitted single product terms in the LAB, or through parallel logic expanders, which can allocate up to 20 unused product terms in the LAB to a single MC. Using these options incurs a delay of 8 ns and 3 ns respectively. The block diagram of the MC is shown in Fig. 7.

Registers can be of any type, even RS-type flip-flops, positive or negative edge triggered. Each can be clocked either from the product term array or from a dedicated external CLK pin which is connected to all LABs. In the first case any input may be used as a clock for a particular LAB.

The I/O block is shown in Fig. 8. It can be used as I/O or bidirectionally, and its output can be three-stated. The buffer is controlled by two dedicated global OE signals. When the control is tied to Gnd, the buffer is three-stated, the I/O pin is used as an input, and the MC logic can provide an input register. Each output can sink up to 8 mA.

The PIA features a constant propagation delay of 3 ns, independent of the connection length between LABs or I/Os. This means predictable performance even before simulation, unlike the first two architectures described. It also ensures 100% routability between all LABs and I/O pins. Although all signals from each MC are fed to the PIA, only the required ones are routed. Four global input pins (clock, clear, two output enables) are available and are routed to all LABs.

Fig. 8: Input/Output block

Fig. 9: MACH230 architecture

Fig. 10: Output macrocell

Fig. 11: Buried macrocell

## 2.4 AMD MACH family /3/

The family has two subfamilies, MACH1 and MACH2, each with three members, with integration ranging from 900 - 1800 gates/chip for the first and 1800 - 3600 gates/chip for the second. Essentially they are the same, except that MACH2 contains buried macrocells, which enable registered inputs and allow an output latch.
The architecture also resembles that of PALs, with fuses implemented as EEPROM cells. As a consequence it can be programmed and erased on standard PAL programmers. Each device consists of PAL blocks, which contain buried and output macrocells, interconnected by a switch matrix as shown in Fig. 9. A security bit is available in all devices.

The desired function is implemented within a product-term array connected to logic allocators, which allocate the appropriate product-terms to output macrocells (OM). Functions of at most 12 (MACH1) or 16 (MACH2) product-terms can be realised, either registered or combinatorial. Fig. 10 shows the OM, within which the register can be set as a D-type or T-type positive edge triggered flip-flop or as a transparent latch. Two (MACH1) or four (MACH2) independent clock inputs are available for multiple clock circuits. The buried macrocell can be seen in Fig. 11.

The switch matrix interconnects all PAL blocks with a constant propagation delay of 15 ns regardless of signal routing length. Again this means predictable performance.

## 3.0 ARCHITECTURAL COMPARISON

All features mentioned above are summarised in Table 1 to enable a quick architectural comparison; the figures are given for the highest density part available now. Timings in the Electrical specs section are best-case achievable. Exact performance figures for various sequential and logic circuits are omitted here, because they can be estimated only through simulation, specifically for the XC4000 and ACT2 families.

Table 1: Architectural comparison

| GENERAL SPECS | | | | |
|-------------------------|----------|-------------------|--------------|------------------|
| Manufacturer | Xilinx | Actel | Altera | AMD |
| Family name | XC4000 | ACT2 | MAX7000 | MACH1/2 |
| Technology | CMOS | CMOS | CMOS | CMOS |
| Implementation | SRAM | Fuse | EPROM | EEPROM |
| Programming | power-up | permanent | reprog. | reprog. |
| Observability | JTAG | Yes <sup>1</sup> | N/A | N/A |
| Security bit | No | Yes | Yes | Yes |
| Devices in family | 10 | 3 | 9 | 6 |
| Device name | XC4008 | A1280 | EPM7256 | MACH230 |
| Max. gates | 8K | 8K | 5K | 3.6K |
| Max. flip-flops | 936 | 998 | 256 | 128 |
| User I/O pins | 144 | 140 | 164/160 | 70/64 |
| Function size I/PT | 9 I | 8 I | 5 (+30) PT | 16 PT |
| Function outputs | 4 | 2 | 1 | 1 |
| Register types | D | D, JK, T, L | D, JK, T, RS | D, T, L |
| Triggering | P/N | P/N | P/N | P |
| Clock inputs | numerous | numerous | numerous | 4 |
| Package pins/type | 176/PGA | 176/PGA | 192/PGA | 84/PLCC |
| ELECTRICAL SPECS | | | | |
| Logic block | | | | |
| t<sub>pd</sub>/ns | 7 | 5.5 | 9 | 15 |
| t<sub>clkmin</sub>/ns | 8 | 10 | 14 | 15 |
| f<sub>max</sub>/MHz | 125 | 100 | 71.4 | 66.7 |
| t<sub>su</sub>/ns | 8 | 1 | 8 | 10 |
| t<sub>h</sub>/ns | 0 | 0 | 8 | 0 |
| t<sub>co</sub>/ns | 5 | 5.5 | 13 | 10 |
| I/O block | | | | |
| t<sub>pid</sub>/ns | 3 | 6.2 | 6 | 15 <sup>2</sup> |
| t<sub>pli</sub>/ns | 9 | 11.7 | 6 | 17 <sup>3</sup> |
| t<sub>op</sub>/ns | 7 | 7.1 | 5 | 15 <sup>4</sup> |
| t<sub>cp</sub>/ns | 8 | 10.8 | N/A | 17 <sup>5</sup> |

Legend:

- I/PT: Inputs/Product-terms
- P/N: Positive/Negative edge
- t<sub>pd</sub>: combinatorial propagation delay
- t<sub>clkmin</sub>: minimum clock period
- f<sub>max</sub>: maximum toggle frequency
- t<sub>su</sub>: set-up time
- t<sub>h</sub>: hold time
- t<sub>co</sub>: clock to output
- t<sub>pid</sub>: input pad to logic block delay
- t<sub>pli</sub>: input pad via transparent latch to logic block delay
- t<sub>op</sub>: logic block output to pad delay
- t<sub>cp</sub>: latch enable to output delay
- <sup>1</sup>: two diagnostic pins
- <sup>2</sup>: input via logic to combinatorial output
- <sup>3</sup>: input latch via logic to combinatorial output
- <sup>4</sup>: input via logic to combinatorial output
- <sup>5</sup>: input via logic to output latch
- N/A: not applicable

The user's final choice depends on a number of parameters, on the specific application and on its constraints.

## 3.1 DEVICE PORTING

Porting an existing design to a cheaper gate array may be an important issue where higher volumes are needed. FPGAs can also be used in the early prototype and initial production stages, which lowers the design risk of repeated gate array runs and so achieves faster time to market and lower design cost; the design is then transferred to a gate array.

Xilinx offers such a service only for the XC3000 family. NRE charges are minimal. The company replaces the SRAM cells with hardwired logic, which results in a smaller die and a faster circuit. Simulation with timing verification must be done again; otherwise nothing changes.

Altera lets designers put multiple MAX5000 chips into one mask programmed device /9/. Designers using the MAX+PLUS II development software can exploit its multi-part partitioning feature to automatically partition large logic designs into multiple MAX devices (up to 40). The reverse route is also possible: devices of up to 50K gate complexity can be combined into one part and processed as a masked device. A very important feature is timing compatibility between the multiple and single device designs. Design conversion costs are US\$20000 - US\$60000 and chip costs approximately 6 cents/macrocell. The single device solution offers a large price reduction, lower power consumption and higher reliability.

AMD offers a derivative family called MASC with the same part designations as in the MACH family. The EEPROM cells are replaced with metal masks. No additional NRE costs are incurred and all timings stay the same without any redesign.

## 4.0 SOFTWARE DESIGN TOOLS

Proper device programming must be done with the right software design tools.
Each FPGA vendor supplies vendor/device specific SW tools optimised for the target architecture. These are capable of exploiting the devices' salient features. But what should be done when devices from other vendors must be used? Purchase another set of device specific SW tools and take precious time to learn them? Luckily there are companies specialising in vendor/device independent SW design tools. Choosing another vendor then means, at most, purchasing a new device specific SW fitter, which converts a design description of any type to the specific logic elements available in the target architecture, transparently to the designer. Thus the designer need not worry about the internal device architecture and can concentrate on the design itself.

Device independent design tools owe their competitive edge to several features: designs can be done from top to bottom, the final design can be committed to a device as late as possible in the design cycle, and a design can even be implemented in devices from different vendors, compared against predefined objectives, and the right device then chosen. Most third party SW vendors have technical agreements with device vendors that enable them to fully exploit the devices' features, so the end result will be as good as that achieved with proprietary SW tools. The vast majority of SW tools run on PC 386/486 and UNIX machines.

This part of the paper deals with the questions of how to choose the right device independent SW design tool and which capabilities a good SW tool should have.

## 4.1 IMPORTANT ISSUES /1/,/8/

The list of supported vendors and their devices is probably the most important issue in making the first decisions. So-called fitters make it possible to implement logic designs in a specific architecture. They can be purchased separately as the designer's needs grow and enable retargeting of an existing design to new devices as they become available.
The form of design description and input is crucial and may include schematic capture from popular design tools, waveforms, Boolean equations, state machines, truth tables or a hardware description language such as VHDL or Verilog.

Logic reduction and the use of don't care conditions help eliminate unnecessary logic elements and make designs more efficient.

Ease of use can speed up the whole design if the tool is based on menu-driven interfaces and on-line help and is supported by a high quality manual.

A graphical editor is almost indispensable for complex FPGAs and should help in preplacing and routing time-critical nets or those that the autorouter cannot handle.

Accurately predicting the device's timing and behaviour and verifying the design before committing it to the target device requires the use of a digital simulator. Simulating individual modules, integrating them into the final design and simulating it as a whole is the added benefit of a good simulator.

An automatic placer and router of logic blocks is indispensable, ideally with 100% completion and with the shortest path chosen for the best performance figures. Besides raw logic block speed, routing path delays must be taken into account for the final design speed, and this is where a good placer/router can save precious nanoseconds.

With new FPGAs ranging in complexity to several thousand available gates, the design capacity of the SW tool must not be overlooked.

A helpful feature is automatic multiple device partitioning: designs exceeding the target device's complexity are partitioned into several devices without the designer's intervention.

Automatic device selection helps the designer select the most suitable device for a particular design.

The report generator informs the designer of any problems in the logical to physical translation, of timing inconsistencies and of similar information.
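The idea behind automatic device selection can be sketched in a few lines. This is a minimal illustration, not the algorithm of any tool reviewed here: the device data are taken from Table 1, while the `Device` class, the `select_devices` function and the "smallest sufficient device first" ranking are assumptions made for the example. Where Table 1 gives two I/O figures, the lower one is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    gates: int
    flip_flops: int
    io_pins: int


# Highest density parts from Table 1 (lower I/O figure where two are given).
DEVICES = [
    Device("XC4008", 8000, 936, 144),
    Device("A1280", 8000, 998, 140),
    Device("EPM7256", 5000, 256, 160),
    Device("MACH230", 3600, 128, 64),
]


def select_devices(gates, flip_flops, io_pins, devices=DEVICES):
    """Return the devices meeting all requirements, smallest first."""
    candidates = [
        d for d in devices
        if d.gates >= gates and d.flip_flops >= flip_flops and d.io_pins >= io_pins
    ]
    # Assumed ranking: prefer the device with the least excess capacity.
    return sorted(candidates, key=lambda d: (d.gates, d.flip_flops, d.io_pins))
```

A call such as `select_devices(6000, 950, 100)` leaves only the A1280, since it is the only listed part with at least 950 flip-flops. A real tool would weigh further constraints such as speed, power consumption and price.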
Connecting the SW tool to other design tools is helpful when the designer already works with standard schematic capture, simulation or ASIC design tools.

All available features should be implemented with the fastest possible algorithms for fast design compilation; automated capabilities should be user controlled and directed, and where necessary it should be possible to complete designs manually.

## 4.2 PROMINENT SW DESIGN TOOLS

Almost two dozen device independent design tools are currently on the market; some of them are briefly reviewed here. All of them run under DOS or UNIX operating systems.

Abel design tools from Data I/O Corp. have been on the market since the introduction of PLAs. Abel-FPGA is specifically suited to designing with all kinds of FPGA architectures. It is bundled with the Abel hardware description language (Abel-HDL) to ease the design of complex functions. All the other means of design entry mentioned are supported, along with the ability to accept netlist input from popular schematic capture packages such as FutureNet, OrCAD... Device fitters are available for the following architectures: Xilinx, Actel, Altera, AMD, Atmel, Cypress, ICT, National Semiconductor, Plus Logic and Texas Instruments, and can be purchased separately. The tool lets the designer specify placement and routing constraints in order to partition the same design among multiple FPGA architectures from different vendors. Automatic selection of the device which best fits the design, multiple device partitioning and logic optimisation with functional simulation make this tool very powerful. Output can be a netlist or a JEDEC format file. Its price starts at US\$4995.

CUPL design tools<sup>/11/</sup> from Logical Devices were also among the earliest available. The new version supports the MACH architecture. Also added is a new, easy to use menu-driven interface. Users can create common logic blocks (adders, counters...) and define them as macros, which can be stored in a separate file and called into subsequent designs.
A big time-saver is conditional compilation of only the redesigned portions of a design. The simulator is very powerful and handles asynchronous circuits, too. The tool accepts schematic netlists from other packages and can automatically partition larger designs into multiple devices. Designs can be described in a custom C-like language. The basic package can be purchased for US\$495.

Minc Inc. sells **PGADesigner**<sup>/12/</sup>. Design entry can be specified in a proprietary Pascal-like design synthesis language, in EDIF 2.0.0 schematic netlist format, or by any other means. If the design is specified with a waveform describing synchronous circuits, the tool will automatically synthesise the required logic. Besides the capabilities already mentioned, it also enables designers to put multiple-device PLD designs into any supported FPGA architecture. A special expert system finds the device best suited to the desired design according to various constraints specified by the designer (manufacturer, number of devices, power consumption, logic family, propagation delays...). It outputs a list of the ten most appropriate devices for a given design. Supported FPGA families are those described here, plus 3200 PLD devices from 18 vendors. Synthesis, functional and system simulation, design rule checking and vendor specific netlist generation are done with this tool. Placement and routing for XC4000 and ACT2, and device programming information for MAX7000, must be done with vendor specific tools, which are bundled as separate modules. According to company sources the tool has a capacity of 27500 gates. The price ranges from US\$2500 upwards.

The newest entry in the field is **FPGA Foundry** from NeoCAD Inc. It includes device fitters, a circuit optimiser, automatic place and route capability, a timing estimator, back annotation, report file generation and a graphical editor.
It lets designers specify nets with critical path delays, which are processed by a knowledge-based timing estimator that provides the required information to the fitters and the place and route module. Critical nets can also be routed interactively with the menu driven graphical editor. It accepts files in the EDIF 2.0.0 and Library of Parameterised Modules (LPM) standards as well as the vendor specific formats XNF (Xilinx) and ADL (Actel). Initially the tool supports Xilinx and Actel devices and starts at US\$18000.

## 5.0 SUMMARY

After answering the question of why and when to use FPGAs, the paper reviewed four different architectures from Xilinx, Actel, Altera and AMD. As the architectural comparison shows, each one targets a specific market niche corresponding to its capabilities. On the fly reconfiguration makes the SRAM based XC4000 perfectly suited to the prototyping and debug stage, where instant changes are required. ACT2 with its antifuse technology will find its place after the design has been fully debugged and thoroughly tested, with no further changes required. More flexible is the MACH family, thanks to EEPROM technology with erasing done in standard programmers, although it addresses the lower density market niche. Changes can be made to the Altera family too, but this is cumbersome because of the EPROM technology used. It clearly positions itself in high density, high speed designs.

All but Actel provide the means to transfer existing designs to some kind of cheaper, higher volume technology. This feature is perhaps best exploited by Altera, which lets designers put multiple MAX5000 devices on a single mask programmed, highly integrated chip.

Every vendor supplies users with the software tools needed, but third party SW tools let designers use devices from multiple vendors without needing to know all the details of the target family, while still utilising it fully.
A brief review of four such packages reveals that they are very sophisticated, providing all the necessary help and ease of use to today's designers.

## ACKNOWLEDGEMENTS

Many thanks to Z. Bele, B.Sc., I. Šorli, B.Sc., R. Ročak, Ph.D. from Mikroiks d.o.o. for their support and to P. Koselj for his invaluable comments in preparing this article.

## LITERATURE

/1/ Designers' guide to PLD/FPGA design tools, Electronic Design, Nov. 7, 7-82; 1991

/2/ Programmable gate array data book, Xilinx; 1991

/3/ MACH Family data book; 1991

/4/ Second-generation MAX family boosts density fivefold, Electronic Design, May 9, 145; 1991

/5/ XC4000 Logic cell array family, Xilinx; 1990

/6/ ACT1, ACT2 Field programmable gate arrays, Actel; 1991

/7/ EPM7256 EPLD Data Sheet, Altera; 1991

/8/ Programmable logic, Electronic World News, Nov. 18, Special report; 1991

/9/ Compconic 91, Electronic World News, Nov. 4, C5; 1991

/10/ One toolset creates FPGAs in any technology, Electronic Design, Dec. 5, 12-130; 1990

/11/ PAL Device data book, AMD; 1990

/12/ PGADesigner Solves the FPGA puzzle, Minc Inc.; 1991

/13/ Enhanced EPLDs tackle 70MHz systems, Electronic Design, Jan. 23, 12-123; 1992

Radovan Sernec, 4<sup>th</sup> year undergraduate student of Electrical Engineering, Process Automation; Faculty of Electrical Engineering and Computer Science, Ljubljana, Slovenia