A high speed serial bus controller ASIC


PROC. 21st INTERNATIONAL CONFERENCE ON MICROELECTRONICS (MIEL'97), VOL. 2, NIŠ, YUGOSLAVIA, 14-17 SEPTEMBER, 1997

N. Kerö, S. Janković, W. Fallman and V. Litovski

Abstract: A fully automated design flow for a high speed serial bus controller ASIC with asynchronous data transfer is presented. The whole chip, including not only the control state machine and the µProcessor interface but also the synchronising unit, has been designed using VHDL. For verification and synthesis the SYNOPSYS tool set was used; the back-end design steps were accomplished using technology independent placement and routing tools (EPOCH by CASCADE). It could be shown that both control state machines running at comparably low frequencies and units running at high system speeds are equally well suited for a fully automated design approach. All user specifications were met. Special emphasis was put on Design For Testability (DFT), which resulted in an overall fault coverage of more than 95%.

Two different input data rates were implemented, 16 and 4 Mbit/sec respectively. The desired rate is selectable via an external pin. The internal bit sampling rate of the synchronisation unit depends on the input data rate: 128 MHz is used for sampling the 16 Mbit/sec data stream and 64 MHz for the 4 Mbit/sec data stream. For the actual bit synchronisation the falling edge of the start bit is used. Re-synchronisation has to be performed after a maximum of 36 bits, thus requiring a falling edge to be transmitted after two words.
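The relation between the two data rates and their sampling clocks can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function names are hypothetical and not taken from the design. It shows that the quoted clocks give 8 samples per bit at 16 Mbit/sec and 16 samples per bit at 4 Mbit/sec, and it derives the length of the 36-bit re-synchronisation window in nanoseconds.

```python
def samples_per_bit(sampling_clock_hz: int, data_rate_bps: int) -> int:
    """Number of sampling-clock cycles that fit into one bit-time."""
    if sampling_clock_hz % data_rate_bps:
        raise ValueError("sampling clock is not an integer multiple of the data rate")
    return sampling_clock_hz // data_rate_bps


def bit_time_ns(data_rate_bps: int) -> float:
    """Duration of one bit in nanoseconds."""
    return 1e9 / data_rate_bps


def resync_window_ns(data_rate_bps: int, bits: int = 36) -> float:
    """Longest stretch without a guaranteed falling edge (36 bit-times in the text)."""
    return bits * bit_time_ns(data_rate_bps)


# (sampling clock, data rate) pairs quoted in the text
RATE_TABLE = [(128_000_000, 16_000_000), (64_000_000, 4_000_000)]

if __name__ == "__main__":
    for clock_hz, rate_bps in RATE_TABLE:
        print(
            f"{rate_bps // 1_000_000} Mbit/sec: "
            f"{samples_per_bit(clock_hz, rate_bps)} samples per bit, "
            f"bit-time {bit_time_ns(rate_bps)} ns, "
            f"re-sync window {resync_window_ns(rate_bps)} ns"
        )
```

The slower rate is sampled twice as finely as the faster one, which matches the later observation that the 128 MHz block carries the critical timing constraint.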


I Introduction


A single chip controller has been designed for a high speed serial bus which is used for data transfer between one single master and up to 16 slaves (see figure 1). For the ASIC only the functions contained in the slave were implemented. Any communication is initiated by the master, i.e. a slave sends data only upon request. The data stream is divided into frames of variable length. The frames consist of 18-bit words, each of which comprises a start bit, 16 data bits, and a parity bit. Two words may be sent without delay, i.e. the parity bit of the first word is followed directly by the start bit of the second word. After 36 bit-times a high bit is inserted to guarantee the falling edge necessary for re-synchronising. Every slave receives and detects the first two words of every frame sent by the master. If the destination address of the frame is identical to the address of the slave, it continues to receive the frame and stores the data into the internal dual ported RAM. The address of the slave is selectable via external pins. The active (addressed) slave has to send an answer within 18 bit-times. After having received the answer from the active slave, the master has to wait for two word-times (36 bit-times) before sending a new frame.

N. Kerö and W. Fallman are with the Institute for General Electrical Engineering and Electronics at the Vienna University of Technology, Gußhausstraße 27-29, 1060 Vienna, Austria, E-mail: keroe@iaee.tuwien.ac.at, [email protected]
S. Janković and V. Litovski are with the Faculty of Electronic Engineering, University of Niš, Beogradska 14, 18000 Niš, Yugoslavia, E-mail: [email protected], [email protected]


Figure 1: High speed serial bus structure

II Controller Structure

The first step of the design process was to determine the architecture of the controller. The design was subsequently split into separate blocks, each of which is described on a functional level using VHDL [1], [2] for design capture. The chosen architecture is shown in figure 2. There are two directions of data flow. In one direction, a message is accepted, processed in a suitable manner and prepared to be fetched by an external CPU. The opposite direction is used for preparing, formatting and transferring data to the bus.

Incoming data enters the bit synchronisation block. This unit searches for the start bit (falling edge synchronisation) of the data stream; the actual bit value is detected by sampling the data stream in the centre of one bit-time. Finally the data is sent to the serial to parallel conversion unit. One of the main design constraints for this unit was the specification about sending two words without delay, i.e. the possible absence of a falling edge within the first two words.

In order to implement efficient message processing, the received serial data has to be converted into parallel words. This function is performed by a serial to parallel converter block. This unit takes care of the parity checking as well. The parity bit is generated and compared to the received parity bit. This is the first check on the transmission.

0-7803-3664-X/97/$10.00 © 1997 IEEE


Any mismatch causes an error flag to be set, informing the CPU about the data corruption.
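The word format and the parity check can be illustrated with a small software model. This is a minimal sketch, not the VHDL of the design. Three choices follow the paper: the start bit is low because the line idles high and synchronisation uses a falling edge, data bits are sent MSB first, and the receiver uses a marker bit preloaded in the LSB instead of a bit counter. Even parity is an assumption, since the text does not state the parity sense, and all function names are hypothetical.

```python
START_BIT = 0  # line idles high, so a low start bit produces the falling edge


def parity_of(word: int) -> int:
    """Even parity (assumed): 1 if the 16-bit word holds an odd number of ones."""
    return bin(word & 0xFFFF).count("1") & 1


def encode_word(word: int) -> list[int]:
    """Parallel to serial: start bit, 16 data bits MSB first, parity bit."""
    data = [(word >> pos) & 1 for pos in range(15, -1, -1)]
    return [START_BIT] + data + [parity_of(word)]


def decode_word(bits: list[int]) -> tuple[int, bool]:
    """Serial to parallel using a marker bit; returns (word, parity_error)."""
    if len(bits) != 18 or bits[0] != START_BIT:
        raise ValueError("expected an 18-bit word beginning with a start bit")
    stream = iter(bits[1:])
    reg = 0x0001  # marker '1' preloaded in the LSB position
    done = False
    while not done:
        bit = next(stream)
        done = bool(reg & 0x8000)  # marker leaves the MSB on this shift
        reg = ((reg << 1) | bit) & 0xFFFF
    received_parity = next(stream)
    # Regenerate the parity from the received data and compare it
    return reg, received_parity != parity_of(reg)


if __name__ == "__main__":
    frame = encode_word(0xA5C3)
    print(decode_word(frame))  # word recovered, no parity error
    frame[5] ^= 1              # flip one data bit on the line
    print(decode_word(frame))  # parity error flag is raised
```

A single flipped bit always changes the regenerated parity, so the error flag catches it; an even number of flipped bits would pass this check unnoticed.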

Figure 2: Serial bus controller architecture

The control unit is the heart of the bus controller. Its task is to check the correctness of the incoming message header (the first two words within the frame) and to decode it. In the master to slave direction, both the destination slave address and the number of message words (data length) are transmitted twice (once in the first and once in the second header word). Both values must be confirmed in the second header word for the slave to proceed with receiving data. If the slave receives two different values, the CPU is again informed about data corruption. If there are no errors and the destination address is identical to the slave address, further message processing is initiated.

Several bits within the header designate the type of message. Besides the normal operation mode for master-slave communication, there are two special types of messages, a broadcast and a reset frame. In normal operation mode the remaining part of the message data is stored in an internal dual-ported RAM and the CPU is informed that new data has arrived. The CPU of the slave can send data by storing it into the same dual-ported RAM, enabling the control unit to generate an answer frame which again consists of two header words and a data block. The broadcast message is identified by a special destination address. It is accepted by all slaves attached to the bus and contains general information sent by the master. This kind of message does not require an answer; the data are just stored in the RAM and later read by the CPU. In the case of a reset message the header is not followed by data and the answer also consists of only a two-word header, containing an acknowledgement of the reset instruction.

For preparing an answer frame to be sent to the bus a parallel to serial converter is needed. The 16-bit word at the input is converted to a serial data stream. Before transmitting the first bit this unit generates a start bit. It is followed by 16 data bits. During the conversion, the parity bit is generated and sent as the 18th serial bit.

Finally a dedicated clock unit was designed in order to generate the various clock signals needed for the controller. Four different clock signals are required: 128 MHz and 64 MHz for bit sampling, and 16 MHz and 4 MHz for data conversion and operation of the control unit.

III Synthesis and Layout Implementation Decisions

After having completed the verification of the functional VHDL description by means of logic simulation, a gate level netlist suitable for automatic placement and routing had to be generated. Prior to this step the desired CMOS process had to be chosen. For this design we selected the CMOS 0.7 µm technology offered by ES2. For this foundry a cheap access based on multi-project-wafer fabrication is possible via EUROPRACTICE. Two Computer Aided Design (CAD) tools were used for the design. Block synthesis is performed with SYNOPSYS, while EPOCH is used for layout compilation. The two tools are closely linked via an EPOCH/SYNOPSYS interface, making it possible to use each package in the field where it is most powerful.

A. Circuit Synthesis

Circuit synthesis denotes the procedure of translating the chip architecture described in VHDL into a gate level netlist containing only EPOCH library parts [3]. Taking the nature of the given controller into consideration, synthesis turned out to be the most important design phase. The design shows no distinct data path structure, e.g. no arithmetic, repetitive or bus-oriented logic blocks, which would allow for a highly regular realisation. The actual synthesis process, done by the SYNOPSYS VHDL Compiler [4] and Design Analyzer [5], involves several steps. The source code is optimised, redundancies are eliminated and constants are propagated. This is followed by resource allocation and, to a limited extent, also resource sharing, thus reducing the amount.


Figure 3: Modified bus controller structure

Special care was taken to cope with the problems of synthesising the high frequency blocks. Designing one single bit synchronisation block able to fulfil the specification of two possible input data rates, with bit sampling at up to 128 MHz, led to an unacceptably complex Finite State Machine (FSM). Reaching the 128 MHz clock constraint (the longest path must be less than 7.81 nsec) became impossible. Consequently, the chip structure had to be slightly modified. Instead of one block, two separate bit synchronisation blocks are designed, one for the 4 Mbit/sec input data rate and the other accepting 16 Mbit/sec incoming data. The block outputs are multiplexed by means of the external signal selecting the data rate. The modified controller structure is shown in figure 3.

Following the above mentioned modifications, the clock unit block had to be adjusted accordingly. Clock generation is done with a synchronous 5-bit counter used as a frequency divider. The data rate clock is selected by multiplexing. Depending on the current input data rate only one of the sampling clock signals (64 or 128 MHz) is made active, while the other one is disabled. The clock unit schematic view is shown in figure 4.
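A 5-bit binary counter is sufficient to derive all the lower clock rates from one fast clock, as the following behavioural sketch suggests. It assumes that the counter is driven by the 128 MHz clock and that bits 0, 2 and 4 are tapped for 64, 16 and 4 MHz; the paper states neither detail, so both are assumptions, and the function names are hypothetical.

```python
def divided_frequencies(master_hz: float, width: int = 5) -> list[float]:
    """Frequency seen on each counter bit: bit k toggles at master / 2**(k + 1)."""
    return [master_hz / (1 << (k + 1)) for k in range(width)]


def count_rising_edges(cycles: int, width: int = 5) -> list[int]:
    """Simulate a free-running binary counter and count rising edges per bit."""
    mask = (1 << width) - 1
    count = 0
    rising = [0] * width
    for _ in range(cycles):
        nxt = (count + 1) & mask
        for k in range(width):
            was_low = not (count >> k) & 1
            is_high = (nxt >> k) & 1
            if was_low and is_high:
                rising[k] += 1
        count = nxt
    return rising


def data_rate_clock_hz(fast_rate_selected: bool, master_hz: float = 128e6) -> float:
    """Multiplexer choosing the data rate clock (assumed taps: bit 2 or bit 4)."""
    taps = divided_frequencies(master_hz)
    return taps[2] if fast_rate_selected else taps[4]


if __name__ == "__main__":
    # 128 master cycles last 1 microsecond at 128 MHz,
    # so the edge counts read directly as frequencies in MHz.
    print(count_rising_edges(128))
    print(data_rate_clock_hz(True), data_rate_clock_hz(False))
```

Because all taps come from one synchronous counter, the derived clocks keep a fixed phase relation to the fast clock, which is one plausible reason for choosing a counter over independent dividers.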

Figure 4: Clock unit schematic view

Naturally the control unit is the most complex controller block. It is an FSM with twelve states. After the data is available for processing, the FSM switches from the default state to the two-word header acceptance mode. The result of the header decoding causes the FSM to branch to one of the following states: back to the default state if the destination address is wrong, to the error state if the received data are corrupted, to normal operation message processing and preparation of an appropriate answer, to the broadcast state, or to the reset message state. The complexity of the controller was roughly 440 gates.

For serial to parallel conversion a shifting algorithm is used in order to avoid a counter which tracks the current bit position. Before conversion, the converter output word is set to "0000 0000 0000 0001". During conversion serial bits are read in and shifted starting from the Least Significant Bit (LSB) position. Shifting out '1' from the Most Significant Bit (MSB) position indicates the end of the conversion process. The algorithm for parallel to serial conversion uses an additional signal indicating the position of the current bit. The conversion is performed starting from the MSB by incrementing the value of the bit position signal until the end of the word is reached. Of course the results of the synthesis are verified by means of a timing simulation using the gate level netlist.

B. Layout Implementation Decisions

The layout implementation is the next design phase. At the beginning, the decisions involving layout topology had to be made. Within EPOCH every block can be processed individually. However, at the higher level of hierarchy, a block can either be preserved or "absorbed" into a larger standard cell group design, which allows for optimising [6]. In this particular case, there are three blocks with critical timing constraints: the clock unit and the two bit synchronisation blocks. Since their timing performance would be poorer after "smashing" and "absorbing", they are left fixed. The rest of the chip, except for the RAM, is "smashed" into a large standard cell group.

IV DFT Introduction and ATPG

The SYNOPSYS Test Compiler [7] is used for DFT and Automatic Test Pattern Generation (ATPG). The scan design technique has the greatest potential for a very high fault coverage [8]. For this design the full scan approach using scan flip-flops was implemented. These cells are standard flip-flops which are enhanced by a multiplexer at the data input, making it possible to perform a serial shift function during test mode. This additional circuitry slightly increases the overall chip area but, more importantly, the timing constraints may be jeopardised. For our design this meant a thorough rechecking of the bit synchronisation blocks. Two scan chains are implemented, one containing the sequential part of the control unit and the other containing the remaining sequential circuitry.

To guarantee a successful scan insertion, any asynchronous parts and of course combinatorial feedback loops were avoided. Additionally, gated clocks are not used, thus all but one of the conditions for high fault coverage were met. The clock signals generated by the clock unit (on-chip clock signals) are not available externally, so the state of the sequential cells driven by these signals is not controllable. Again, a multiplexer can provide a solution. Three multiplexers were added for the clock signals referred to as CLK1, CLK2, and CLK3 respectively. The multiplexer inputs are the on-chip clock signal and the primary clock input. During test mode the on-chip clock signal is disabled and the primary clock is made active.

After inserting the scan chains and solving the controllability problem, ATPG is performed. The first ATPG result contained 239 test patterns, but after merging and compaction the number of test patterns was reduced to 199. Fault coverage is now satisfactorily high, 94.31% for non-collapsed and 93% for collapsed faults.

It should be noted that the dual ported RAM, as a macrocell block, is not included in the testing procedure described above. The EPOCH memory library [3] offers quite flexible testability options. Two testability options are used. Test structure STR1 allows access to all of the RAM's primary inputs by placing a multiplexer at every input. Thus, test inputs are active during test mode. The second option implemented is Built-In Self-Test (BIST). It works in conjunction with the STR1 option. The test mode performs a self-test based on the "marching method". The test data are generated via a test generator circuit and routed to the RAM inputs. The RAM output is monitored through a special comparator circuit for correct function. If the RAM fails, the status output pin is asserted. In this way, full testability of the dual ported RAM is achieved.

The overall results of the fault coverage analysis are very satisfactory. For the total chip the fault coverage can be estimated to be higher than 95%.

V Chip Layout

The EPOCH layout compilation process [6] is referred to as physical design and comprises several phases. First, the blocks are individually placed and routed, using a standard cell router. After that, the chip core is compiled in the "macrocell" style. Finally, all pads are placed and connected to the chip core. This design flow is followed by buffer sizing and power estimation. These operations are hierarchical by nature, thus they are performed on all cells from bottom to top. The task is to assign cell output buffer sizes depending on load capacitance and, optionally, timing requirements (considered only during the timing driven layout compilation procedure). It is accompanied by a power dissipation calculation, needed for assigning power rail widths. After buffer sizing, placement is hierarchically adjusted to accommodate the new buffers and the final routing is repeated with appropriate power rail sizing.

For the bit synchronisation blocks a timing driven compilation [6] is used in order to meet the strict timing constraints. The timing driven physical design uses data from the delay calculator to try to eliminate possible timing violations. The compilation results are checked by the TACTIC timing simulator [9]. The longest path in the bit synchronisation block with 128 MHz sampling is 7.51 nsec (the constraint is 7.81 nsec). The 64 MHz sampling constraint is easily met; the longest path of that block is 8.922 nsec. The rest of the controller blocks are compiled with the EPOCH automatic compile procedure [6], with the optimising criterion set to minimal area.
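The timing margins behind these figures follow directly from the clock periods. The short calculation below is only a cross-check of the quoted numbers, with hypothetical function names; it takes the clock period as the full timing budget and ignores setup time and clock skew, which is a simplifying assumption.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(freq_mhz: float, longest_path_ns: float) -> float:
    """Positive slack means the longest path fits into one clock period."""
    return clock_period_ns(freq_mhz) - longest_path_ns


if __name__ == "__main__":
    # (sampling clock in MHz, longest path in nsec) as reported in the text
    for freq_mhz, path_ns in [(128, 7.51), (64, 8.922)]:
        print(
            f"{freq_mhz} MHz: period {clock_period_ns(freq_mhz):.4f} ns, "
            f"slack {slack_ns(freq_mhz, path_ns):.4f} ns"
        )
```

The 128 MHz block meets its constraint with roughly 0.3 nsec to spare, while the 64 MHz block has more than 6 nsec of margin, which is consistent with the statement that the slower constraint is easily met.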

As a layout verification step, the block netlists and the delay information are extracted from the layout and a post-layout logic simulation is performed. The results are quite satisfactory. Before entering the final layout phase, the EPOCH floor planning features [10] were tried out on the chip core in order to achieve silicon area savings. By manipulating the larger standard cell groups (changing the aspect ratio and the number of rows used, choosing between horizontal and vertical row implementation) some impressive improvements were possible. The standard cell group is implemented with six vertical rows. There are also three fixed standard cell blocks, while the dual ported RAM is placed on the right side of the chip, occupying most of the silicon area used. The overall chip area is 3.684 x 6.918 mm², i.e. 25.5 mm². Considering the chip complexity and the on-chip RAM, the result is satisfactory.

VI Conclusion

A fully automatic ASIC design flow is presented using the powerful SYNOPSYS/EPOCH tool combination. DFT is introduced, reaching a high fault coverage of over 95%. The layout compilation procedure has been successful with respect to both timing constraints and area optimisation. All post-synthesis and post-layout results are verified by timing and logic simulation.

References

[1] Lipsett, R., Schaefer, C., Ussery, C., "VHDL: Hardware Description and Design", Kluwer Academic Publishers, Dordrecht, Netherlands, 1989.
[2] Airiau, R., Berge, J., Olive, V., "Circuit Synthesis with VHDL", Kluwer Academic Publishers, Dordrecht, Netherlands, 1994.
[3] -, "CMOS Databook", Cascade Design Automation Corporation, November 1994.
[4] -, "Synopsys VHDL Compiler Reference Manual Version 3.0", Synopsys Inc., November 1992.
[5] -, "Design Analyzer Reference Manual Version 3.0", Synopsys Inc., December 1992.
[6] -, "EPOCH User's Manual", Cascade Design Automation Corporation, November 1994.
[7] -, "Test Compiler and Test Compiler Plus Reference Manual Version 3.0", Synopsys Inc., December 1992.
[8] Abramovici, M., Breuer, M., Friedman, A., "Digital Systems Testing and Testable Design", Computer Science Press, New York, 1990.
[9] -, "Epoch TACTIC User and Reference Manual", Cascade Design Automation Corporation, November 1994.
[10] -, "Epoch Floor Planner - User and Reference Manual", Cascade Design Automation Corporation, November 1994.
