Comparing Design Flows for Structural System Level Specifications facing FPGA Platforms

June 1, 2017 | Author: Antoni Portero | Category: System-level design, High Speed

D. Castells, M. Monton, R. Pla, D. Novo, A. Portero, O. Navas, J. Farré, L. Ribas, J. Carrabina

Abstract: System level design methodologies introduce new design flows that are complementary to the ones provided by existing toolsets based on HDLs. As a result, a miscellany of tools and methodologies has appeared for the design of complex microelectronic systems, driven by the different actors in the microelectronics arena. This paper compares three system level design methodologies derived from MATLAB, SystemC and JHDL, together with the classical HDL design flow (in this case using VHDL). A high-speed sorter, defined at structural level, is used as a common specification to test the different methods. Results are presented for the different design phases up to FPGA synthesis.

Index Terms: System Level Design Tools, SystemC, MATLAB, JHDL, FPGA synthesis.

I. INTRODUCTION

The complexity of microelectronic circuits has been increasing for decades. As it becomes possible to embed more logic into a single chip, EDA tools provide methods at higher levels of abstraction to reduce the time to market of such complex designs. To boost engineer productivity, industry and academia have been studying and proposing different techniques:

1. High-level languages for structural representation: in addition to classical HDLs (VHDL and Verilog), structure can be described in other high-level languages such as C/C++ [2] or Java [5].
2. Behavioral synthesis from high-level programming languages: starting from behavioral HDL and lately moving to general purpose software languages like C/C++ [2] and Java [3], or scientifically focused languages like Matlab [4]. The ideal system would take a software description together with some system constraint rules and convert it to a hardware design.
3. Hierarchical graphical representation of system blocks: starting from basic logic elements, users can build up a subsystem that becomes a new graphic block, later used as a building block of a more complex system, and so on recursively until the whole system is complete. Tools like Matlab and some HDL development environments use this feature.
4. Automatic code generators & parametric blocks: such as wizards that let the designer specify some key parameters to build a predefined functional block (CPU soft-cores, FIFOs, etc. are commonly defined this way).
5. IP reuse: tool vendors and third parties provide packaged existing complex functions for rapid integration in custom designs. Usually IP reuse is used in conjunction with the above feature.

Different methods can coexist in the development of a complex system by subdividing the whole system into smaller subsystems and designing each one with a different available method. However, interfacing them presents some difficulties and limitations. It would be very convenient to use a single methodology and tool. This would ease the work, removing the need to worry about the interconnection of different modules and their functional verification before physical implementation. As designs become more complex, their verification grows in complexity accordingly and verification features become more relevant than ever. Moreover, the design flow must be selected before starting the design process, according to some estimation of the relation between application specifications, runtime environment, and the productivity of tools. In this paper we try to highlight the key features of the different methodologies.

II. FOCUS ON STRUCTURAL DESIGN

During the last years, design challenges have proven to be a very important way of moving researchers towards a given focus, from either a technological or a methodological point of view. This work started as such an internal challenge, aimed at establishing our own criteria for selecting the main design flow to be used for both design and teaching purposes. Following periodic revolutions (HDLs, FPGAs, etc.) we foresee a new re/evolution towards system level specification, mainly oriented to making the design of HW and SW compatible for a complete system on a chip. Since the design of microelectronic circuits represents a large area of research, we focus our comparison on designs defined at structural level, because this represents the established hardware design methodology. Therefore this paper does not consider behavioral synthesis, with the understanding that the obtained results are only applicable to systems that do not require much sequential control, as pointed out in [7].

III. DESIGN METHODOLOGIES

In order to compare different tools and methodologies we started by choosing which methods to compare. We took three design flows derived from tools that are available in the industry (Matlab™, VHDL and SystemC) and an additional academic design flow (JHDL). According to the skills and technical orientation of our research group, we organized four design teams of experienced system engineers. A common structural description of a simple circuit was given to each team in order to measure the productivity of each team and the performance of the final synthesized circuit on a Xilinx Virtex II 3000 FPGA. To compare the methodologies we considered objective measures, such as code size and development time, and also some subjective ones, such as complexity in specification, verification and synthesis.

The system can be described as a simple node (Figure 1) that is replicated 64 times to form the SP (Figure 2).


IV. STRUCTURAL DESIGN SPECIFICATION

The specification given is a new design of a sort processor (SP) that sorts 64 pairs of (key, data) in descending order of the key. The specification is given as a block diagram, so it is indeed a very structural specification. A C implementation of the presented algorithm is also given, together with an input stimuli file, so that a functional verification can be done. The structure of the system is very similar to a bit-parallel, word-parallel associative processor (AP) as presented in [8], with the particularity that the comparators are “less than” rather than “equal” comparators. Other design particularities are that the different words of memory compose a large shift register, and that the control unit of the AP is used to update the contents of each memory position and perform a shift downwards.

Figure 2 Sort Processor Specification

V. MATLAB DESIGN FLOW

Matlab™ is a well-known tool in the scientific field. For some time the Simulink package has been able to describe systems by integrating technology-oriented logic blocks (coming from FPGA vendors), up to the generation of the corresponding HDL files. For final Place & Route, external tools like Xilinx ISE must be used, although verification including synthesized HW or code downloaded to the FPGA can be performed at Matlab level. The first step in the design flow is to enter the input specification using Matlab Simulink. This design entry is performed using graphical blocks that represent each logic function. As the design is simple enough, the logic blocks are directly available from the standard libraries of the System Generator Toolbox. In Matlab Simulink, the clock signal is abstracted from the different elements, as shown in the design of the SP Node (Figure 3). Note that the blocks used depend on the FPGA technology, so an early decision must be made between FPGA manufacturers (for now only Xilinx or Altera).

Figure 1 Sort Processor Node Specification

Pairs of (key, data) are fed to the SP every clock cycle. Each pair is inserted in its corresponding memory position depending on the value of the key, shifting all memory positions below the insertion point downwards. After 64 clock cycles the SP registers contain the input set sorted by key in descending order.
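The insertion behavior described above can be sketched as a small software reference model. This is only an illustration and not the authors' C implementation, which the paper does not reproduce: the names `Pair`, `SortModel` and `kNodes`, the 64-bit data field and the zero-initialized registers are all assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical types for illustration; the field widths are assumptions.
struct Pair {
    std::uint32_t key;
    std::uint64_t data;
};

constexpr std::size_t kNodes = 64;

class SortModel {
public:
    // Models one clock cycle: the pair is placed at the first position whose
    // stored key is smaller than the incoming key, and every position from
    // there on is shifted down by one (the last entry falls off the end).
    void insert(const Pair& p) {
        std::size_t pos = 0;
        while (pos < kNodes && regs_[pos].key >= p.key) ++pos;
        if (pos == kNodes) return;  // not greater than any stored key: dropped
        for (std::size_t i = kNodes - 1; i > pos; --i) regs_[i] = regs_[i - 1];
        regs_[pos] = p;
    }

    // Read back the content of one memory position.
    const Pair& at(std::size_t i) const {
        assert(i < kNodes);
        return regs_[i];
    }

private:
    std::array<Pair, kNodes> regs_{};  // assumed reset state: all zeros
};
```

Feeding the keys 5, 9, 1 and 7 leaves 9, 7, 5 and 1 in the first four positions. With the zero reset state assumed here, a pair with key 0 is never stored, because the comparison is strict.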

Figure 3 Sorter Node Matlab design

To verify the circuit behavior, an interactive test bench can be built at Matlab level (Figure 4). It is very easy to insert scopes, complex functions familiar to Matlab users (such as random number generators) and graphical interface components (such as switches) that provide user-friendly interfaces to speed up verification. In this case scopes are placed to watch how inputs and outputs vary. Since the values of the internal SP Node registers are block outputs, we can watch the whole state of the subsystem during simulation. We should note that it would be difficult to watch the values of internal node signals without changing the outputs of the Node. This can be a problem for larger systems, since the design must be specially designed for testability.

circuit based on primitive logic units. In order to ensure interoperability between VHDL tools, it is common practice to work with source code rather than graphical representations, and to write RTL code for the basic modules of the system and structural code for higher-level modules. For this reason the VHDL team considered that describing the basic SP node in RTL and extracting its equivalent structural circuit with Synplify, to verify that it matches the given specification, is faster than describing the circuit structurally. Figure 5 shows the graphical representation of the entered design, which indeed matches the given specification. Here the clock is present and not abstracted as in Matlab.

Figure 5 Inferred structure of the SP Node from VHDL

Figure 4 Matlab simulation test bench

We instantiate 64 simple SP Nodes to create the whole SP Processor. This can be done by replicating the simple entity and connecting its wires appropriately. However, this must be performed through the user interface, and there is no programmatic mechanism to help in this task other than copy & paste or bus node naming. Depending on the graphical complexity of the circuit this can be an error-prone task. Hierarchical structures are handled better than large flattened ones. Once the SP design is complete a new verification must be done. This time the file containing (key, data) pairs is used as stimuli to the circuit. Matlab provides I/O functions, so the test bench can be programmed in the Matlab programming language in less than 10 lines of code. The last step in the Matlab design flow is to use System Generator to generate an HDL translation of the design. The resulting output is used as input to the physical synthesis tool, in this case the Xilinx ISE platform.

VI. VHDL DESIGN FLOW

VHDL tools usually provide means to design a system by connecting logic blocks and wires through a graphical user interface. Nevertheless, the file format used to store these graphical definitions is not defined by the standard, so most tools store them in a proprietary format while offering the option to generate standard VHDL from them. VHDL can describe systems in either a structural or a behavioral way. For a behavioral description there is no direct graphical representation of the described system other than a box with input and output pins, although some tools like Synplicity Synplify can infer an equivalent

The simplicity of the SP Node allows a rather simple verification using input stimuli defined as behavioral VHDL code. The stimuli are used as input for a simulation run and the obtained output waveform is analyzed to ensure that it shows the expected results. After the verification of the SP Node, the SP Processor is designed with some additional VHDL source code. As the design is very regular we can use for loops to create the system with few lines of code. To verify the whole SP Processor, new behavioral VHDL code must be added. This time the source code must read the stimuli from an input file. VHDL also provides some functions for I/O and file management, so the test bench becomes quite a simple piece of code. The simulation run produces an output waveform that can be analyzed to verify that the circuit is valid. Finally, place & route is performed within the same design tool, eliminating the need to import externally generated designs as in the alternative methodologies.

VII. SYSTEMC DESIGN FLOW

SystemC [2] was proposed by EDA manufacturers to address the limitations of existing HDL tools by offering a common design methodology for hardware and software. SystemC uses a well-established software programming language, C++, to describe the structure or behavior of the hardware elements of a system. Since software in SoC environments is usually written in C/C++, the result is a single description language for both hardware and software. Like VHDL designers, our SystemC designers feel more comfortable writing behavioral code for simple circuits than entering structural descriptions. So the first step in the SystemC design flow was to develop a behavioral representation of the SP Node (Figure 6), which is done in few lines of code in a C++ development environment like gcc. The behavioral coding rules used for the design are the

same as explained in [9], and most of the problems found are similar to those in [10].

    // score_node.h
    #include "systemc.h"

    const int WORD_SIZE = 32;
    const int PAGE_SIZE = 96;

    // Template arguments are assumed from WORD_SIZE and PAGE_SIZE.
    SC_MODULE (score_node) {
      sc_in_clk clk;
      sc_in<bool> reset_n;
      sc_in<sc_uint<WORD_SIZE> > new_score;
      sc_in<sc_biguint<PAGE_SIZE> > new_data;
      sc_in<sc_uint<WORD_SIZE> > prev_score;
      sc_in<sc_biguint<PAGE_SIZE> > prev_data;
      sc_in<bool> load_prev;
      sc_out<sc_uint<WORD_SIZE> > score;
      sc_out<sc_biguint<PAGE_SIZE> > data;
      sc_out<bool> load_next;

      bool load_new;
      sc_uint<WORD_SIZE> actual_score;
      sc_biguint<PAGE_SIZE> actual_data;

      void prc_score_node();

      SC_CTOR(score_node){
        SC_CTHREAD(prc_score_node, clk.pos());
        watching (reset_n.delayed() == true);
      }
    };

    // score_node.cc
    #include "score_node.h"

    void score_node::prc_score_node(){
      // Reset behavior
      if (reset_n == true) {
        score = 0;
        data = 0;
      }
      while (true) {
        // Decide whether this node and the following ones must load
        if (load_prev == true) {
          load_new = false;
          load_next = true;
        } else if (new_score > actual_score) {
          load_new = true;
          load_next = true;
        } else {
          load_new = false;
          load_next = false;
        }

        // Select the next register contents
        if (load_prev == true) {
          actual_score = prev_score;
          actual_data = prev_data;
        } else if (load_new == true) {
          actual_score = new_score;
          actual_data = new_data;
        } else {
          actual_score = actual_score;
          actual_data = actual_data;
        }
        score = actual_score;
        data = actual_data;
        wait();
      }
    }

the conversion from generic logic primitives into a specific technology when appropriate. For now, mainly Xilinx devices are supported, but this is not the result of any restriction of the language. Designs can be synthesized to produce an EDIF netlist that can later be used as input to Xilinx ISE to perform Place & Route. JHDL does not provide any graphical user interface to build up a system through the connection of simple graphical blocks. It is necessary to write Java code representing the structure of the SP Node. Nevertheless, a graphical representation (Figure 7) is generated from the entered source code.

Figure 6 SP Node fragment

To verify the design, a new C++ class is developed to feed appropriate stimuli to the SP Node. Outputs of the circuit can be watched as debug messages on standard output or, in more elaborate form, as waveforms. The simulation is performed by running or debugging the developed test bench. Once the verification of the SP Node is complete, the SP Processor can be designed with an additional C++ class. The new class also has low complexity thanks to the use of for loops. A test bench to verify the whole SP Processor is developed to extract data from the input stimuli file and feed the SP Processor. C++ has plenty of functions to perform I/O and file operations, so this becomes an easy task for an experienced developer. After the design is complete and verified, the synthesis of the design can be performed using tools like Synopsys CoCentric SystemC Compiler. Only a subset of the SystemC standard is synthesizable, so at this stage we can find that parts of our design are not directly synthesizable. To address this inconvenience we have to replace them with equivalent synthesizable constructs until no error is found. When synthesis has finished successfully, the resulting netlist is used as input to the final Place & Route phase performed by Xilinx ISE.
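A plain C++ cycle-level model can make the replicated structure and the file-driven test bench concrete without requiring the SystemC library. The sketch below is an illustration under several assumptions, not the team's class: `NodeState`, `SortChain`, `feed` and `kNodes` are hypothetical names, the load chain is modeled as combinational, the data field is narrowed to 64 bits, and the stimuli format (whitespace-separated key and data values) is invented, since the paper does not describe the file layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>

// Hypothetical register contents of one SP Node.
struct NodeState {
    std::uint32_t key;
    std::uint64_t data;
};

constexpr std::size_t kNodes = 64;

class SortChain {
public:
    // Simulates one rising clock edge with (key, data) on the shared input.
    void clock(std::uint32_t key, std::uint64_t data) {
        std::array<NodeState, kNodes> next = nodes_;
        bool load_prev = false;  // the Load input of node 0 is tied low
        for (std::size_t i = 0; i < kNodes; ++i) {
            const bool load_new = !load_prev && key > nodes_[i].key;
            if (load_prev) {
                next[i] = nodes_[i - 1];         // shift down from the previous node
            } else if (load_new) {
                next[i] = NodeState{key, data};  // insertion point
            }
            load_prev = load_prev || load_new;   // Load output seen by node i + 1
        }
        nodes_ = next;  // all registers update on the same edge
    }

    const NodeState& node(std::size_t i) const {
        assert(i < kNodes);
        return nodes_[i];
    }

private:
    std::array<NodeState, kNodes> nodes_{};  // assumed reset state: all zeros
};

// Feeds "key data" pairs from a stream, one pair per clock cycle, and
// returns the number of cycles simulated. The text format is an assumption.
std::size_t feed(SortChain& chain, std::istream& in) {
    std::uint32_t key = 0;
    std::uint64_t data = 0;
    std::size_t cycles = 0;
    while (in >> key >> data) {
        chain.clock(key, data);
        ++cycles;
    }
    return cycles;
}
```

In a test, `feed` can be driven from a `std::istringstream` instead of a file. The next state is computed from a copy of the current registers so that every node sees the values from before the clock edge, as a bank of D registers would.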

Figure 7 Node design in JHDL

Note that the graphical representation of the circuit includes the values of the signals. This is also used in the simulation to watch the different signal values. To verify the design, an interactive simulation can be performed using the dbt tool (Figure 8). dbt offers a command line in which the user can enter commands such as setting values on the various inputs or executing a clock cycle. Signal values can be watched interactively in the circuit representation view (graphically), in a dbt pane (in tabular form) or in a waveform viewer.

VIII. JHDL DESIGN FLOW

Figure 8 Simulation environment of JHDL

JHDL was proposed by members of Brigham Young University [5] as a Java-based language for the structural description of FPGA designs. The principle of JHDL is to define logic elements as Java classes. In JHDL the instantiation of a particular logic block is seen as the instantiation of an object of the class that represents that block. Wires between logic blocks are likewise represented by instances of the Wire class. Logic blocks are either generic or dependent on the underlying hardware. An entity called Techmapper performs

After the SP Node verification, the final SP Processor is built with a new Java class that instantiates objects of the predefined SP Node class. As in VHDL, using for loops leads to a compact piece of code. To verify the correctness of the whole SP design we create a new Java class that extends the standard TestBench class. The job of this class is to extract data from the stimuli file and feed it into the circuit, while maintaining all the features already present in dbt simulation (interactive viewing of signals, etc.). Reading data from a file is a trivial

IX. RESULTS

Different results can be extracted from this comparison of design methodologies, bearing in mind that it is very difficult to measure subjective factors like speed of design entry, tool training and ability, especially in such a brief experiment. Development time is increasingly the driving factor in determining system cost, so we present the development times of each team (Figure 9), highlighting the time spent in each phase of the project. We divided the project into the following phases:

1. SP Node development: design the simple SP node either graphically or programmatically, depending on the language.
2. SP Node verification: generate SP Node stimuli (including the development of stimuli generators) and verify the circuit by simulation runs.
3. SP Processor development: design the whole SP processor either graphically or programmatically.
4. SP Processor verification: develop the parser of the given stimuli file to feed the SP Processor and verify the circuit by simulation runs.
5. Place & Route: take the results from the previous phases and import them into Xilinx ISE for place & route.

The shortest development times were achieved by the Matlab and JHDL teams, followed by the VHDL and SystemC teams. It is remarkable that about half of the time was spent in verification; in particular, the VHDL team dedicated 67% of its time to programming test benches and running simulations.

Figure 9 Development Time (hours per team: Matlab, SystemC, VHDL, JHDL; phases: SP Node, SP Node Simulation, SP Processor, SP Processor Simulation, Synthesis)

Also note that, leaving aside the time the SystemC team spent in synthesis, all teams are faster than the VHDL team. Another key point is the source code size. As the Matlab methodology works with graphical blocks rather than source code, we include both the source code line count and the graphical block count (including blocks, pins and wires) used in the design (Figure 10). Although graphical blocks cannot be compared directly with source code, the block count gives an idea of the complexity of the Matlab design due to its flattened structure.


operation in Java, so the test bench is a simple Java class. Once verification is complete, an appropriate Techmapper is specified to generate an EDIF netlist for the target FPGA. Finally the resulting .edn file is used in Xilinx ISE to perform the Place & Route.


Figure 10 Source Code Lines (lines / blocks per phase: SP Node, SP Node Simulation, SP Processor, SP Processor Simulation; methodologies: SystemC, VHDL, JHDL, Matlab)

The JHDL design is the simplest source-code-based design, with less than 300 lines of code in total. This is due to the lack of source code for unit testing, thanks to its interactive simulation features. VHDL is the largest design, with more than 500 lines. Note again that the size of the VHDL code is the result of the contribution of the simulation test benches, which are more complex to describe than in the alternative methodologies. Some additional aspects (Table 1) are considered to highlight features not cited before. The SystemC initiative, proposed by some EDA manufacturers as a replacement for VHDL, fails to provide a graphical design entry tool, unlike Matlab, which provides only this method for the design entry of system level modules. Matlab has the drawbacks of early technology binding, the lack of behavioral design and the need to define where to put scopes before a simulation run. Interactive simulation is only possible with JHDL and Matlab; moreover, JHDL allows inspecting any part of the system at any time during the simulation run. Finally, the complexity of the design flows, as perceived by the designers, puts Matlab as the simplest method, followed closely by JHDL and VHDL. SystemC fails to offer an easy-to-learn replacement for VHDL due to its extensive use of advanced software programming topics (such as template classes, macros, threads, etc.) not familiar to hardware system designers.

Methodology | Matlab | SystemC | VHDL | JHDL
Needed Skills | Matlab | C++ | VHDL | Java
Used Tools | Matlab Simulink, Xilinx ISE | Gcc, Synopsys Compiler, Xilinx ISE | Synplify, Xilinx ISE | Netbeans IDE, Xilinx ISE
Structural design | Yes | Yes | Yes | Yes
Graphical design | Yes | No | Yes | No
Behavioral design | No | Yes | Yes | No
Simulation | Yes | Yes | Yes | Yes
Interactive Simulation | Yes | No | No | Yes
CoSimulation | Yes | Yes | No | Yes
“Watch” everywhere | No | No | No | Yes
Technology independent | No | Yes | Yes | Yes
Subjective Complexity | 3 | 7 | 5 | 4

REFERENCES

[1] The Mathworks Homepage, www.mathworks.com.
[2] The SystemC Initiative, www.systemc.org.
[3] J. Cardoso and H. Neto, "An Approach to Hardware Synthesis from a System Java™ Specification," in Proceedings of WDTA'98, pages 149-152, Dubrovnik, June 1998.
[4] M. Haldar, A. Nayak, A. Choudhary and P. Banerjee, "A system for synthesizing optimized FPGA hardware from MATLAB," in Proceedings of the 2001 IEEE/ACM International Conference on Computer-Aided Design, November 4-8, 2001, San Jose, California.
[5] Bellows and B. Hutchings, "JHDL - An HDL for Reconfigurable Systems," in IEEE Symposium on Field-Programmable Custom Computing Machines, April 1998.
[6] B. Hutchings et al., "A CAD suite for high-performance FPGA design," in IEEE Symposium on FPGAs for Custom Computing Machines, pages 12-24, IEEE Computer Society Press, 1999.
[7] K. Wakabayashi and T. Okamoto, "C-Based SoC Design Flow and EDA tools: An ASIC and System Vendor Perspective," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, December 2000.
[8] A. Krikelis and C.C. Weems, Associative Processing and Processors, IEEE Computer Society, 1997.
[9] Synopsys online documentation, "CoCentric SystemC Compiler Behavioral User and Modeling Guide."
[10] F. Bruschi and F. Ferrandi, "Synthesis of complex control structures from behavioral SystemC models," in Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, DATE'03.

Table 1 Additional aspects

Finally, the results of the place & route phase of the different designs are presented (Table 2). The target device is a Xilinx Virtex II 3000 (-6). It is very interesting to see that the results do not show any important difference among the tools and design methodologies in terms of area occupancy. Regarding maximum circuit frequency, there are more relevant differences, which could be attributed to the maturity of the optimizing synthesizers of older tools. Therefore we can conclude that when area is the main design goal, the decision of which tool to use is driven by factors such as team experience and productivity, tool cost, system-level integration or other application-driven or even subjective criteria. On the contrary, if speed is the main goal, the classical HDL methodology still produces the best results compared to the alternative methodologies.

Methodology | Slice FF | Slice 4-input LUTs | Equivalent Gate Count | Max. Freq.
Matlab | 7968 (27%) | 10128 (35%) | 132451 | 73.567 MHz
SystemC | 8096 (28%) | 10240 (35%) | 133123 | 62.980 MHz
VHDL | 8096 (28%) | 10240 (35%) | 133123 | 92.302 MHz
JHDL | 8192 (28%) | 10240 (35%) | 133123 | 60.157 MHz

Table 2 Place & Route results

Combining the equivalent gate count and the development time we get the productivity index of the analyzed methodologies (Table 3) in gates per hour.

Methodology | Matlab | SystemC | VHDL | JHDL
Gates / hour | 49 K | 28 K | 39 K | 49 K

Table 3 Productivity index

X. CONCLUSIONS

This work has tried to review several commonly available methodologies in order to evaluate their productivity and their capabilities to capture, verify and synthesize structural HW designs. Although the current trend is towards behavioral rather than structural specifications, we address the latter case. In this context the promised productivity boost of newer methodologies and tools (with respect to HDL-based methodologies) is not definitely proven, although some benefits, such as better simulation facilities, are clearly shown. The toolset that exploits most of the techniques presented in the introduction and has all the features presented in Table 1 is yet to appear, but at this stage the Matlab and JHDL design flows achieve more productivity for structural specification designs than SystemC and classical HDL flows. Finally, although all methodologies produce similar area occupancy results, the VHDL methodology produces better results in terms of maximum frequency.
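As a cross-check of the productivity index, the relation can be inverted: dividing each equivalent gate count of Table 2 by the corresponding gates-per-hour figure of Table 3 gives the total development time implied for each team. The sketch below only restates that arithmetic; `implied_hours`, `Flow` and `kFlows` are hypothetical names, and because Table 3 is rounded to thousands of gates the results are approximate.

```cpp
#include <cassert>

// Hypothetical helper: development hours implied by a gate count and a
// productivity index expressed in gates per hour.
double implied_hours(double gate_count, double gates_per_hour) {
    assert(gates_per_hour > 0.0);
    return gate_count / gates_per_hour;
}

// Values copied from Table 2 (equivalent gate count) and Table 3 (gates/hour).
struct Flow {
    const char* name;
    double gates;
    double gates_per_hour;
};

const Flow kFlows[] = {
    {"Matlab", 132451.0, 49000.0},
    {"SystemC", 133123.0, 28000.0},
    {"VHDL", 133123.0, 39000.0},
    {"JHDL", 133123.0, 49000.0},
};
```

Evaluating `implied_hours` for the four rows gives roughly 2.7, 4.8, 3.4 and 2.7 hours for Matlab, SystemC, VHDL and JHDL respectively, which agrees with the ordering of development times reported in the results section.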
