Digital Design for System-on-Chip - Assignment 3

July 27, 2017 | Author: Konstantinos Drosos | Category: Digital design, VHDL, Verilog HDL, Verilog


ELEC5563: ASSIGNMENT 3 ELEC5563: DIGITAL DESIGN SYSTEM ON CHIP

Konstantinos Drosos [200892224] MSC MECHATRONICS AND ROBOTICS | UNIVERSITY OF LEEDS

Exercise 1

Part a

The addition of two binary numbers in parallel implies that all the bits of the augend and addend are available for computation at the same time. As in any combinational circuit, the signal must propagate through the gates before the correct output sum is available at the output terminals. The total propagation time is equal to the propagation delay of a typical gate, times the number of gate levels in the circuit. The longest propagation delay time in an adder is the time it takes the carry to propagate through the full adders. Since each bit of the sum output depends on the value of the input carry, the value of Si at any given stage in the adder will be in its steady-state final value only after the input carry to that stage has been propagated. The propagation delay increases when the number of full adders increases.

The carry propagation time is an important attribute of the adder because it limits the speed with which two numbers are added. Although the adder (or, for that matter, any combinational circuit) will always have some value at its output terminals, the outputs will not be correct unless the signals are given enough time to propagate through the gates connected from the inputs to the outputs. Since all other arithmetic operations are implemented by successive additions, the time consumed during the addition process is critical [1]. The source file for the implementation of the 4-bit ripple carry adder in Quartus is provided in Appendix A.

Figure 1. 4-bit ripple carry adder

Part b

Figure 2. Full Adder Circuit [1]

An obvious solution for reducing the carry propagation delay time is to employ faster gates with reduced delays. However, physical circuits have a limit to their capability. Another solution is to increase the complexity of the equipment in such a way that the carry delay time is reduced. There are several techniques for reducing the carry propagation time in a parallel adder. The most widely used technique employs the principle of carry lookahead logic [1]. Consider the circuit of the full adder shown in figure 2. If we define two new binary variables

$P_i = A_i \oplus B_i$
$G_i = A_i B_i$

the output sum and carry can respectively be expressed as

$S_i = P_i \oplus C_i$
$C_{i+1} = G_i + P_i C_i$

$G_i$ is called a carry generate, and it produces a carry of 1 when both $A_i$ and $B_i$ are 1, regardless of the input carry $C_i$. $P_i$ is called a carry propagate, because it determines whether a carry into stage i will propagate into stage i + 1 (i.e., whether an assertion of $C_i$ will propagate to an assertion of $C_{i+1}$). We now write the Boolean functions for the carry outputs of each stage and substitute the value of each $C_i$ from the previous equations:

$C_0 = \text{input carry}$
$C_1 = G_0 + P_0 C_0$
$C_2 = G_1 + P_1 C_1 = G_1 + P_1 (G_0 + P_0 C_0) = G_1 + P_1 G_0 + P_1 P_0 C_0$
$C_3 = G_2 + P_2 C_2 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0$

Since the Boolean function for each output carry is expressed in sum-of-products form, each function can be implemented with one level of AND gates followed by an OR gate (or by a two-level NAND). The three Boolean functions for C1, C2, and C3 are implemented in the carry lookahead generator shown in figure 3 below. Note that this circuit can add in less time because C3 does not have to wait for C2 and C1 to propagate; in fact, C3 is propagated at the same time as C1 and C2. This gain in speed of operation is achieved at the expense of additional complexity (hardware) [1].

Figure 3. Logic Circuit of Carry Lookahead Generator [1]


The construction of a four-bit adder with a carry lookahead scheme is shown in figure 4. Each sum output requires two exclusive-OR gates. The output of the first exclusive-OR gate generates the Pi variable, and the AND gate generates the Gi variable. The carries are propagated through the carry lookahead generator (similar to that in figure 3) and applied as inputs to the second exclusive-OR gate. All output carries are generated after a delay through two levels of gates. Thus, outputs S1 through S3 have equal propagation delay times. The two-level circuit for the output carry C4 is not shown. This circuit can easily be derived by the equation-substitution method [1].

Figure 4. 4-bit adder with carry lookahead [1]
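The equation-substitution method mentioned above can be carried one stage further to obtain the output carry. The following is a worked sketch that uses the same $P_i$, $G_i$ and $C_i$ symbols; it is derived here from the stage equation $C_{i+1} = G_i + P_i C_i$ and is not a reproduction of a figure from [1].

```latex
\begin{align*}
C_4 &= G_3 + P_3 C_3 \\
    &= G_3 + P_3 \left( G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \right) \\
    &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{align*}
```

The final sum-of-products form has the same five terms as the Cout assignment of the CLA_4bit module listed in Appendix A.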


A carry select adder is an arithmetic combinational logic circuit which adds two N-bit binary numbers and outputs their N-bit binary sum and a 1-bit carry. This is no different from a ripple carry adder in function, but in design the carry select adder does not propagate the carry through as many full adders as the ripple carry adder does. This means that the time to add two numbers should be shorter. The idea behind an N-bit carry select adder is to avoid propagating the carry from bit to bit in sequence. If we have two adders in parallel, one with a carry input of 0 and the other with a carry input of 1, then we can use the actual carry input, once it has been generated, to select between the outputs of the two parallel adders. This means all adders can perform their calculations in parallel. Having two adders for each result bit is quite wasteful, so we can configure the N-bit adder to use (2N/M) - 1 M-bit ripple carry adders in parallel. Note that the adder for the least significant bits always has a carry input of 0, so no parallel addition is needed in this case.

For example, an 8-bit carry select adder could comprise three 4-bit ripple carry adders: one would calculate the sum and carry for the low nibble (bits 0 to 3), and the other two would calculate the high nibble sum and carry (bits 4 to 7). All adders would calculate in parallel. We could then use the low nibble carry output as the selector for a multiplexer that chooses the correct result from the two high nibble sums and carries. The following circuit (figure 5) provides a solution to the limitations of the carry ripple design. The source files that were used to implement the carry select adder can be found in Appendix A.

Figure 5. Carry Select Adder
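The 8-bit example described above can be summarised in a few lines of dataflow Verilog. This is only a minimal sketch under stated assumptions: the module name carry_select_sketch and the internal signal names are hypothetical, and the nibble additions are written behaviourally instead of instantiating the ripple carry adders used in Appendix A.

```verilog
// Hypothetical dataflow sketch of an 8-bit carry select adder.
// Names are illustrative and do not come from the original source files.
module carry_select_sketch (
  input  [7:0] A,
  input  [7:0] B,
  output [7:0] S,
  output       C
);
  // Low nibble: carry input is always 0. Bit 4 of the result is the carry out.
  wire [4:0] low   = A[3:0] + B[3:0];

  // High nibble computed twice in parallel, once for each possible carry in.
  wire [4:0] high0 = A[7:4] + B[7:4];          // assumes carry in = 0
  wire [4:0] high1 = A[7:4] + B[7:4] + 1'b1;   // assumes carry in = 1

  assign S[3:0] = low[3:0];

  // The low nibble carry selects which precomputed high result is used.
  assign {C, S[7:4]} = low[4] ? high1 : high0;
endmodule
```

Because the three additions do not depend on each other, the only serial step left is the final 2-to-1 selection, which is the property the carry select structure relies on.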

In a binary system the carry can only be 0 or 1. Having only two possible values makes it feasible to choose between two precomputed results. So if there is a mechanism to select the carry or, better, to skip the carry through several stages, the delay can be reduced further. All the signals are computed at the initial stage and are available from the start. If a carry is generated at the first stage and propagated to the last stage, the worst-case delay can be avoided by selecting the propagated carry at the initial stage and skipping it through the remaining stages. Complexity may increase because of the multiplexer operation; however, multiplexers can be implemented with simple CMOS circuits through a suitable choice of inputs to the mux.

As discussed for the ripple carry adder, the significant delay caused by the ripple operation is the price of its simple structure. This delay can be reduced when the carry into every stage is computed at the start, because no stage of an n-bit adder then has to wait for its carry. In the carry lookahead adder the carries for all stages are computed directly from the inputs, but the hardware complexity increases compared to the ripple carry adder, and so does the wire length needed for the inputs to reach the gates. As the number of bits increases, the benefits of the carry lookahead adder therefore diminish; however, it is better than the ripple carry adder for additions of up to 12 bits. Figure 6 shows the carry lookahead adder generated by the Quartus software. The general circuit of the carry lookahead adder is shown in figure 4. The source file that was used is provided in Appendix A.

Figure 6. 4-bit adder with carry lookahead

Table 1: Performance Analysis of Various Adders

No.  Design                  Area (LUTs)  Area (Slices)  Delay (ns)
1.   Ripple Carry Adder      8            5              2.191
2.   Carry Look Ahead Adder  10           5              2.266
3.   Carry Select Adder      8            5              2.588


Part c

The carry ripple, carry select and carry lookahead designs were compared with an adder inferred by the EDA synthesis tool. Each adder was implemented as a 128-bit adder in order to check whether the techniques examined above can be used to build very large circuits. The results show that the 128-bit adder inferred by the EDA tool has good performance as well as high complexity compared to the other adders. Both aspects arise from the configuration of the logic elements and from the dedicated carry-chain routing that the EDA tool uses to create the adder.

According to the experiments, the carry select adder provided the best performance of the designs in table 2 (109.32 ns), but at the cost of a more complex final design (429 elements against 320). In addition, comparing the 4-bit results in table 1 with the 128-bit results in table 2 shows that the gate-level carry lookahead design never beat the ripple carry adder: it was slightly slower at 4 bits (2.266 ns against 2.191 ns) and had the largest propagation delay of the three designs at 128 bits (180.65 ns). For such experiments on FPGAs, the better solution is to describe the operation of the adder in a behavioural manner and then let the EDA synthesis tool choose the best implementation. The following table shows the differences between the designs that were used to implement the 128-bit adders.

Table 2: Performance Analysis of Various 128-bit Adders

No.  Design                  Elements  Delay (ns)
1.   Ripple Carry Adder      320       160.17
2.   Carry Look Ahead Adder  320       180.65
3.   Carry Select Adder      429       109.32
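The behavioural style recommended in Part c can be as short as a single continuous assignment. The sketch below is illustrative only: the module name behavioural_adder and the WIDTH parameter are assumptions, and the 128-bit adder actually used for the experiments is not reproduced in this document.

```verilog
// Hypothetical behavioural N-bit adder. The synthesis tool is left free to
// map the "+" operator onto whatever carry structure the target device offers.
module behavioural_adder #(parameter WIDTH = 128) (
  input  [WIDTH-1:0] A,
  input  [WIDTH-1:0] B,
  input              cin,
  output [WIDTH-1:0] S,
  output             cout
);
  // The (WIDTH+1)-bit left-hand side widens the addition,
  // so the carry out of the most significant bit is kept.
  assign {cout, S} = A + B + cin;
endmodule
```

Written this way, the description states only what is computed, which is what allows the tool to use the dedicated carry-chain routing mentioned above.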

Exercise 2

Part a

This exercise requires Verilog code that adds two single BCD digits together. The output of the circuit should also be in the form of BCD digits. Sometimes it is useful to use a digital number format that can represent each individual decimal digit using a binary code; BCD is a format that allows this. BCD uses four binary digits to represent each individual decimal digit, as shown in Table 3.

Table 3: BCD code values, listed with their unsigned decimal equivalents

BCD Binary Code  Decimal Digit Value
1010-1111        Undefined
1001             9
1000             8
0111             7
0110             6
0101             5
0100             4
0011             3
0010             2
0001             1
0000             0

Historically, BCD was used extensively in circuits that drove seven-segment LED displays or similar displays. In such circuits BCD was often easier to use than 'pure' binary, as it is 'closer' to the real system: with BCD you typically have a single circuit for each 'digit' in your digital display. Conversion between binary and a format suitable for use with such digital displays was typically an expensive operation. Typically, systems could be made simpler overall by performing all calculations in BCD. Finally, the source file that implements the BCD adder can be found in Appendix B at the end of this document. The following sections present the circuit of the BCD adder and briefly discuss its functional simulation.

Figure 7. BCD Adder generated from Quartus software

Figure 8. Functional Simulation of the BCD Adder

The two cases indicated in the figure above (figure 8) are examined below. For each case a brief discussion explains how the system operates.

Case A – From 0 to 4

According to the above figure, case A starts at 0 and finishes at 4. Consider the first number, which is $(0000)_2 = 0_{10}$. Here X = 0, Y = 0 and cin = 0, so the output S is equal to 0 (S = 0). The cout output is always 0 when the overall value is less than 10. For example, if (X + Y + cin) [...]

[...] since Z > 9, we add the number 6 to the overall result. Thus,

$Z = 0101_2 + 0101_2 + 0001_2 = 1011_2 = 11_{10}$
$Z = 0000\,1011_2 + 0000\,0110_2 = 0001\,0001_{BCD} = 11_{10}$

In the equation above, the upper nibble of the result (0001) gives the cout output value and the lower nibble (0001) gives the S output value. Putting the two together gives the correct output value. Hence, the adder operates properly for this case.

Case B – From 5 to 8

According to the above figure, case B starts at 5 and finishes at 8. Consider the number $(0110)_2 = 6_{10}$. Here X = 6, Y = 6 and cin = 0, so the overall sum is 12. According to figure 8, $S = (0010)_2 = 2_{10}$ and cout is equal to 1, since (X + Y + cin) > 9. In other words, cout acts as an overflow bit that carries the tens digit, which in this case has the value $(0001)_2 = 1_{10}$; cout is active high. As mentioned above, to convert from binary to BCD we split the bit string into individual nibbles and examine each nibble. If a nibble is greater than nine, we add six. In this case:

$Z = X + Y + cin = 6 + 6 + 0 = 12$; since Z > 9, we add the number 6 to the overall result. Thus,

$Z = 0110_2 + 0110_2 + 0000_2 = 1100_2 = 12_{10}$
$Z = 0000\,1100_2 + 0000\,0110_2 = 0001\,0010_{BCD} = 12_{10}$

In the equation above, the upper nibble of the result (0001) gives the cout output value and the lower nibble (0010) gives the S output value. Putting the two together gives the correct output value. Hence, the adder operates properly for this case.
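The add-six correction walked through in the two cases can also be checked exhaustively in simulation. The following is a hedged sketch, not the author's testbench: bcd_adder_sketch is a hypothetical dataflow equivalent of the bcd_adder module in Appendix B, and bcd_adder_tb is an assumed self-checking testbench that compares the two-digit BCD result against the plain decimal sum for every valid pair of digits.

```verilog
// Hypothetical dataflow version of the BCD digit adder (add 6 when Z > 9).
module bcd_adder_sketch (
  input  [3:0] X,
  input  [3:0] Y,
  input        cin,
  output [3:0] S,
  output       cout
);
  wire [4:0] Z = X + Y + cin;                       // binary sum, 0 to 19
  assign {cout, S} = (Z > 5'd9) ? Z + 5'd6 : Z;     // correction step
endmodule

// Assumed self-checking testbench: sweeps all valid BCD inputs.
module bcd_adder_tb;
  reg  [3:0] X, Y;
  reg        cin;
  wire [3:0] S;
  wire       cout;
  integer    i, j, k, errors;

  bcd_adder_sketch dut (.X(X), .Y(Y), .cin(cin), .S(S), .cout(cout));

  initial begin
    errors = 0;
    for (i = 0; i < 10; i = i + 1)
      for (j = 0; j < 10; j = j + 1)
        for (k = 0; k < 2; k = k + 1) begin
          X = i; Y = j; cin = k;
          #1;  // let the combinational outputs settle
          // cout is the tens digit and S is the units digit.
          if (cout * 10 + S != i + j + k)
            errors = errors + 1;
        end
    $display("mismatches: %0d", errors);
    $finish;
  end
endmodule
```

Inputs above 9 are deliberately left out of the sweep, because Table 3 lists those codes as undefined.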

Part b

Table 4: Truth table for the 9’s complement

Inputs ABCD  Outputs wxyz
0000         1001
0001         1000
0010         0111
0011         0110
0100         0101
0101         0100
0110         0011
0111         0010
1000         0001
1001         0000

Don't-care conditions: $d(A, B, C, D) = \Sigma(10, 11, 12, 13, 14, 15)$

The following figure shows the circuit that generates the 9’s complement of a BCD digit, as generated by the Quartus software. For the source file that was used to implement and simulate this design, refer to Appendix B at the end of this document.

Figure 9. Circuit design that generates the 9’s complement of a BCD digit

Figure 10. Functional Simulation of the 9’s complement circuit


According to the Karnaugh maps and the functions that were determined from them, we have the following circuit.

Figure 11. Circuit design that generates the 9’s complement of a BCD digit with respect to the Karnaugh maps
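The Karnaugh maps themselves are not reproduced in this text, so the equations below are a reconstruction made directly from Table 4 with input codes 10 to 15 treated as don't-cares. They should be read as one possible minimisation, not necessarily the exact functions of figure 11, and the module name is hypothetical.

```verilog
// Hypothetical gate-level 9's complementer derived from Table 4.
// A is the most significant input bit, w the most significant output bit.
module nines_complement_gates (
  input  A, B, C, D,
  output w, x, y, z
);
  assign w = ~A & ~B & ~C;  // 1 only for inputs 0 and 1 (outputs 9 and 8)
  assign x = B ^ C;         // 1 for inputs 2, 3, 4 and 5
  assign y = C;             // bit passes through unchanged
  assign z = ~D;            // least significant bit is always inverted
endmodule
```

Each equation can be checked row by row against Table 4; for example, input 0110 gives w = 0, x = 0, y = 1, z = 1, which is 0011.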

Part c

Figure 12. Circuit generated by using the Quartus Software

Once the circuit has been generated by the Quartus software, it is simulated in order to check that it operates properly and provides the appropriate outputs for the selected mode.

The subtraction of unsigned binary numbers can be done most conveniently by means of complements. Remember that the subtraction A - B can be done by taking the 2’s complement of B and adding it to A. The 2’s complement can be obtained by taking the 1’s complement and adding 1 to the least significant pair of bits. The 1’s complement can be implemented with inverters, and a 1 can be added to the sum through the input carry. The circuit for subtracting A - B consists of an adder with inverters placed between each data input B and the corresponding input of the full adder. The input carry C0 must be equal to 1 when subtraction is performed. The operation thus performed becomes A, plus the 1’s complement of B, plus 1. This is equal to A plus the 2’s complement of B. For unsigned numbers, that gives A - B if A ≥ B or the 2’s complement of (B - A) if A < B. For signed numbers, the result is A - B, provided that there is no overflow [1].

The addition and subtraction operations can be combined into one circuit with one common binary adder by including an exclusive-OR gate with each full adder. A 4-bit adder-subtractor circuit is shown in figure 13. The mode input M controls the operation. When M = 0, the circuit is an adder, and when M = 1, the circuit becomes a subtractor. Each exclusive-OR gate receives input M and one of the inputs of B. When M = 0, we have B ⊕ 0 = B. The full adders receive the value of B, the input carry is 0, and the circuit performs A plus B. When M = 1, we have B ⊕ 1 = B’ and C0 = 1. The B inputs are all complemented and a 1 is added through the input carry. The circuit performs the operation A plus the 2’s complement of B. (The exclusive-OR with output V is for detecting an overflow) [1].

It is worth noting that binary numbers in the signed-complement system are added and subtracted by the same basic addition and subtraction rules as are unsigned numbers. Therefore, computers need only one common hardware circuit to handle both types of arithmetic. The user or programmer must interpret the results of such addition or subtraction differently, depending on whether it is assumed that the numbers are signed or unsigned.

Figure 13. 4-bit adder-subtractor where V indicates the overflow detection [1]
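The structure described above (XOR gates on the B inputs, M fed into the input carry, and V taken from the last two carries) can be sketched in dataflow Verilog as follows. This is a hypothetical binary adder-subtractor written for illustration; the module and signal names are assumptions and it is not the source file used for figure 12.

```verilog
// Hypothetical 4-bit adder-subtractor: M = 0 adds, M = 1 subtracts.
module add_sub4_sketch (
  input  [3:0] A,
  input  [3:0] B,
  input        M,   // mode input
  output [3:0] S,
  output       C,   // carry out of the most significant stage (C4)
  output       V    // signed overflow flag
);
  // Each B bit is XORed with M: B passes through when M = 0, B' when M = 1.
  wire [3:0] Bx = B ^ {4{M}};

  // Lower three stages; M doubles as the input carry C0.
  // lower[3] is the carry into the most significant stage (C3).
  wire [3:0] lower = A[2:0] + Bx[2:0] + M;

  // Most significant stage; upper[1] is the carry out (C4).
  wire [1:0] upper = A[3] + Bx[3] + lower[3];

  assign S = {upper[0], lower[2:0]};
  assign C = upper[1];
  assign V = lower[3] ^ upper[1];  // overflow when C3 and C4 differ
endmodule
```

Splitting the addition at the most significant stage is done here only to expose C3, which is needed for the overflow output V.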


Figure 14. Functional Simulation that performs the addition of the inputs since the Mode=0 (Low)

The figure above shows the output of the functional simulation that performs the addition of the inputs: the Mode is equal to 0 and, according to the previous description, when Mode=0 the circuit acts as an adder. Since the BCD output is only a 4-bit number, a Carry_Borrow bit is used to help represent the final output value. Taking the first pair of inputs in the figure as an example, we have A + B = 5 + 11 = 16. According to figure 14, the output is 6 and the Carry_Borrow bit is 1 (High). The combination of these two values therefore gives the overall output value of 16 found above.

Figure 15. Functional Simulation that performs the subtraction of the inputs since the Mode=1 (High)

According to this figure, the circuit operates appropriately and in this case the Carry_Borrow output indicates the sign bit.


Exercise 3

Part a

The first block is implemented as a half-adder circuit, as shown in the following figure. The source file can be found in Appendix C at the end of this document.

Figure 16. Half adder Circuit

Once the compilation of this circuit had finished, the project was imported into QSim to simulate the circuit and check its functional correctness. The following figure shows the functional simulation of the half-adder.

Figure 17. Functional Simulation of the half-adder

Part b

The second block is implemented as a full-adder circuit, as shown in the following figure. The source file can be found in Appendix C at the end of this document.

Figure 18. Full-adder Circuit


Once the compilation of this circuit had finished, the project was imported into QSim to simulate the circuit and check its functional correctness. The following figure shows the functional simulation of the full-adder.

Figure 19. Functional Simulation of the full-adder
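For reference, the two building blocks of parts a and b can be described with a handful of gate primitives. The listing below is a sketch under assumptions: the Appendix C sources are not reproduced in this section, so the module names, the port names, and the choice to build the full adder from two half adders are all illustrative.

```verilog
// Hypothetical half adder: sum = a XOR b, carry = a AND b.
module half_adder_sketch (
  input  a, b,
  output sum, carry
);
  xor (sum, a, b);
  and (carry, a, b);
endmodule

// Hypothetical full adder assembled from two half adders and an OR gate.
module full_adder_from_half_adders (
  input  a, b, cin,
  output sum, cout
);
  wire s1, c1, c2;

  // First half adder combines the two operand bits.
  half_adder_sketch ha0 (.a(a),  .b(b),   .sum(s1),  .carry(c1));
  // Second half adder folds in the carry input.
  half_adder_sketch ha1 (.a(s1), .b(cin), .sum(sum), .carry(c2));

  // The two partial carries can never both be 1, so an OR is sufficient.
  or (cout, c1, c2);
endmodule
```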

Part c

The following figure represents the 4x4 multiplier, which was implemented by using the modules created in part a and part b. The source file for this implementation can be found in Appendix C at the end of this document.

Figure 20. 4x4 multiplier circuit

The following figure (figure 21) represents the functional simulation of the 4x4 multiplier. The functional simulation provides all the expected results; in the timing simulation, however, there was a large delay at the outputs, which could be resolved by using pipelining.

Figure 21. Functional Simulation of the 4x4 multiplier
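To illustrate how pipelining could shorten the combinational path of the multiplier, the sketch below splits the multiplication into two registered stages. It is written behaviourally and is an assumption made for illustration: the module name, the clk port, and the two-stage split are not taken from the Appendix C design, which is built from half adders and full adders.

```verilog
// Hypothetical two-stage pipelined 4x4 multiplier.
// The product appears two clock cycles after the operands are applied.
module mult4x4_pipelined_sketch (
  input            clk,
  input      [3:0] A,
  input      [3:0] B,
  output reg [7:0] P
);
  // Stage 1 registers: partial products for the low and high halves of B.
  // Each is at most 15 * 3 = 45, so 6 bits are enough.
  reg [5:0] pp_lo;
  reg [5:0] pp_hi;

  always @(posedge clk) begin
    // Stage 1: two small multiplications in parallel.
    pp_lo <= A * B[1:0];
    pp_hi <= A * B[3:2];

    // Stage 2: weight the high partial product by 4 and add.
    // Uses the values registered on the previous clock edge.
    P <= pp_lo + (pp_hi << 2);
  end
endmodule
```

Each stage now contains only part of the original logic depth, so the clock period is set by the slower of the two stages and not by the whole multiplier, at the cost of two cycles of latency.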

Part d

The primary goal during synthesis of digital signal processing (DSP) circuits is to minimize the hardware area while meeting a minimum throughput constraint. In field programmable gate array (FPGA) implementations, significant area savings can be achieved by using slower, more area-efficient circuit modules and/or by time-multiplexing faster, larger circuit modules. Unfortunately, manual exploration of this design space is impractical. Pipelining is a common design technique used to increase the throughput of digital circuits. Pipelining is particularly important for FPGA circuits because of the relatively long combinational delay of logic functions and interconnect. Most high-performance FPGA circuits employ some form of pipelining. In most cases, pipelining can be implemented without significant additional FPGA resources due to the plethora of programmable registers found in most modern FPGA architectures. A pipelined circuit can run at a faster clock rate. For many DSP designs, however, simply increasing the clock speed does not significantly improve the design. In many medium-rate DSP applications, the design goal is to minimize the total system area cost by mapping the computation onto the smallest (i.e., cheapest) possible FPGA device. Currently, designers manage this problem by manually selecting more area-efficient arithmetic implementations and/or constructing control logic to time share multiple operations on a single arithmetic unit. This manual design exploration significantly increases the complexity of the design process and increases the chance of inserting errors into the design [2].

The most straightforward way to get more performance out of a processing unit is to speed up the clock (setting aside, for the moment, fully asynchronous designs, which one doesn't find in this space for a number of reasons).
Some very early computers even had a knob to continuously adjust the clock rate to match the program being run. But there are, of course, physical limitations on the rate at which operations can be performed. The act of fetching, decoding, and executing instructions is rather complex, even for a deliberately simplified instruction set, and there is a lot of sequentiality. There will be some minimum number of sequential gates, and thus, for a given gate delay, a minimum execution time, T(emin). By saving intermediate results of substages of execution in latches, and clocking those latches as well as the CPU inputs/outputs, execution of multiple instructions can be overlapped. Total time for the execution of a single instruction is no less, and in fact will tend to be greater, than T(emin). But the rate of instruction execution, or issue rate, can be increased by a factor proportional to the number of pipe stages. The technique became practical in the mid-1960s. The Manchester Atlas and the IBM Stretch project were two of the first functioning pipelined processors. From the IBM 360/91 onward, all state-of-the-art scientific computers have been pipelined [3].

Pipelining does not help in all cases. There are several possible disadvantages. An instruction pipeline is said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline [4].

Advantages of Pipelining:
1. The cycle time of the processor is reduced, thus increasing the instruction issue rate in most cases.
2. Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry compared with a more complex combinational circuit.

Disadvantages of Pipelining:
1. A non-pipelined processor executes only a single instruction at a time. This prevents branch delays (in effect, every branch is delayed) and problems with serial instructions being executed concurrently. Consequently the design is simpler and cheaper to manufacture.
2. The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent. This is because extra flip-flops must be added to the data path of a pipelined processor.
3. A non-pipelined processor will have a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.


Appendix A – Exercise 1 Source Files

Part a

/**************************************************
Implementation of the carry ripple adder that is
required for Part a of exercise 1.

Author:     Konstantinos Drosos
Course:     MSc Mechatronics and Robotics
Module:     ELEC5563 - Digital Design System-on-Chip
SID:        200892224
Date:       24/02/2015
Assignment: 3
Exercise:   1
Part:       a
Version:    1.0
***************************************************/
module adder4( A, B, cin, S, cout );
  // Declaration of input ports
  input [3:0] A, B;
  input cin;

  // Declaration of output ports
  output [3:0] S;
  output cout;

  // Declaration of wires
  wire c1, c2, c3;

  // 4 instantiated 1-bit Full Adders
  FullAdder fa0( A[0], B[0], cin, c1, S[0] );
  FullAdder fa1( A[1], B[1], c1, c2, S[1] );
  FullAdder fa2( A[2], B[2], c2, c3, S[2] );
  FullAdder fa3( A[3], B[3], c3, cout, S[3] );
endmodule // End of module

/**************************************************
Implementation of the full adder module. This will
be used as a submodule on the top module with the
name adder4.v

Author:     Konstantinos Drosos
Course:     MSc Mechatronics and Robotics
Module:     ELEC5563 - Digital Design System-on-Chip
SID:        200892224
Date:       24/02/2015
Assignment: 3
Exercise:   1
Part:       a
Version:    1.0
***************************************************/
module FullAdder( a, b, cin, cout, sum );
  // Declaration of inputs
  input a, b, cin;

  // Declaration of outputs
  output cout, sum;

  // Declarations of internal nets
  wire w1, w2, w3, w4;

  xor #(10) (w1, a, b);           // delay time of 10 units
  xor #(10) (sum, w1, cin);       // delay time of 10 units
  and #(8) (w2, a, b);            // delay time of 8 units
  and #(8) (w3, a, cin);          // delay time of 8 units
  and #(8) (w4, b, cin);          // delay time of 8 units
  or #(10, 8)(cout, w2, w3, w4);  // (rise time of 10, fall 8)
endmodule // End of module

Part b

/**************************************************
Implementation of the carry lookahead adder module.
This source file will be used for the experiments
done for part b of exercise 1.

Author:     Konstantinos Drosos
Course:     MSc Mechatronics and Robotics
Module:     ELEC5563 - Digital Design System-on-Chip
SID:        200892224
Date:       24/02/2015
Assignment: 3
Exercise:   1
Part:       b
Version:    1.0
***************************************************/
module CLA_4bit( S, Cout, PG, GG, A, B, Cin );
  // Declaration of outputs
  output [3:0] S;
  output Cout, PG, GG;

  // Declaration of inputs
  input [3:0] A, B;
  input Cin;

  // Declarations of internal nets
  wire [3:0] G, P, C;

  assign G = A & B; // Generate
  assign P = A ^ B; // Propagate
  assign C[0] = Cin;
  assign C[1] = G[0]|(P[0] & C[0]);
  assign C[2] = G[1]|(P[1] & G[0])|(P[1] & P[0] & C[0]);
  assign C[3] = G[2]|(P[2] & G[1])|(P[2] & P[1] & G[0])|(P[2] & P[1] & P[0] & C[0]);
  assign Cout = G[3]|(P[3] & G[2])|(P[3] & P[2] & G[1])|(P[3] & P[2] & P[1] & G[0])|(P[3] & P[2] & P[1] & P[0] & C[0]);
  assign S = P ^ C;
  assign PG = P[3] & P[2] & P[1] & P[0];
  assign GG = G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1]) | (P[3] & P[2] & P[1] & G[0]);
endmodule // End of module

The following code implements the carry select adder by using different submodules.

/**************************************************
Implementation of the carry select adder module.
This source file will be used for the experiments
done for part b of exercise 1.

Author:     Konstantinos Drosos
Course:     MSc Mechatronics and Robotics
Module:     ELEC5563 - Digital Design System-on-Chip
SID:        200892224
Date:       24/02/2015
Assignment: 3
Exercise:   1
Part:       b
Version:    1.0
***************************************************/
module carry_select_adder(S, C, A, B);
  output [7:0] S; // The 8-bit sum.
  output C;       // The 1-bit carry.
  input [7:0] A;  // The 8-bit augend.
  input [7:0] B;  // The 8-bit addend.

  wire [3:0] S0;  // High nibble sum output with carry input 0.
  wire [3:0] S1;  // High nibble sum output with carry input 1.
  wire C0;        // High nibble carry output with carry input 0.
  wire C1;        // High nibble carry output with carry input 1.
  wire Clow;      // Low nibble carry output used to select multiplexer output.

  // Calculate S low nibble.
  ripple_carry_adder rc_low_nibble_0(S[3:0], Clow, A[3:0], B[3:0], 1'b0);
  // Calculate S high nibble with carry input 0.
  ripple_carry_adder rc_high_nibble_0(S0, C0, A[7:4], B[7:4], 1'b0);
  // Calculate S high nibble with carry input 1.
  ripple_carry_adder rc_high_nibble_1(S1, C1, A[7:4], B[7:4], 1'b1);

  multiplexer_2_1 #(4) muxs(S[7:4], S0, S1, Clow); // Clow selects the high nibble result for S.
  multiplexer_2_1 #(1) muxc(C, C0, C1, Clow);      // Clow selects the carry output.
endmodule // End of carry_select_adder module


/**************************************************
Implementation of the full adder module. This will
be used as a submodule on the top module with the
name carry_select_adder.v

Author:     Konstantinos Drosos
Course:     MSc Mechatronics and Robotics
Module:     ELEC5563 - Digital Design System-on-Chip
SID:        200892224
Date:       24/02/2015
Assignment: 3
Exercise:   1
Part:       b
Version:    1.0
***************************************************/
module full_adder(S, Cout, A, B, Cin);
  // Declaration of outputs
  output S;
  output Cout;

  // Declaration of inputs
  input A;
  input B;
  input Cin;

  // Declaration of internal nets
  wire w1;
  wire w2;
  wire w3;
  wire w4;

  // Declaration of XOR gates
  xor(w1, A, B);
  xor(S, Cin, w1);

  // Declaration of AND gates
  and(w2, A, B);
  and(w3, A, Cin);
  and(w4, B, Cin);

  // Declaration of OR gates
  or(Cout, w2, w3, w4);
endmodule // End of module full_adder

/**************************************************
Implementation of the multiplexer module. This will
be used as a submodule on the top module with the
name carry_select_adder.v

Author:     Konstantinos Drosos
Course:     MSc Mechatronics and Robotics
Module:     ELEC5563 - Digital Design System-on-Chip
SID:        200892224
Date:       24/02/2015
Assignment: 3
Exercise:   1
Part:       b
Version:    1.0
***************************************************/
module multiplexer_2_1( X, A0, A1, S );
  parameter WIDTH = 16;   // How many bits wide are the lines

  output [WIDTH-1:0] X;   // The output line
  input [WIDTH-1:0] A1;   // Input line with id 1'b1
  input [WIDTH-1:0] A0;   // Input line with id 1'b0
  input S;                // Selection bit

  assign X = (S == 1'b0) ? A0 : A1;
endmodule // End of module multiplexer_2_1

/**************************************************
 Implementation of the carry ripple adder module.
 This will be used as a submodule on the top module
 with the name carry_select_adder.v

 Author:     Konstantinos Drosos
 Course:     MSc Mechatronics and Robotics
 Module:     ELEC5563 - Digital Design System-on-Chip
 SID:        200892224
 Date:       24/02/2015
 Assignment: 3
 Exercise:   1
 Part:       b
 Version:    1.0
***************************************************/
module ripple_carry_adder( S, C, A, B, Cin );

  output [3:0] S;   // The 4-bit sum.
  output       C;   // The 1-bit carry.
  input  [3:0] A;   // The 4-bit augend.
  input  [3:0] B;   // The 4-bit addend.
  input        Cin; // The carry input.

  wire C0; // The carry out bit of fa0, the carry in bit of fa1.
  wire C1; // The carry out bit of fa1, the carry in bit of fa2.
  wire C2; // The carry out bit of fa2, the carry in bit of fa3.

  full_adder fa0(S[0], C0, A[0], B[0], Cin); // Least significant bit.
  full_adder fa1(S[1], C1, A[1], B[1], C0);
  full_adder fa2(S[2], C2, A[2], B[2], C1);
  full_adder fa3(S[3], C,  A[3], B[3], C2);  // Most significant bit.

endmodule // End of module ripple_carry_adder
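The ripple carry adder is small enough to check exhaustively. The testbench below is a minimal sketch and was not part of the original submission: the module name ripple_carry_adder_tb and the signal names expected and errors are assumptions chosen for illustration. It assumes the full_adder and ripple_carry_adder modules listed above are compiled together with it, and it applies all 512 combinations of A, B and Cin.

```verilog
// Hypothetical testbench; not part of the original submission.
`timescale 1ns/1ps
module ripple_carry_adder_tb;

  reg  [3:0] A, B;
  reg        Cin;
  wire [3:0] S;
  wire       C;

  reg  [4:0] expected; // 5 bits so that the carry out is not truncated
  integer    i, errors;

  // Port order follows the listing: (S, C, A, B, Cin).
  ripple_carry_adder dut (S, C, A, B, Cin);

  initial begin
    errors = 0;
    // 2^9 = 512 combinations of {A, B, Cin}.
    for (i = 0; i < 512; i = i + 1) begin
      {A, B, Cin} = i[8:0];
      expected    = A + B + Cin;
      #1; // let the gate-level outputs settle
      if ({C, S} !== expected) begin
        errors = errors + 1;
        $display("MISMATCH: %0d + %0d + %0d gave %0d, expected %0d",
                 A, B, Cin, {C, S}, expected);
      end
    end
    if (errors == 0) $display("All 512 cases passed");
    else             $display("%0d mismatches", errors);
    $finish;
  end

endmodule
```

The comparison uses `!==` so that an unknown (x) output is also reported as a mismatch.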

Appendix B – Exercise 2 Source Files

Part a

/**************************************************
 Implementation of the BCD adder module.

 Author:     Konstantinos Drosos
 Course:     MSc Mechatronics and Robotics
 Module:     ELEC5563 - Digital Design System-on-Chip
 SID:        200892224
 Date:       24/02/2015
 Assignment: 3
 Exercise:   2
 Part:       a
 Version:    1.0
***************************************************/
module bcd_adder ( S, cout, X, Y, cin );

  // Declaration of inputs
  input [3:0] X, Y;
  input       cin;

  // Declaration of outputs
  output [3:0] S;
  output       cout;

  // Store the output values
  reg [3:0] S;
  reg       cout;
  reg [4:0] Z; // Binary sum of the inputs (5 bits, at most 9 + 9 + 1 = 19)

  /* Re-evaluate whenever any input changes */
  always @( X or Y or cin )
  begin
    // Add the two input digits and the carry in
    Z = X + Y + cin;

    if ( Z > 9 )           // Check whether the binary sum exceeds 9
      {cout,S} = Z + 6;    // If so, add 6 to correct it to BCD;
                           // the resulting bit 4 becomes the carry out
    else
      {cout,S} = Z;        // Otherwise, the binary sum is already
                           // a valid BCD digit and cout is 0.
  end

endmodule // End of module bcd_adder
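The add-6 correction can be checked against plain decimal arithmetic: for valid BCD digits, 10*cout + S should equal X + Y + cin. The testbench below is a hedged sketch that was not part of the original submission; the module name bcd_adder_tb and its loop variables are assumptions. It only exercises valid digits (0 to 9), since the behaviour for inputs 10 to 15 is not specified in the text.

```verilog
// Hypothetical testbench; not part of the original submission.
`timescale 1ns/1ps
module bcd_adder_tb;

  reg  [3:0] X, Y;
  reg        cin;
  wire [3:0] S;
  wire       cout;

  integer x, y, c, errors;

  // Port order follows the listing: (S, cout, X, Y, cin).
  bcd_adder dut (S, cout, X, Y, cin);

  initial begin
    errors = 0;
    // Valid BCD digits only: 10 * 10 * 2 = 200 cases.
    for (x = 0; x <= 9; x = x + 1)
      for (y = 0; y <= 9; y = y + 1)
        for (c = 0; c <= 1; c = c + 1) begin
          X   = x[3:0];
          Y   = y[3:0];
          cin = c[0];
          #1; // let the combinational block settle
          // Decimal value of the result: tens digit is cout, units digit is S.
          if (10 * cout + S !== x + y + c) begin
            errors = errors + 1;
            $display("MISMATCH: %0d + %0d + %0d gave cout=%b S=%0d",
                     x, y, c, cout, S);
          end
        end
    if (errors == 0) $display("All 200 cases passed");
    else             $display("%0d mismatches", errors);
    $finish;
  end

endmodule
```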


Part b

/**************************************************
 Implementation of the circuit design that generates
 the 9's complement of a BCD digit.

 Author:     Konstantinos Drosos
 Course:     MSc Mechatronics and Robotics
 Module:     ELEC5563 - Digital Design System-on-Chip
 SID:        200892224
 Date:       24/02/2015
 Assignment: 3
 Exercise:   2
 Part:       b
 Version:    1.0
***************************************************/
module Nines_Complementer ( Word_9s_Comp, Word_BCD );

  // Declaration of outputs
  output [3:0] Word_9s_Comp;

  // Declaration of inputs
  input [3:0] Word_BCD;

  // Store the output values
  reg [3:0] Word_9s_Comp;

  /* Re-evaluate whenever the input digit changes */
  always @ (Word_BCD)
  begin
    Word_9s_Comp = 4'b0;
    case (Word_BCD)
      4'b0000: Word_9s_Comp = 4'b1001; // 0 to 9
      4'b0001: Word_9s_Comp = 4'b1000; // 1 to 8
      4'b0010: Word_9s_Comp = 4'b0111; // 2 to 7
      4'b0011: Word_9s_Comp = 4'b0110; // 3 to 6
      4'b0100: Word_9s_Comp = 4'b0101; // 4 to 5
      4'b0101: Word_9s_Comp = 4'b0100; // 5 to 4
      4'b0110: Word_9s_Comp = 4'b0011; // 6 to 3
      4'b0111: Word_9s_Comp = 4'b0010; // 7 to 2
      4'b1000: Word_9s_Comp = 4'b0001; // 8 to 1
      4'b1001: Word_9s_Comp = 4'b0000; // 9 to 0
      default: Word_9s_Comp = 4'b1111; // Error detection
    endcase // End the case
  end

endmodule // End the Nines_Complementer module
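A lookup table like this is easy to mistype, so a short check that every valid digit d maps to 9 - d is worthwhile. The testbench below is a sketch only and was not part of the original submission; the module name nines_complementer_tb and the signal names digit, comp and expected are assumptions. It also checks that the six invalid codes (10 to 15) produce the 4'b1111 error value used in the listing.

```verilog
// Hypothetical testbench; not part of the original submission.
`timescale 1ns/1ps
module nines_complementer_tb;

  reg  [3:0] digit;
  wire [3:0] comp;

  reg  [3:0] expected;
  integer    d, errors;

  // Port order follows the listing: (Word_9s_Comp, Word_BCD).
  Nines_Complementer dut (comp, digit);

  initial begin
    errors = 0;
    for (d = 0; d < 16; d = d + 1) begin
      digit = d[3:0];
      // Valid digits map to 9 - d; invalid codes map to the error value.
      if (d <= 9) expected = 9 - d;
      else        expected = 4'b1111;
      #1; // let the combinational block settle
      if (comp !== expected) begin
        errors = errors + 1;
        $display("MISMATCH: input %0d gave %b, expected %b",
                 d, comp, expected);
      end
    end
    if (errors == 0) $display("All 16 cases passed");
    else             $display("%0d mismatches", errors);
    $finish;
  end

endmodule
```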


Part c

// BCD Adder - Subtractor
// Mode = 0: BCD_Sum_Diff = A + B
// Mode = 1: BCD_Sum_Diff = A + (9's complement of B) + 1, i.e. A - B
module Problem_4_55_BCD_Adder_Subtractor ( BCD_Sum_Diff, Carry_Borrow, B, A, Mode );

  input  [3:0] B, A;
  input        Mode;
  output [3:0] BCD_Sum_Diff;
  output       Carry_Borrow;

  wire [3:0] Word_9s_Comp, mux_out;

  Nines_Complementer M0 (Word_9s_Comp, B);
  Quad_2_x_1_mux     M2 (mux_out, Word_9s_Comp, B, Mode);
  BCD_Adder          M1 (Carry_Borrow, BCD_Sum_Diff, mux_out, A, Mode);

endmodule

module Nines_Complementer ( Word_9s_Comp, Word_BCD );

  output reg [3:0] Word_9s_Comp;
  input      [3:0] Word_BCD;

  always @ (Word_BCD)
  begin
    Word_9s_Comp = 4'b0;
    case (Word_BCD)
      4'b0000: Word_9s_Comp = 4'b1001; // 0 to 9
      4'b0001: Word_9s_Comp = 4'b1000; // 1 to 8
      4'b0010: Word_9s_Comp = 4'b0111; // 2 to 7
      4'b0011: Word_9s_Comp = 4'b0110; // 3 to 6
      4'b0100: Word_9s_Comp = 4'b0101; // 4 to 5
      4'b0101: Word_9s_Comp = 4'b0100; // 5 to 4
      4'b0110: Word_9s_Comp = 4'b0011; // 6 to 3
      4'b0111: Word_9s_Comp = 4'b0010; // 7 to 2
      4'b1000: Word_9s_Comp = 4'b0001; // 8 to 1
      4'b1001: Word_9s_Comp = 4'b0000; // 9 to 0
      default: Word_9s_Comp = 4'b1111; // Error detection
    endcase
  end

endmodule

module Quad_2_x_1_mux ( mux_out, b, a, select );

  input  [3:0] b, a;
  input        select;
  output [3:0] mux_out;
  reg    [3:0] mux_out;

  always @ (a, b, select)
    case (select)
      0: mux_out = a;
      1: mux_out = b;
    endcase

endmodule


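module BCD_Adder ( Output_carry, Sum, Addend, Augend, Carry_in );

  output       Output_carry;
  output [3:0] Sum;
  input  [3:0] Addend, Augend;
  input        Carry_in;

  supply0    gnd;
  wire [3:0] Z_Addend;
  wire [3:0] Z_sum;
  wire       Carry_out;
  wire       C_out;
  wire       w1, w2; // Declared explicitly rather than left as implicit nets

  // Correction value: 6 (4'b0110) when the binary sum exceeds 9, else 0
  assign Z_Addend = {1'b0, Output_carry, Output_carry, 1'b0};

  // Output carry is set when the binary sum is greater than 9
  and (w1, Z_sum[3], Z_sum[2]);
  and (w2, Z_sum[3], Z_sum[1]);
  or  (Output_carry, Carry_out, w1, w2);

  // First stage: binary sum. Second stage: add the correction value.
  Adder_4_bit M0 (Carry_out, Z_sum, Addend, Augend, Carry_in);
  Adder_4_bit M1 (C_out, Sum, Z_Addend, Z_sum, gnd);

endmodule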


Appendix C – Exercise 3 Source Files

Part a

// Implementation of the Half-Adder
module HA(sout,cout,a,b);
  output sout,cout;
  input  a,b;
  assign sout=a^b;
  assign cout=(a&b);
endmodule

Part b

// Implementation of the Full-Adder
module FA(sout,cout,a,b,cin);
  output sout,cout;
  input  a,b,cin;
  assign sout=(a^b^cin);
  assign cout=((a&b)|(a&cin)|(b&cin));
endmodule

Part c

// Implementation of the 4-bit by 4-bit array multiplier
module multiply4bits(product,inp1,inp2);
  output [7:0]product;
  input  [3:0]inp1;
  input  [3:0]inp2;

  wire x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17;

  assign product[0]=(inp1[0]&inp2[0]);

  // Row 1: partial products of inp1[0] and inp1[1]
  HA HA1(product[1],x1,(inp1[1]&inp2[0]),(inp1[0]&inp2[1]));
  FA FA1(x2,x3,inp1[1]&inp2[1],(inp1[0]&inp2[2]),x1);
  FA FA2(x4,x5,(inp1[1]&inp2[2]),(inp1[0]&inp2[3]),x3);
  HA HA2(x6,x7,(inp1[1]&inp2[3]),x5);

  // Row 2: add the partial products of inp1[2]
  HA HA3(product[2],x15,x2,(inp1[2]&inp2[0]));
  FA FA5(x14,x16,x4,(inp1[2]&inp2[1]),x15);
  FA FA4(x13,x17,x6,(inp1[2]&inp2[2]),x16);
  FA FA3(x9,x8,x7,(inp1[2]&inp2[3]),x17);

  // Row 3: add the partial products of inp1[3]
  HA HA4(product[3],x12,x14,(inp1[3]&inp2[0]));
  FA FA8(product[4],x11,x13,(inp1[3]&inp2[1]),x12);
  FA FA7(product[5],x10,x9,(inp1[3]&inp2[2]),x11);
  FA FA6(product[6],product[7],x8,(inp1[3]&inp2[3]),x10);
endmodule
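With 17 internal carry and sum nets, the array multiplier is the listing most prone to wiring slips, and its 256 input combinations can all be compared against the built-in `*` operator. The testbench below is a minimal sketch that was not part of the original submission; the module name multiply4bits_tb and the signal names a, b and expected are assumptions. It assumes the HA, FA and multiply4bits modules above are compiled together with it.

```verilog
// Hypothetical testbench; not part of the original submission.
`timescale 1ns/1ps
module multiply4bits_tb;

  reg  [3:0] a, b;
  wire [7:0] product;

  reg  [7:0] expected; // 8 bits: 15 * 15 = 225 fits without overflow
  integer    i, errors;

  // Port order follows the listing: (product, inp1, inp2).
  multiply4bits dut (product, a, b);

  initial begin
    errors = 0;
    // 2^8 = 256 combinations of {a, b}.
    for (i = 0; i < 256; i = i + 1) begin
      {a, b}   = i[7:0];
      expected = a * b;
      #1; // let the adder array settle
      if (product !== expected) begin
        errors = errors + 1;
        $display("MISMATCH: %0d * %0d gave %0d, expected %0d",
                 a, b, product, expected);
      end
    end
    if (errors == 0) $display("All 256 cases passed");
    else             $display("%0d mismatches", errors);
    $finish;
  end

endmodule
```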


References

[1] Mano, M. M. (1984). Digital Design. Englewood Cliffs, N.J.: Prentice-Hall.
[2] Sun, W., Wirthlin, M. and Neuendorffer, S. (2007). FPGA Pipeline Synthesis Design Exploration Using Module Selection and Resource Sharing. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., 26(2), pp. 254-265.
[3] Paralogos.com (2015). Pipelining. [online] Available at: http://www.paralogos.com/DeadSuper/arch/pipeline.html [Accessed 24 Feb. 2015].
[4] Aigal, V. (2012). Advantages and Disadvantages Mp. [online] Scribd.com. Available at: http://www.scribd.com/doc/102590434/Advantages-and-Disadvantages-Mp#scribd [Accessed 24 Feb. 2015].
