2012 International Conference on Design & Technology of Integrated Systems in Nanoscale Era
A compact 32-Bit AES design for embedded system
Noura Benhadjyoussef Physic Department of Faculty of Sciences of Monastir Electronics and Micro-Electronic Laboratory (E E L) Monastir, Tunisia
[email protected]
Wajih El Hadj Youssef Physic Department of Faculty of Sciences of Monastir Electronics and Micro-Electronic Laboratory (E E L) Monastir, Tunisia
[email protected]
Mohsen Machhout Physic Department of Faculty of Sciences of Monastir Electronics and Micro-Electronic Laboratory (E E L) Monastir, Tunisia
Rached Tourki Physic Department of Faculty of Sciences of Monastir Electronics and Micro-Electronic Laboratory (E E L) Monastir, Tunisia
Abstract— Recently, much research has been conducted on the security of data transactions on embedded platforms. The Advanced Encryption Standard (AES) is considered one of the candidate algorithms for data encryption/decryption. One important application of this standard is cryptography on smart cards. In this paper we describe a 32-bit architecture developed for the Rijndael algorithm to accelerate execution on 32-bit platforms with reduced memory. Using the FPGA device xc5vfx70t-2ff1136-6, a very low-cost implementation occupying 375 Slices is obtained at a frequency of 303.364 MHz.
I. INTRODUCTION
The AES algorithm [1] was selected in 2000 by the US National Institute of Standards and Technology (NIST) as a replacement for the Data Encryption Standard (DES) cryptographic algorithm [2]. It is based on the Rijndael algorithm, a symmetric-key algorithm that processes data in fixed 128-bit blocks. The AES algorithm is suited for efficient implementation on a wide range of processors. It can be used as an encryption standard in embedded systems, and especially in smart cards.
An important feature of recent research is the continuous alternation between theoretical investigation and practical implementation on hardware platforms. Currently, the embedded devices market is turning to 32-bit microprocessors as the new leading technology: LEON2 processor [15], Leon3 processor [17], ARM SecurCore SC300, SC100 [16], ST32 [6]. These processors deliver unprecedented feature-rich 32-bit performance in terms of cost, area and power, compared with 8/16-bit ones.
There are many implementations of the AES reported in the literature; some of them use Field Programmable Gate Arrays (FPGA) or Application-Specific Integrated Circuits (ASIC), while others use smart cards. According to the performance needed, the designs are divided into two categories. The first category aims at high-speed AES encryption cores and high throughput, while requiring a reasonable amount of resources [3, 4]. The second category involves ultra-rapid implementations that demand an extremely small area [5].
The present paper illustrates compact implementations of the AES algorithm for several platforms, and especially for smart cards. We aim to attain the greatest possible performance with a small area, since memory and silicon area are limited resources in a smart card environment. In addition, we describe the low-power design methods used in our proposed AES crypto module. The implementation is based on Xilinx Virtex FPGAs; its performance is analyzed and shown to compare favorably with other well-known FPGA-based implementations.
This paper is organized as follows. Section 2 presents the basic structure of the Rijndael algorithm. Section 3 describes our proposed 32-bit approach to the algorithm. Experimental results and a comparison with other reference implementations are discussed in Section 4. Conclusions are summarized in Section 5.
978-1-4673-1928-7/12/$31.00 ©2012 IEEE
II. DESCRIPTION OF THE AES ALGORITHM
The basic unit of processing in the AES algorithm is the byte, a sequence of eight bits treated as a single entity. The bit sequences corresponding to the input, the output and the cipher key are processed as arrays of bytes. Internally, the algorithm operates on an array of bytes called the State. The State array consists of four columns of bytes, and every column contains 4 bytes. A full description of the AES is given in FIPS 197 [1].
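As an illustration of the State layout described above, the following Python sketch maps a 16-byte block into a 4 × 4 State and packs one column into a 32-bit word, the unit that a 32-bit datapath would handle. It is not part of the original design: the helper names are hypothetical, and the column-major convention s[r][c] = in[r + 4c] is assumed from FIPS 197 [1].

```python
def bytes_to_state(block: bytes) -> list:
    """Map a 16-byte block to a 4x4 State (assumed FIPS 197 layout):
    state[r][c] = block[r + 4*c], i.e. the block fills the State column by column."""
    if len(block) != 16:
        raise ValueError("an AES block must be exactly 16 bytes")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state: list) -> bytes:
    """Inverse mapping: read the State column by column."""
    return bytes(state[r][c] for c in range(4) for r in range(4))


def column_word(state: list, c: int) -> int:
    """Pack column c into one 32-bit word, row 0 in the most significant byte."""
    return int.from_bytes(bytes(state[r][c] for r in range(4)), "big")


# Hypothetical example input: the bytes 0x00 .. 0x0f.
block = bytes(range(16))
state = bytes_to_state(block)
words = [column_word(state, c) for c in range(4)]  # four 32-bit column words
```

With this layout, each row of the State holds every fourth byte of the input, and each 32-bit column word holds four consecutive input bytes.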
The AES algorithm operates in rounds and supports three different key lengths: 128, 192, and 256 bits; the standard allows only 128 bits as the block length. The number of rounds depends on the key size: for a key length of 128, 192 or 256 bits, the number of rounds is 10, 12 or 14, respectively.
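The relation between key length and round count can be written compactly. The sketch below assumes the FIPS 197 [1] rule Nr = Nk + 6, where Nk is the key length in 32-bit words; the function name is illustrative only.

```python
def rounds_for_key(key_bits: int) -> int:
    """Return the number of AES rounds Nr for a given key length in bits.

    Assumes the FIPS 197 relation Nr = Nk + 6, with Nk = key_bits / 32.
    """
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192 or 256 bits")
    nk = key_bits // 32  # key length in 32-bit words
    return nk + 6


# Round counts for the three legal key lengths.
round_counts = {bits: rounds_for_key(bits) for bits in (128, 192, 256)}
```

Since one round key is consumed before the first round and one in every round, a key schedule for Nr rounds holds Nr + 1 round keys of 128 bits each.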
The AES round consists of a fixed set of transformations applied to the State array. A separate KeyExpansion unit generates the keys for each round of the AES algorithm. In each round, a data block is transformed by a sequence of operations:
• AddRoundKey: the round key from the key schedule is added to the data block using a simple XOR operation.
• SubBytes: replaces each of the 16 bytes of the data block with the S-box lookup table value for that byte. The content of the S-box is the multiplicative inverse in the Galois Field $GF(2^8)$, followed by an affine transformation.
• ShiftRows: obtains a new data block by cyclically shifting the block rows. The bytes of row i are shifted i times, where $0 \le i < 4$.
• MixColumns: transforms each column of the State array by multiplying it with a constant GF polynomial. It operates on the State column by column, treating each column as a four-term polynomial. The columns are considered as polynomials over $GF(2^8)$ and multiplied modulo $x^4 + 1$ with a fixed polynomial a(x) given by:
$a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$    (1)
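The round transformations listed above can be sketched in software as follows. This is an illustrative Python model, not the hardware description used in this work: the function names are hypothetical, the State is assumed to be a list of four rows of four bytes, and the field reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) is taken from FIPS 197 [1]. SubBytes is left out here because it needs the S-box table.

```python
def xtime(b: int) -> int:
    """Multiply a GF(2^8) element by {02}, reducing modulo 0x11B (assumed from FIPS 197)."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def gmul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def add_round_key(state, round_key):
    """XOR every State byte with the corresponding round-key byte."""
    return [[s ^ k for s, k in zip(s_row, k_row)]
            for s_row, k_row in zip(state, round_key)]


def shift_rows(state):
    """Rotate row i cyclically to the left by i byte positions."""
    return [row[i:] + row[:i] for i, row in enumerate(state)]


def mix_column(col):
    """Multiply one column by a(x) = {03}x^3 + {01}x^2 + {01}x + {02} modulo x^4 + 1."""
    a0, a1, a2, a3 = col
    return [
        gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3,
        a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3,
        a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3),
        gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2),
    ]


def mix_columns(state):
    """Apply mix_column to each of the four columns of the State."""
    cols = [mix_column([state[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]
```

Because MixColumns works on one column at a time, a single column (one 32-bit word) can be transformed per step, which is the property a 32-bit datapath relies on.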
All round transformations are identical, apart from the final one, in which MixColumn is omitted. Before the cipher operation takes place, a key schedule is generated. The subkey for the first round is the private cipher key. Fig. 1 illustrates the encryption round operations.
[Figure 1: flow diagram. The 128-bit PlainText first passes through AddRoundKey with the 128-bit Initial Key. SubBytes, Shift Rows, MixColumn and AddRoundKey with a 128-bit Round Key are then repeated for N-1 rounds. The final round applies SubBytes, Shift Rows and AddRoundKey with the 128-bit Final Round Key to produce the 128-bit CipherText.]
Figure 1. AES Encryption Round operation
III. PROPOSED 32-BIT ARCHITECTURE
In this section, our objective is to define an appropriate architecture of the AES algorithm to accelerate execution on 32-bit microprocessors with memory constraints, such as those available in smart cards.
32-bit processors and their ALU architectures are based on 32-bit registers, address buses and data buses. Memory addresses and data units are also of that size. On the other hand, each operation of AES maps a 128-bit input state into a 128-bit output state.
In order to optimize the size of our AES hardware design, the 128-bit data block (4 × 4 bytes) is divided into four 32-bit blocks, each processed as one column or one row through the 32-bit data bus. However, the ShiftRow function requires access to all 128 bits of data before it can start; for this reason, four 32-bit registers are needed. The SubBytes transformation is an 8-bit operation. As shown in Fig. 2, there are 4 S-boxes in total in our proposed design, so it can perform 4 SubBytes operations simultaneously. Therefore, the encryption datapath processes a full 32-bit word in parallel. A complete round transformation is executed in a single clock cycle.
• Design Choices for AES
Different parameters can be used for the selection of an appropriate architecture, such as throughput, power consumption, area and resistance to side-channel attacks [18]. This selection has a significant impact on system performance.
There are different techniques for implementing the AES algorithm. The pipelined architecture is the fastest in terms of throughput and the largest of the basic structures; it contains all the rounds as separate components with registers in between. This architecture enables very high-speed implementation, but implies large area and high power consumption [7], which makes it unattractive for embedded systems. In contrast, the iterated architecture consists of a single round component that is fed with its own output until the necessary number of rounds has been performed. As a result, it leads to the smallest implementation. Hence, we chose the basic synchronous iterative architecture for our implementation. Fig. 2 presents the system architecture of our implementation. As seen, the design of the 32-bit AES processor includes the following components:
• The Input and Output interfaces: like many internal communication data paths, they are 32 bits wide. They hold the 128 plaintext bits before they are processed and store the ciphertext until all 128 bits have been processed.
• Key Expander: calculates the set of round keys.
• Controller: generates the control signals for all other components.
• AES Round: encrypts or decrypts the input state of data.
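To illustrate how four S-boxes can serve a 32-bit datapath, the following Python sketch builds the S-box from the multiplicative inverse in GF(2^8) followed by the affine transformation, and then substitutes the four bytes of one 32-bit word. It is a software model under stated assumptions (hypothetical helper names, reduction polynomial and affine constant 0x63 taken from FIPS 197 [1]), not the VHDL of the proposed design; in hardware the four lookups happen in parallel rather than in a loop.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8), computed as a^254; 0 maps to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n positions."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox() -> list:
    """S-box entry = affine transformation of the multiplicative inverse."""
    sbox = []
    for x in range(256):
        inv = gf_inv(x)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()


def sub_word(word: int) -> int:
    """Apply the S-box to each of the four bytes of a 32-bit word.

    Each loop iteration stands for one of the four 8-bit S-boxes."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= SBOX[(word >> shift) & 0xFF] << shift
    return out


def sub_bytes_by_words(words: list) -> list:
    """SubBytes on a 128-bit block presented as four 32-bit words."""
    return [sub_word(w) for w in words]
```

Generating the table at start-up keeps the example self-contained; a hardware design would typically store the 256 entries in a lookup table or compute them with combinational logic.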
[Figure 2: block diagram of the design, showing the Input_Buffer, the Key Expander, the Controller, the AES Round (four 8-bit Sbox units, ShiftRow, MixColumn and AddRoundKey) and the Output_Buffer, connected through 32-bit data paths, with the signals Plaintext, Key, Enable, Data_loaded, Key_loaded, Round_Key, Generate_Key, Ciphertext_ready, Ciphertext and Done.]
Figure 2. 32-bit AES design
IV. EXPERIMENTAL RESULT
The AES-32 bit encryption design, including the key expansion system, is described in VHDL and simulated in the ModelSim environment. The architecture is simulated to confirm its functionality, using different test vectors provided by the AES standard [1]. To evaluate our design, we implemented the AES-128 encryption and the AES-32 encryption separately. The performances measured are the maximum frequency, the area, the power and the throughput. The proposed design is implemented with the Xilinx 10.1 tools, with the FPGA xc5vfx70t-2ff1136-6 as the target device. Table 1 summarizes the device utilization of the AES-128 bit and the AES-32 bit designs. As shown in Table 1, the proposed design outperforms the AES-128 design in terms of area, power and frequency.
TABLE I. DEVICE UTILIZATION SUMMARY OF AES ENCRYPTION
| Design | Device | Max. Frequency (MHz) | Number of occupied Slices | Number of Slice LUTs | Power (mW) | Throughput (Mbps) |
|---|---|---|---|---|---|---|
| Aes128 | xc5vfx70t | 296.435 | 456 | 1338 | 90 | 2918.744 |
| Our proposed Aes_32bit | xc5vfx70t | 303.364 | 375 | 1423 | 75 | 2588.706 |
The AES-32 core reaches a high frequency of 303.364 MHz with a low area, occupying about 3% of the Slices and Slice LUTs. The AES-32 bit design consumes less power than the AES-128 design, but its throughput is lower. Each round is completed in one clock cycle; with four clock cycles for registering the input, the total number of clock cycles needed to process 128-bit data is 15 for the AES-32 bit design, compared to 12 clock cycles for the AES-128 bit design. At 303.364 MHz, 128 bits every 15 cycles gives 303.364 × 128 / 15 ≈ 2588.7 Mbps, the throughput reported for the AES-32 bit design in Table 1.
Table 2 compares our implementation with recent works reported in the literature that use other well-known FPGAs: XC2V6000BF957-6 [8], XC5VLX50 [9], XC2V80-6 [12] and XC2V1000 [10]. The throughput of our design varies from 2734 to 1245 megabits per second (Mbps) depending on the targeted device. In the case of [8, 9, 12, 13, 14], the comparison is more meaningful since it relates to the same platforms as ours. As shown, the maximum frequency of our implementation is better than that reported in [8, 9, 10] but lower than that in [12, 13]. Compared to [8], our proposed architecture uses fewer Slice LUTs. A comparison with the AES ASIC implementation of [11] is also given.
V. CONCLUSION
This paper reports the implementation results of the AES algorithm on different Xilinx Virtex FPGAs. A 32-bit architecture implementation of the AES crypto module, based on an iterative loop architecture, is presented. With the proposed architecture, a power reduction of 15 mW is achieved compared with the AES-128 bit design. The frequency achieved by the proposed design is also higher than that of the standard AES-128 design. Furthermore, the proposed 32-bit architecture of the AES occupies a reasonable amount of resources in terms of slices. From the obtained performances, we conclude that our proposed 32-bit AES architecture is suitable for systems with resource-constrained environments, such as smart cards.
TABLE II. PERFORMANCE COMPARISON RESULTS
| Reference | Device | Datapath | Max. Frequency (MHz) | Number of occupied Slices | Number of Slice LUTs | Throughput (Mbps) |
|---|---|---|---|---|---|---|
| Our proposed | XC2V8000-5ff1517 | 32 | 145.964 | 2068 | - | 1245.559 |
| [12] | XC2V80-6 | 32 | 149 | 115 (CLB slices) | - | 432 |
| [13] | Virtex-II | 128 | 182 | Area (1937) | - | 2118 |
| Our proposed | XC2V6000BF957-6 | 32 | 163.908 | 2031 | 3228 (4-input LUTs) | 1398.6816 |
| [8] | XC2V6000BF957-6 | 128 | 62.5 | 2943 | 5802 (4-input LUTs) | 666.7 |
| Our proposed | XC5VLX50 | 32 | 320.403 | 497 | 1423 | 2734.106 |
| [9] | XC5VLX50 | 128 | 242.153 | 1745 | 5256 | 3.09 Gbps |
| Our proposed | XC2V1000 | 32 | 163.908 | 2031 | 3228 (4-input LUTs) | 1398.681 |
| [14] | XC2V1000 | 32 | 264 | 866 (CLB slices) | - | 768 |
| [10] | XC2V1000 | 128 | 96.42 | 586 slices + 10 BRAM | - | 1450 |
| [11] | AES-128: ASIC Implementation | 128 | 182 | 6986 Gates | - | - |
REFERENCES
[1] National Institute of Standards and Technology (NIST). Advanced Encryption Standard (AES). Federal Information Processing Standards Publications (FIPS PUBS) 197-26, 2001.
[2] National Institute of Standards and Technology (NIST). Data Encryption Standard (DES). Federal Information Processing Standards Publications (FIPS PUBS) 46-3, 1999.
[3] Anna Labbe and Annie Perez. AES Implementation on FPGA: Time Flexibility Tradeoff, in FPL 2002, LNCS 2438, pp. 836-844, 2002.
[4] Christopher Caltagirone and Kasi Anantha. High Throughput, Parallelized 128-bit AES Encryption in a Resource-Limited FPGA, in SPAA'03, June 2003.
[5] Kimmo U. Jarvinen, Matti T. Tommiska and Jorma O. Skytta. A Fully Pipelined Memoryless 17.8 Gbps AES128 Encryptor, in FPGA'03, February 2003.
[6] STMicroelectronics website, www.st.com
[7] Panu Hämäläinen, Marko Hännikäinen, and Timo D. Hämäläinen. Review of Hardware Architectures for Advanced Encryption Standard Implementations Considering Wireless Sensor Networks, SAMOS 2007, LNCS 4599, pp. 443–453, 2007.
[8] L. Thulasimani and M. Madheswaran. A Single Chip Design and Implementation of AES-128/192/256 Encryption Algorithms, International Journal of Engineering Science and Technology, Vol. 2(5), pp. 1052-1059, 2010.
[9] Muhammad H. Rais and Syed M. Qasim. A Novel FPGA Implementation of AES-128 using Reduced Residue of Prime Number based S-Box, IJCSNS International Journal of Computer Science and Network Security, Vol. 9, No. 9, pp. 305-309, 2009.
[10] A.-B. Ignacio, F.-U. Claudia, and R. Cumplido. Design and Implementation of an FPGA-Based 1.452-Gbps Non-pipelined AES Architecture. Lecture Notes in Computer Science, 3982: 446–455, 2006.
[11] Yibo Fan, Takeshi Ikenaga, Yukiyasu Tsunoo, and Satoshi Goto. A Low-cost Reconfigurable Architecture for AES Algorithm, World Academy of Science, Engineering and Technology 41, pp. 270-273.
[12] CAST, Advanced Encryption Standard Core, available at: http://www.castinc.com/cores/aes/index.shtml.
[13] Nedjah, L. de Macedo Mourelle, and M.P Cardoso. A Compact Pipelined Hardware Implementation of the AES-128 Cipher. Proceedings of the Third International Conference on Information Technology: New Generations, pages 216–221, 2006.
[14] Somsak Choomchuay, Surapong Pongyupinpanich and Somsanouk Pathumvanh. A Compact 32-bit Architecture for an AES System, ECTI Transactions on Computer and Information Theory, Vol. 1, No. 1, May, pp. 24-29, 2005.
[15] Gaisler Research. LEON2 Processor Users Manual. XST Edition. Available online at http://www.gaisler.com/doc/leon2-1.0.30-xst.pdf, July 2005. Version 1.0.30.
[16] ARM website, http://www.arm.com
[17] Gaisler website, http://www.gaisler.com
[18] François-Xavier Standaert, Siddika Berna Ors, and Bart Preneel. Power Analysis of an FPGA Implementation of Rijndael: Is Pipelining a DPA Countermeasure? LNCS 0302-9743, vol. 3156, pp. 30-44, 2004.