A compact 32-bit AES design for embedded system


2012 International Conference on Design & Technology of Integrated Systems in Nanoscale Era


Noura Benhadjyoussef, Physics Department, Faculty of Sciences of Monastir, Electronics and Micro-Electronics Laboratory (EμEL), Monastir, Tunisia

Wajih El Hadj Youssef, Physics Department, Faculty of Sciences of Monastir, Electronics and Micro-Electronics Laboratory (EμEL), Monastir, Tunisia

Mohsen Machhout, Physics Department, Faculty of Sciences of Monastir, Electronics and Micro-Electronics Laboratory (EμEL), Monastir, Tunisia

Rached Tourki, Physics Department, Faculty of Sciences of Monastir, Electronics and Micro-Electronics Laboratory (EμEL), Monastir, Tunisia

Abstract— Recently, much research has been conducted on the security of data transactions on embedded platforms. The Advanced Encryption Standard (AES) is considered one of the candidate algorithms for data encryption/decryption. One important application of this standard is cryptography on smart cards. In this paper we describe a 32-bit architecture developed for the Rijndael algorithm to accelerate execution on 32-bit platforms with reduced memory. Using the FPGA device xc5vfx70t-2ff1136-6, a very low-cost implementation occupying 375 Slices is obtained at a frequency of 303.364 MHz.

978-1-4673-1928-7/12/$31.00 ©2012 IEEE

I. INTRODUCTION

The AES algorithm [1] was selected in 2000 by the US National Institute of Standards and Technology (NIST) as a replacement for the Data Encryption Standard (DES) cryptographic algorithm [2]. It is based on the Rijndael algorithm, a symmetric-key algorithm that processes data in fixed 128-bit blocks. The AES algorithm is suited to efficient implementation on a wide range of processors. It can be used as an encryption standard in embedded systems, especially smart cards.

An important feature of recent research is the continuous alternation between theoretical investigation and practical implementation on hardware platforms. The embedded devices market is currently turning to 32-bit microprocessors as the new leading technology: the LEON2 processor [15], the LEON3 processor [17], the ARM SecurCore SC300 and SC100 [16], and the ST32 [6]. These processors deliver feature-rich 32-bit performance in terms of cost, area and power, compared with 8/16-bit ones.

There are many implementations of the AES reported in the literature; some of them use Field Programmable Gate Arrays (FPGA) or Application-Specific Integrated Circuits (ASIC), while others use smart cards. According to the performance needed, the designs are divided into two categories. The first category aims at high-speed, high-throughput AES encryption cores while requiring a reasonable amount of resources [3, 4]. The second category comprises ultra-fast implementations that demand an extremely small area [5].

The present paper illustrates compact implementations of the AES algorithm for several platforms, especially smart cards. In this case, we aim to attain the greatest possible performance with a small area, since memory and silicon area are limited resources in a smart card environment. In addition, we describe the low-power design methods used in our proposed AES crypto module. The implementation is based on Xilinx Virtex FPGAs; its performance is analyzed and shown to compare favorably with other well-known FPGA-based implementations.

This paper is organized as follows. Section 2 provides the basic structure of the Rijndael algorithm. Section 3 describes our proposed 32-bit approach to the algorithm. Experimental results and a comparison with other reference implementations are discussed in Section 4. Conclusions are summarized in Section 5.

II. DESCRIPTION OF THE AES ALGORITHM

The basic unit of processing in the AES algorithm is the byte, a sequence of eight bits treated as a single entity. The bit sequences corresponding to the input, the output and the cipher key are processed as arrays of bytes. The intermediate result is held in an array called the State, which consists of four columns of bytes, each column containing 4 bytes. A full description of the AES is given in FIPS 197 [1].
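As a minimal sketch of the State layout (not taken from the paper), the snippet below fills a 4 × 4 State from 16 input bytes using the column-by-column mapping of FIPS 197 [1], where input byte n goes to row n mod 4 and column n div 4. The helper names `bytes_to_state` and `state_to_bytes` are illustrative assumptions.

```python
def bytes_to_state(block):
    """Arrange 16 input bytes into a 4x4 State, column by column (FIPS 197)."""
    if len(block) != 16:
        raise ValueError("AES operates on 16-byte (128-bit) blocks")
    # state[r][c] = block[r + 4*c]: every 4 consecutive input bytes form one column
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state):
    """Inverse mapping: read the State back out column by column."""
    return bytes(state[r][c] for c in range(4) for r in range(4))


block = bytes(range(16))
state = bytes_to_state(block)
first_column = [state[r][0] for r in range(4)]  # holds input bytes 0..3
```

Each column of this array is one 32-bit word, which is the unit a 32-bit datapath can move in a single transfer.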

The AES algorithm operates in rounds and supports three key lengths: 128, 192 and 256 bits; the standard allows only 128 bits as the block length. The number of rounds depends on the key size: for a key length of 128, 192 or 256 bits, the number of rounds is 10, 12 or 14, respectively.
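The relation between key length and round count can be written compactly as Nr = Nk + 6, where Nk is the key length in 32-bit words and the block length is fixed at four words. A small sketch, with the hypothetical helper name `aes_rounds`:

```python
def aes_rounds(key_bits):
    """Return the number of AES rounds Nr for a given key length in bits."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192 or 256 bits")
    nk = key_bits // 32  # key length in 32-bit words
    return nk + 6        # Nr = Nk + 6 for the 128-bit AES block


round_counts = {bits: aes_rounds(bits) for bits in (128, 192, 256)}
```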


The AES round consists of a fixed set of transformations applied to the State array. A separate KeyExpansion unit is used to generate the keys for each round of the AES algorithm. In each round, a data block is transformed by a sequence of operations:

• AddRoundKey: the round key of the current round is added to the data block using a simple XOR operation.

• SubBytes: replaces each of the 16 bytes of the data block with the S-box lookup table value of that byte. The content of the S-box is the multiplicative inverse in the Galois Field $GF(2^8)$, followed by an affine transformation.

• ShiftRows: obtains a new data block by cyclically shifting the block rows. The bytes of row i are shifted i times, where $0 \le i < 4$.

• MixColumns: transforms each column of the State array by multiplying it with a constant GF polynomial. It operates on the State column by column, treating each column as a four-term polynomial. The columns are considered as polynomials over $GF(2^8)$ and multiplied modulo $x^4 + 1$ with a fixed polynomial $a(x)$ given by:

$$a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\} \qquad (1)$$
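The four transformations listed above can be sketched in a few lines of Python. This is an illustrative software model, not the authors' VHDL: the function names are assumptions, the S-box is computed from its definition (inverse in $GF(2^8)$ followed by the affine map) rather than stored as a lookup table, and the State is held as a list of four rows.

```python
def xtime(b):
    """Multiply by x in GF(2^8), reducing modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b


def gf_mul(a, b):
    """Multiply two GF(2^8) elements by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def sbox(byte):
    """SubBytes for one byte: multiplicative inverse, then affine transformation."""
    inv = 1
    for _ in range(254):      # byte^254 is the inverse; 0 maps to 0
        inv = gf_mul(inv, byte)
    out = 0x63
    for shift in range(5):    # inv ^ rotl(inv,1) ^ ... ^ rotl(inv,4) ^ 0x63
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out


def add_round_key(state, round_key):
    """AddRoundKey: byte-wise XOR of the State with the round key."""
    return [[s ^ k for s, k in zip(srow, krow)]
            for srow, krow in zip(state, round_key)]


def sub_bytes(state):
    """SubBytes: apply the S-box to each of the 16 bytes."""
    return [[sbox(b) for b in row] for row in state]


def shift_rows(state):
    """ShiftRows: rotate row i left by i positions."""
    return [row[i:] + row[:i] for i, row in enumerate(state)]


def mix_column(col):
    """MixColumns for one column: multiply by a(x) modulo x^4 + 1."""
    s0, s1, s2, s3 = col
    return [
        gf_mul(2, s0) ^ gf_mul(3, s1) ^ s2 ^ s3,
        s0 ^ gf_mul(2, s1) ^ gf_mul(3, s2) ^ s3,
        s0 ^ s1 ^ gf_mul(2, s2) ^ gf_mul(3, s3),
        gf_mul(3, s0) ^ s1 ^ s2 ^ gf_mul(2, s3),
    ]


def mix_columns(state):
    """MixColumns: transform the State column by column."""
    cols = [mix_column([state[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]


mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
```

Because `mix_column` and `sbox` work on one column and one byte at a time, the same functions map naturally onto a datapath that handles a single 32-bit column per step.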

All round transformations are identical, apart from the final one. Before the cipher operation takes place, a key schedule is generated. The subkey for the first round is the private cipher key. Fig. 1 illustrates the encryption round operations.

[Figure 1: PlainText (128 bits) → AddRoundKey with the Initial Key (128 bits) → N-1 repeated rounds of SubBytes, Shift Rows, MixColumn and AddRoundKey with the Round Key (128 bits) → final round of SubBytes, Shift Rows and AddRoundKey with the Final Round Key (128 bits) → CipherText (128 bits)]

Figure 1. AES Encryption Round operation

III. PROPOSED 32-BIT ARCHITECTURE

In this section, our objective is to define an appropriate architecture of the AES algorithm to accelerate execution on 32-bit microprocessors with memory constraints, such as those available in smart cards.

The 32-bit processors and their ALU architectures are based on 32-bit registers, address buses and data buses. Memory addresses and data units are also of that size. On the other hand, each operation of the AES maps a 128-bit input state into a 128-bit output state.

Design Choices for AES

Different parameters can be used to select an appropriate architecture, such as throughput, power consumption, area and resistance to side-channel attacks [18]. This selection has a significant impact on system performance.

There are different techniques for implementing the AES algorithm. The pipelined architecture is the fastest of the basic structures in terms of throughput, and also the largest: it contains all the rounds as separate components with registers in between. This architecture enables very high-speed implementations, but implies a large area and high power consumption [7].

This makes it unattractive for embedded systems. The iterated architecture, in contrast, consists of a single round component that is fed with its own output until the necessary number of rounds has been performed. As a result, it leads to the smallest implementation. Hence, we chose the basic synchronous iterative architecture for our implementation. Fig. 2 presents the system architecture of our implementation. As seen, the design of the 32-bit AES processor includes the following components:

• The Input and Output interfaces: like many internal communication data paths, they are 32 bits wide. They are used to hold the 128 plaintext bits before they are processed and to store the ciphertext until all 128 bits have been processed.

• Key Expander: used to calculate the set of round keys.

• Controller: used to generate control signals for all other components.

• AES Round: used to encrypt or decrypt the input state of data.

In order to optimize the size of our AES hardware design, the 128-bit data block (4 × 4 bytes) is divided into four 32-bit blocks and is processed one column or one row at a time through the 32-bit data bus. However, the ShiftRows function requires access to all 128 bits of data before it can start. For this reason, four 32-bit registers are needed. The SubBytes transformation is an 8-bit operation. As shown in Fig. 2, there are 4 S-boxes in total in our proposed design, so it can perform 4 SubBytes operations simultaneously. Therefore, the encryption datapath processes a full 32-bit word (4 bytes) in parallel. A complete round transformation is executed in a single clock cycle.
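To illustrate why the 32-bit datapath described in this section needs four registers, and what it costs in clock cycles, the sketch below models both points in Python. It is not the authors' VHDL. The function names are hypothetical, and the State is assumed to be stored as four 32-bit column words with row 0 in the most significant byte. Splitting the 15-cycle latency into 4 load cycles, 1 initial AddRoundKey cycle and 10 round cycles is also an assumption, chosen because it reproduces the cycle counts quoted in the experimental results. The throughput formula (block size × clock frequency / cycles per block) is likewise an assumption; it reproduces the 2588.706 Mbps reported for the 303.364 MHz clock.

```python
def shift_rows_column(words, c):
    """Build column c of the State after ShiftRows from four 32-bit column words.

    Row r of the output column comes from input column (c + r) mod 4, so each
    output word depends on all four input registers.
    """
    out = 0
    for r in range(4):
        src = words[(c + r) % 4]
        byte = (src >> (8 * (3 - r))) & 0xFF  # row r; row 0 is the top byte
        out = (out << 8) | byte
    return out


def cycles_per_block(bus_bits, rounds=10, block_bits=128):
    """Clock cycles per block: input loading + initial AddRoundKey + rounds (assumed split)."""
    return block_bits // bus_bits + 1 + rounds


def throughput_mbps(freq_mhz, cycles, block_bits=128):
    """Throughput in Mbps when one block completes every `cycles` clock cycles."""
    return block_bits * freq_mhz / cycles


words = [0x00010203, 0x04050607, 0x08090A0B, 0x0C0D0E0F]  # State columns 0..3
column0 = shift_rows_column(words, 0)      # takes one byte from each register
cycles_32 = cycles_per_block(32)           # 32-bit input bus
rate_32 = throughput_mbps(303.364, cycles_32)
```

Under the same assumed split, a 128-bit input bus gives `cycles_per_block(128) == 12`, which matches the cycle count the paper quotes for its AES-128 reference design.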

[Figure 2: block diagram with 32-bit Plaintext and Key inputs, an Input_Buffer, a Controller, a Key Expander, an AES Round block containing four 8-bit S-boxes followed by ShiftRow, MixColumn and AddRoundKey, and a 32-bit Output_Buffer delivering the Ciphertext; control signals include Enable, Data_loaded, Key_loaded, Round_Key, Generate_Key, Ciphertext_ready and Done]

Figure 2. 32-bit AES design

IV. EXPERIMENTAL RESULT

The AES-32 bit encryption with its key expansion system is described in VHDL and simulated in the ModelSim environment. The architecture is simulated to confirm its functionality, using different test vectors provided by the AES standard [1]. In order to evaluate our design, we implemented the AES-128 encryption and the AES-32 encryption separately. The performance measures are the maximum frequency, the area, the power and the throughput. The proposed design is implemented with the Xilinx 10.1 tools, with the FPGA xc5vfx70t-2ff1136-6 as the target device. Table 1 summarizes the device utilization of the AES-128 bit and the AES-32 bit. As shown in Table 1, the proposed design outperforms the AES-128 design in terms of area, power and frequency.

TABLE I. DEVICE UTILIZATION SUMMARY OF AES ENCRYPTION

| Design | Device | Max. Frequency (MHz) | Number of occupied Slices | Number of Slice LUTs | Power (mW) | Throughput (Mbps) |
|---|---|---|---|---|---|---|
| Aes128 | xc5vfx70t | 296.435 | 456 | 1338 | 90 | 2918.744 |
| Our proposed Aes_32bit | xc5vfx70t | 303.364 | 375 | 1423 | 75 | 2588.706 |

The AES-32 core achieves a high frequency of 303.364 MHz and a low area, with 3% of occupied Slices and Slice LUTs. The AES-32 bit consumes less power than the AES-128, but its throughput is lower. Each round is completed in one clock cycle; with four clock cycles for registering the input, the total number of clock cycles needed to process 128-bit data is 15 for the AES-32 bit, compared to 12 for the AES-128 bit.

Table 2 compares our implementation with recent works reported in the literature using other well-known FPGAs: XC2V6000BF957-6 [8], XC5VLX50 [9], XC2V80-6 [12] and XC2V1000 [10]. The throughput of our design varies from 2734 to 1245 megabits per second (Mbps) depending on the targeted device. In the case of [8, 9, 13, 14, 12], the comparison is more meaningful since it relates to the same platforms as ours. As shown, the maximum frequency of our implementation is better than that reported in [8, 9, 10] but lower than that in [13, 12]. Compared to [8], our proposed architecture uses fewer Slice LUTs. A comparison with the AES ASIC implementation of [11] is also given.

V. CONCLUSION

This paper reports the implementation results of the AES algorithm on different Xilinx Virtex FPGAs. A 32-bit architecture implementation of the AES crypto module is presented. This work details the design of the AES system based on an iterative loop architecture. With the proposed architecture, a power reduction of 15 mW is achieved compared with the AES-128 bit. The frequency achieved by the proposed design also compares favorably with that of the standard AES-128 design. Furthermore, the proposed 32-bit architecture of the AES occupies a reasonable amount of resources in terms of slices. From the obtained performance, we can conclude that our proposed 32-bit AES architecture is suitable for systems with resource-constrained environments, such as smart cards.

TABLE II. PERFORMANCE COMPARISON RESULTS

| Reference | Device | Datapath (bits) | Max. Frequency (MHz) | Number of occupied Slices | Number of Slice LUTs | Throughput (Mbps) |
|---|---|---|---|---|---|---|
| Our proposed | XC2V8000-5ff1517 | 32 | 145.964 | 2068 | - | 1245.559 |
| [12] | XC2V80-6 | 32 | 149 | 115 (CLB slice) | - | 43 2 |
| [13] | Virtex-II | 128 | 182 | Area (1937) | - | 2118 |
| Our proposed | XC2V6000BF957-6 | 32 | 163.908 | 2031 | 3228 (4-input LUT) | 1398.6816 |
| [8] | XC2V6000BF957-6 | 128 | 62.5 | 2943 | 5802 (4-input LUT) | 666.7 |
| Our proposed | XC5VLX50 | 32 | 320.403 | 497 | 1423 | 2734.106 |
| [9] | XC5VLX50 | 128 | 242.153 | 1745 | 5256 | 3.09 Gbps |
| Our proposed | XC2V1000 | 32 | 163.908 | 2031 | 3228 (4-input LUT) | 1398.681 |
| [14] | XC2V1000 | 32 | 264 | 866 (CLB slice) | - | 768 |
| [10] | XC2V1000 | 128 | 96.42 | 586 slices + 10 BRAM | - | 1450 |
| [11] | AES-128: ASIC Implementation | 128 | 182 | 6986 Gates | - | - |

REFERENCES

[1] National Institute of Standards and Technology (NIST). Advanced Encryption Standard (AES). Federal Information Processing Standards Publications (FIPS PUBS) 197-26, 2001.

[2] National Institute of Standards and Technology (NIST). Data Encryption Standard (DES). Federal Information Processing Standards Publications (FIPS PUBS) 46-3, 1999.

[3] Anna Labbe and Annie Perez. AES Implementation on FPGA: Time Flexibility Tradeoff, in FPL 2002, LNCS 2438, pp. 836-844, 2002.

[4] Christopher Caltagirone and Kasi Anantha. High Throughput, Parallelized 128-bit AES Encryption in a Resource-Limited FPGA, in SPAA'03, June 2003.

[5] Kimmo U. Jarvinen, Matti T. Tommiska and Jorma O. Skytta. A Fully Pipelined Memoryless 17.8 Gbps AES128 Encryptor, in FPGA'03, February 2003.

[6] STMicroelectronics website, www.st.com

[7] Panu Hämäläinen, Marko Hännikäinen, and Timo D. Hämäläinen. Review of Hardware Architectures for Advanced Encryption Standard Implementations Considering Wireless Sensor Networks, SAMOS 2007, LNCS 4599, pp. 443-453, 2007.

[8] L. Thulasimani and M. Madheswaran. A Single Chip Design and Implementation of AES-128/192/256 Encryption Algorithms, International Journal of Engineering Science and Technology, Vol. 2(5), pp. 1052-1059, 2010.

[9] Muhammad H. Rais and Syed M. Qasim. A Novel FPGA Implementation of AES-128 using Reduced Residue of Prime Number based S-Box, IJCSNS International Journal of Computer Science and Network Security, Vol. 9, No. 9, pp. 305-309, 2009.

[10] A.-B. Ignacio, F.-U. Claudia, and R. Cumplido. Design and Implementation of an FPGA-Based 1.452-Gbps Non-pipelined AES Architecture. Lecture Notes in Computer Science, 3982: 446-455, 2006.

[11] Yibo Fan, Takeshi Ikenaga, Yukiyasu Tsunoo, and Satoshi Goto. A Low-cost Reconfigurable Architecture for AES Algorithm, World Academy of Science, Engineering and Technology 41, pp. 270-273.

[12] CAST, Advanced Encryption Standard Core, available at http://www.castinc.com/cores/aes/index.shtml.

[13] Nedjah, L. de Macedo Mourelle, and M. P. Cardoso. A Compact Pipelined Hardware Implementation of the AES-128 Cipher. Proceedings of the Third International Conference on Information Technology: New Generations, pages 216-221, 2006.

[14] Somsak Choomchuay, Surapong Pongyupinpanich and Somsanouk Pathumvanh. A Compact 32-bit Architecture for an AES System, ECTI Transactions on Computer and Information Theory, Vol. 1, No. 1, May 2005, pp. 24-29.

[15] Gaisler Research. LEON2 Processor Users Manual. XST Edition. Available online at http://www.gaisler.com/doc/leon2-1.0.30-xst.pdf, July 2005. Version 1.0.30.

[16] ARM website, http://www.arm.com

[17] Gaisler website, http://www.gaisler.com

[18] François-Xavier Standaert, Siddika Berna Ors, and Bart Preneel. Power Analysis of an FPGA Implementation of Rijndael: Is Pipelining a DPA Countermeasure? LNCS 0302-9743, vol. 3156, pp. 30-44, 2004.
