Low-power compact composite field AES S-Box/Inv S-Box design in 65 nm CMOS using Novel XOR Gate

doi:10.1016/j.vlsi.2012.06.002

Integration

Volume 46, Issue 4, September 2013, Pages 333-344

https://doi.org/10.1016/j.vlsi.2012.06.002

Abstract

The Substitution box (S-Box) forms the core building block of any hardware implementation of the Advanced Encryption Standard (AES) algorithm, as it is a non-linear structure requiring multiplicative inversion. This paper presents a full custom CMOS design of an S-Box/Inverse S-Box (Inv S-Box) with low power GF (2⁸) Galois Field inversion based on a polynomial basis, using composite field arithmetic. The S-Box/Inv S-Box utilizes a novel low power 2-input XOR gate with only six devices to achieve a compact module implemented in 65 nm IBM CMOS technology. The area of the core circuit is only about 288 μm² as a result of this transistor level optimization. The hardware cost of the S-Box/Inv S-Box is about 158 logic gates, equivalent to 948 transistors, with a critical path propagation delay of 7.322 ns enabling a throughput of 130 Mega-SubBytes per second. The design dissipates only around 0.09 μW from a 0.8 V supply voltage and is suitable for applications such as RFID tags and smart cards, which require low power consumption and a small silicon die. The proposed implementation compares favorably with other existing S-Box designs.
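The hardware-cost and timing figures quoted above can be cross-checked with a short calculation. The sketch below uses only the numbers stated in the abstract; the variable names are illustrative, and the rate it computes is a simple upper bound that assumes one S-Box evaluation per critical path delay, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract.
gate_count = 158           # logic gates
transistor_count = 948     # transistors
critical_path_ns = 7.322   # critical path propagation delay in nanoseconds

# Average number of devices per gate implied by the two hardware-cost figures.
devices_per_gate = transistor_count / gate_count

# Upper bound on evaluations per second, assuming (our assumption) that
# one evaluation completes per critical path delay.
max_rate_per_s = 1.0 / (critical_path_ns * 1e-9)

print(devices_per_gate)
print(round(max_rate_per_s / 1e6, 1), "million evaluations per second (upper bound)")
```

The ratio works out to six devices per gate, matching the device count of the XOR gate, and the bound lies slightly above the quoted 130 Mega-SubBytes per second. The abstract does not say how the margin between the two arises.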

Highlights

► New full-custom S-Box/Inv S-Box AES using composite field arithmetic by resource sharing.
► Implementation using a novel low power 2-input XOR gate with only six devices.
► New XOR gate offers the lowest propagation delay and power consumption.
► Implementation using 65 nm CMOS technology and the area of the S-Box is 288 μm² with 158 logic gates.
► Critical path delay of 7.322 ns, throughput of 130 Mbps and power dissipation of 0.09 μW (0.8 V).

Introduction

As cryptography plays a crucial role in the security of data transmission, AES, based on the Rijndael algorithm [1], was selected as a data encryption standard by the National Institute of Standards and Technology (NIST) in a selection process launched in 1997, based on the primary criteria of security, performance, efficiency in software and hardware implementation, and flexibility. AES is one of the most common symmetric encryption algorithms and is widely adopted for a variety of encryption needs, such as wireless networks and secure transactions via the Internet. AES can be implemented on a wide range of platforms under different constraints [2]. In portable applications, computing resources are usually limited and a dedicated hardware implementation of the security process is essential [39]. Implementation using a Field Programmable Gate Array (FPGA) is not suitable for such applications, mainly due to size and power constraints. Because an FPGA is a general purpose logic array, some logic and I/O blocks usually remain unused, and consequently a highly compact implementation is difficult to achieve. In addition, FPGA implementations are prone to switching-noise-induced power analysis attacks [42]. A compact, small-footprint full-custom chip is more suitable in such cases. Such a dedicated hardwired AES implementation can also provide a higher data rate than software packages for fast handling of ciphered network data packets in applications such as routers. The hardwired implementation is also physically secure, since tampering by an attacker is more difficult. The overall efficiency of an AES hardware implementation in terms of size, speed, security and power dissipation depends largely on the AES architecture [40]. For high throughput, a loop-unrolled pipelined structure [4] is used; to save power and area, on the other hand, an iterative single-round structure with resource sharing is implemented.

The S-Box is at the core of any AES implementation and is considered a full complexity design consuming the major portion of the power and energy budget of the AES hardware. This paper focuses on an area-efficient, low-voltage and low-power CMOS implementation of the S-Box/Inv S-Box. Various techniques have been reported for implementing the S-Box to satisfy varying criteria such as power, speed and delay for different applications. Among them there are two main streams: (a) Implementation using look up tables (LUTs), which store all 256 predefined 8-bit values of the S-Box in a Read-Only-Memory (ROM). The advantage of using a LUT is that it offers a shorter critical path. However, it has the drawback of an unbreakable delay path [3] in pipelined designs, and hence it is not suitable for high speed applications. This delay prohibits each round unit from being divided into more than two sub-stages to achieve any further increase in processing speed [41]. It also requires a large area to implement both AES encryption and decryption, as a different table is used in each case. (b) The alternative is to design the S-Box circuit using combinatorial logic derived directly from its arithmetic operations. This approach has a breakable delay path for S-Box processing. Other S-Box architectures, such as the positive polarity Reed–Muller structure [6], the binary decision diagram (BDD) [7], or its variant, the twisted binary decision diagram (TBDD) [8], can achieve a high speed design but suffer from extremely large area cost. The S-Box design based on sum of product (SOP) expressions in Refs. [9], [10], [11] also suffers from a large silicon area penalty.
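The two streams described above can be related in software. The sketch below computes each S-Box value directly from its arithmetic definition (multiplicative inversion in GF (2⁸) followed by the standard AES affine transformation) and then collects the 256 results into a table, which is what a ROM-based LUT would store. It is an illustrative model only, not the circuit of this paper; the function names are our own.

```python
def gf256_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the AES field polynomial
    return result


def gf256_inv(a):
    """Multiplicative inverse computed as a^254; 0 maps to 0, as in AES."""
    result = 1
    for _ in range(254):
        result = gf256_mul(result, a)
    return result


def rotl8(x, k):
    """Rotate an 8-bit value left by k positions."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def sbox_arith(x):
    """S-Box value from arithmetic: inversion followed by the AES affine map."""
    y = gf256_inv(x)
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63


# Stream (a): the same function precomputed as a 256-entry table.
SBOX_LUT = [sbox_arith(x) for x in range(256)]

print(hex(SBOX_LUT[0x00]), hex(SBOX_LUT[0x53]))
```

A decryption table would need a second, different set of 256 entries, which is the area penalty noted above for the LUT approach.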

A well-known approach to designing the S-Box from its arithmetic operations involves multiplicative inversion in GF (2⁸) using composite field arithmetic [12], [13], decomposing the field operations from GF (2⁸) to GF ((2⁴)²). Subfield arithmetic is thus used in the computation of an inverse in the Galois Field. In this technique, hardware area cost can be reduced substantially by sharing the multiplicative inverse step between the SubBytes and the InvSubBytes operations. Also, among existing techniques, the composite field S-Box architecture is the most area-efficient approach for the AES encryption/decryption algorithm, as the computation cost of certain Galois Field operations is lower when the operation is performed in an isomorphic composite field. The authors in Ref. [14] reported a fast composite field S-Box architecture that showed a throughput increase of 56.25% along with a reduction in pipeline latency of 40%–60% compared with other conventional designs. The approaches in Refs. [2], [15] result in a very small S-Box, but suffer from a longer critical path than the LUT technique. The LUT technique, on the other hand, has a shorter critical path than the composite field approach, but its area is 2–3 times larger.
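The decomposition described above can be sketched as follows. An element of GF ((2⁴)²) is written as a_h·x + a_l with a_h, a_l in GF (2⁴), and the inverse is obtained from a single GF (2⁴) inversion plus a few GF (2⁴) multiplications and additions. The subfield polynomial x⁴ + x + 1, the extension polynomial x² + x + λ, and the way λ is chosen here (first value found by search that makes the polynomial irreducible) are assumptions made for illustration and are not necessarily the choices made in this paper. The isomorphic mapping between the AES polynomial basis and the composite basis is omitted.

```python
def gf16_mul(a, b):
    """Multiply in GF(2^4) modulo x^4 + x + 1 (assumed subfield polynomial)."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return result


def gf16_inv(a):
    """Inverse in GF(2^4) computed as a^14; 0 maps to 0."""
    result = 1
    for _ in range(14):
        result = gf16_mul(result, a)
    return result


# Illustrative choice of lambda: the first value for which x^2 + x + lambda
# has no root in GF(2^4), i.e. is irreducible over the subfield.
LAMBDA = next(lam for lam in range(1, 16)
              if all(gf16_mul(t, t) ^ t ^ lam for t in range(16)))


def gf256c_mul(a, b):
    """Multiply two composite-field bytes (high nibble = coefficient of x)."""
    ah, al = a >> 4, a & 0xF
    bh, bl = b >> 4, b & 0xF
    hh = gf16_mul(ah, bh)
    high = hh ^ gf16_mul(ah, bl) ^ gf16_mul(al, bh)   # uses x^2 = x + lambda
    low = gf16_mul(hh, LAMBDA) ^ gf16_mul(al, bl)
    return (high << 4) | low


def gf256c_inv(a):
    """Invert in GF((2^4)^2) with one subfield inversion; 0 maps to 0."""
    ah, al = a >> 4, a & 0xF
    # d = ah^2 * lambda + (ah + al) * al is the product of a with its conjugate.
    d = gf16_mul(gf16_mul(ah, ah), LAMBDA) ^ gf16_mul(ah ^ al, al)
    d_inv = gf16_inv(d)
    return (gf16_mul(ah, d_inv) << 4) | gf16_mul(ah ^ al, d_inv)


ok = all(gf256c_mul(a, gf256c_inv(a)) == 1 for a in range(1, 256))
print("all nonzero elements invert correctly:", ok)
```

The point of the sketch is that the only inversion performed is in the 16-element subfield; everything else reduces to 4-bit multiplications and XORs.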

Next, considering the S-Box design methodologies reported so far, only Refs. [16], [17], [36] evaluated the performance of the S-Box using the full custom design technique. The advantage of full custom design using state of the art CMOS processes is that all the transistors can be scaled down with process scaling without deteriorating the overall performance, and in most cases with increased speed. This leads to smaller chip area and low power consumption. Another design methodology is to reduce power consumption by using advanced process technology that offers very low supply voltage. This approach also leads to a reduction in the die area.

S-Box architectures, especially those using the composite field approach, use the XOR gate as the fundamental logic function along with AND gates. Consequently, enhancing the performance of the XOR gates can significantly improve the critical path performance and die area of the S-Box design. In this paper, we present a low-power design methodology for the S-Box/Inv S-Box which includes minimizing the overall circuit size and critical path delay by implementing a new XOR gate, scaling down the supply voltage and the transistor size, and choosing an advanced technology for optimized CMOS full custom design. Our approach of optimized full-custom S-Box/Inv S-Box implementation in low cost isomorphic composite field arithmetic using low power, minimal transistor count XOR gates has not been considered before in the context of AES implementations. To the best of the authors' knowledge, most reported works use standard static CMOS XOR logic gates requiring 12 transistors, resulting in a larger overall silicon area in spite of any architectural optimization. In addition, minimized implementation of InvSubBytes for the Inv S-Box by sharing S-Box resources on the same chip was not considered in many previously reported works.

Section snippets

AES algorithm and S-Box implementation preliminaries

AES is a symmetric encryption algorithm which processes a fixed 128-bit data block with keys of 128, 192 or 256 bits. The data block is mapped into a 4×4 array of byte elements called the State matrix. Each byte in the State is considered an element in GF (2⁸) and denoted by $S_{ij}$ $(0 \leq i, j < 4)$. AES is also an iterative algorithm which runs for 10, 12 or 14 rounds depending on the key length. The AES contains four different data transformations: SubBytes,
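The block-to-State mapping and the round count described in this snippet can be sketched as below. The column-major byte order and the 10/12/14 round counts follow the AES standard; the function names are illustrative and not taken from the paper.

```python
def block_to_state(block):
    """Map a 16-byte block into the 4x4 State, filled column by column.

    state[i][j] holds input byte i + 4*j (row i, column j).
    """
    if len(block) != 16:
        raise ValueError("AES processes fixed 128-bit (16-byte) blocks")
    return [[block[i + 4 * j] for j in range(4)] for i in range(4)]


def rounds_for_key_bits(key_bits):
    """Number of AES rounds for each supported key length."""
    return {128: 10, 192: 12, 256: 14}[key_bits]


state = block_to_state(bytes(range(16)))
for row in state:
    print(row)
print(rounds_for_key_bits(128), rounds_for_key_bits(192), rounds_for_key_bits(256))
```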

Design methodology and proposed S-Box/Inv S-Box architecture

The proposed S-Box/Inv S-Box architecture employs combinational logic using composite field arithmetic based on Ref. [3] and optimized in Ref. [32] with a different choice of the polynomial coefficients and the implementation of the constant multiplication with λ. The S-Box is implemented using XOR circuits, multiplexers and AND gates. The optimization of the low voltage and low power composite field S-Box implementation has been further enhanced in this paper by using a new six-transistor XOR

Novel XOR gate for low power CMOS Galois field arithmetic

From the above Galois Field arithmetic for the S-Box and the corresponding Figs. 3–9, it is evident that the implementation of the S-Box/Inv S-Box requires a large number of XOR operations, whose efficient and low power implementation can result in a substantially improved CMOS S-Box hardware design.
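To illustrate why XOR operations dominate, the sketch below writes a GF (2⁴) multiplier purely in terms of 2-input AND and XOR operations on individual bits and checks it against a conventional shift-and-reduce multiplier. The reduction polynomial x⁴ + x + 1 and this unoptimized schoolbook structure are assumptions for illustration; the figures referred to above are not available here, so this is not the multiplier of the paper.

```python
def gf16_mul_gates(a, b):
    """GF(2^4) multiplication modulo x^4 + x + 1 using only AND/XOR on bits."""
    a0, a1, a2, a3 = [(a >> i) & 1 for i in range(4)]
    b0, b1, b2, b3 = [(b >> i) & 1 for i in range(4)]

    # Polynomial product coefficients: 16 AND operations, 9 XOR operations.
    p0 = a0 & b0
    p1 = (a0 & b1) ^ (a1 & b0)
    p2 = (a0 & b2) ^ (a1 & b1) ^ (a2 & b0)
    p3 = (a0 & b3) ^ (a1 & b2) ^ (a2 & b1) ^ (a3 & b0)
    p4 = (a1 & b3) ^ (a2 & b2) ^ (a3 & b1)
    p5 = (a2 & b3) ^ (a3 & b2)
    p6 = a3 & b3

    # Reduction with x^4 = x + 1, x^5 = x^2 + x, x^6 = x^3 + x^2: 6 more XORs.
    c0 = p0 ^ p4
    c1 = p1 ^ p4 ^ p5
    c2 = p2 ^ p5 ^ p6
    c3 = p3 ^ p6
    return c0 | (c1 << 1) | (c2 << 2) | (c3 << 3)


def gf16_mul_ref(a, b):
    """Reference shift-and-reduce multiplication in the same field."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return result


match = all(gf16_mul_gates(a, b) == gf16_mul_ref(a, b)
            for a in range(16) for b in range(16))
print("gate-level multiplier matches reference:", match)
```

Even this single subfield multiplier uses 15 XOR operations alongside 16 AND operations, and field addition is itself a bitwise XOR, so the cost of the XOR cell carries through the whole datapath.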

Compact S-Box/Inv S-Box chip and comparison with other designs

The hardware architecture in Fig. 2 is implemented to perform both encryption and decryption, with the S-Box and Inv S-Box sharing the same hardware. It is an improved modification of the architecture in Ref. [32], with the inclusion of the inverse S-Box, which was not implemented in Ref. [32]. This modification enables the implementation of Inverse SubBytes for decryption by reusing the same S-Box resources. Using the novel low-power and low-area XOR gate of the previous section, a circuit
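The resource sharing described in this snippet can be modelled behaviorally: a single inversion block serves both directions, and a mode select (the role a multiplexer plays in hardware) decides whether the affine step is applied after the inversion (SubBytes) or its inverse is applied before it (InvSubBytes). The sketch below is a software model under that assumption, using the standard AES affine constants; it does not reproduce the gate-level structure of Fig. 2, and the function names are our own.

```python
def gf256_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def shared_inverse(a):
    """The single inversion block used by both modes (a^254, 0 maps to 0)."""
    result = 1
    for _ in range(254):
        result = gf256_mul(result, a)
    return result


def rotl8(x, k):
    """Rotate an 8-bit value left by k positions."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def affine(y):
    """Forward AES affine transformation."""
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63


def inv_affine(s):
    """Inverse AES affine transformation."""
    return rotl8(s, 1) ^ rotl8(s, 3) ^ rotl8(s, 6) ^ 0x05


def sub_byte(x, inverse=False):
    """S-Box when inverse is False, Inv S-Box when True, via one shared inverter."""
    pre = inv_affine(x) if inverse else x      # input-side selection
    y = shared_inverse(pre)                    # shared resource
    return y if inverse else affine(y)         # output-side selection


round_trip = all(sub_byte(sub_byte(x), inverse=True) == x for x in range(256))
print(hex(sub_byte(0x53)), hex(sub_byte(0xED, inverse=True)), round_trip)
```

Only the two small affine blocks and the selection logic differ between the modes, which is why sharing the inverter saves area compared with two independent tables.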

Conclusion

This paper presents a full custom hardware implementation of low power AES S-Box/Inv S-Box architecture in the 65 nm CMOS process employing circuit level optimization. The design demonstrated a new approach to minimize silicon-area of S-Box design by using a new 2-input XOR gate for low-power composite field arithmetic in order to reduce the power dissipation and delay for the overall circuit. The results indicate that our design is suitable for applications which require small area and low

Acknowledgment

The authors wish to acknowledge the anonymous reviewers for their comments, which helped in enhancing the quality of the paper. Acknowledgment is also due to Dr. Shaun Cooper of the Institute of Information and Mathematical Sciences for discussions on Galois Field arithmetic.

Nabihah Ahmad received the B.S. in electrical, electronic and system engineering from Universiti Kebangsaan Malaysia (UKM) and M.S. degrees in electronic engineering from Universiti Tun Hussein Onn Malaysia (UTHM) in 2002 and 2006, respectively. She is currently a Ph.D. candidate with the Center for Research in Analog and VLSI Microsystem Design at School of Engineering and Advanced Technology, Massey University, New Zealand. Her research interests include low power VLSI circuit design,

References (46)

Y.H. Zeng et al., Low-power clock-less hardware implementation of the Rijndael S-box for wireless sensor networks, The Journal of China Universities of Posts and Telecommunications (2007)
J. Daemen et al., The Design of Rijndael (2002)
S. Tillich et al., Area, delay, and power characteristics of standard-cell implementations of the AES S-Box, Journal of Signal Processing Systems (2008)
A. Satoh et al., A compact Rijndael hardware architecture with S-Box optimization, ASIACRYPT 2001, Lecture Notes in Computer Science (2001)
N. Sklavos et al., Architectures and VLSI implementations of the AES-proposal Rijndael, IEEE Transactions on Computers (2002)
X. Zhang et al., High-speed VLSI architectures for the AES algorithm, IEEE Transactions on VLSI Systems (2004)
S. Morioka, A. Satoh, An optimized S-box circuit architecture for low power AES design, in: Proceedings of the Workshop...
R.E. Bryant, Graph-Based Algorithms for Boolean Function Manipulation, IEEE Transactions on Computers (1986)
S. Morioka et al., A 10-Gbps Full-AES crypto design with a twisted BDD S-Box architecture, IEEE Transactions on VLSI Systems (2004)
N. Ahmad, R. Hasan, W.M. Jubadi, Design of AES S-box using combinational logic optimization, in: Proceedings of the...
R.R. Rach, P.V. Ananda Mohan, Implementation of AES S-Boxes using combinational logic, in: Proceedings of the IEEE...
N. Chen, Z. Yan, High-performance designs of AES transformations, in: Proceedings of the International Symposium on...
C. Nalini, P.V. Anandmohan, D.V. Poomaiah, V.D. Kulkarni, Compact designs of SubBytes and MixColumn for AES, in:...
V. Rijmen, Efficient implementation of the Rijndael S-Box, 2000. Available from:...
R. Liu, K.K. Parhi, Fast composite field S-box architectures for advanced encryption standard, in: Proceedings of the...
D. Canright et al., Compact S-Box for AES, workshop on cryptographic hardware and embedded systems 2005 (CHES 2005), Lecture Notes in Computer Science (2005)
L. Zhenglin et al., A high security and low-power AES S-Box full-custom design for wireless sensor network, Proceedings of the International Conference on Wireless Communications, Networking and Mobile Computing (2007)
D. Kamel, F.X. Standaert, D. Flandre, Scaling trends of the AES S-box lower power consumption in 130 and 65nm CMOS...
N. Ahmad, R. Hasan, Design of XOR gates in VLSI implementation, in: Proceedings of the Electronic New Zealand...
H.T. Bui et al., Design and analysis of low-power 10-transistor full adders using novel XOR-XNOR gates, IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing (2002)
A.M. Shams et al., Performance analysis of low-power 1-bit CMOS full adder cells, IEEE Transactions on VLSI Systems (2002)
D. Radhakrishnan, Low-voltage low-power CMOS full adder, IET Proceedings on Circuits, Devices and Systems (2001)
H. Lee et al., New XOR/XNOR and full adder circuits for low voltage, low power applications, Microelectronics Journal (1998)


S. M. Rezaul Hasan received his Ph.D. in Electronics Engineering from the University of California Los Angeles (UCLA) in 1985. From 1983 to 1986 he was a VLSI design engineer at Xerox Microelectronics Center in El Segundo, CA., where he worked in the design of CMOS VLSI microprocessors. In 1986 he moved to the Asia-Pacific region and served several institutions including Nanyang Technological University, Singapore (1986–1988), Curtin University of Technology, Perth, Western Australia (1990–1991) and University Sains Malaysia, Perak, Malaysia (1992–2000). At University Sains Malaysia he held the position of Associate Professor and was the coordinator of the Analog and VLSI research laboratory. He spent the next four years (2000–2004) in the West Asia-Gulf region where he served as an Associate Professor of Microelectronics, Integrated Circuit Design and VLSI Design in the Department of Electrical and Computer Engineering at the University of Sharjah, Sharjah, United Arab Emirates. While in Sharjah he received the National Bank of Sharjah Award for outstanding research publication in Integrated Circuit Design. Presently he is the Director of the Center for Research in Analog and VLSI microsystems dEsign (CRAVE) at Massey University, Auckland, New Zealand. He is also a senior faculty member within the School of Engineering and Advanced Technology (SEAT) in Electronics and Computer Engineering, teaching courses in Advanced Microelectronics and Integrated Circuit Design. He has published over 138 papers in international journals and conferences in the areas of Analog, Digital, RF and Mixed-Signal Integrated Circuit Design and VLSI Design. Dr. Hasan has also served as a consultant for many electronics companies. His present areas of interest include Analog and RF Integrated Circuit and Microsystem Design, VLSI signal processing, CMOS sensors, CMOS Bioelectronics and Biological (gene-protein) Circuit Design. 
He is a senior member of the IEEE and an editor of the Hindawi journal Active and Passive Electronic Components.
