Area and power efficient pipelined hybrid merged adders for customized deep learning framework for FPGA implementation

https://doi.org/10.1016/j.micpro.2019.102906

Abstract

With the rapid growth of deep learning and neural network algorithms, various fields such as communication, industrial automation, computer vision and medical applications have seen drastic improvements in recent years. However, deep learning and neural network models keep growing day by day, as ever more parameters are used to represent them. Although existing models rely on efficient GPUs, their implementation on dedicated embedded devices needs more optimization, which remains a real challenge for researchers. This paper therefore investigates deep learning frameworks, and more particularly reviews the adders implemented within them. A new pipelined hybrid merged adder (PHMAC), optimized for FPGA architectures and more efficient in terms of area and power, is presented. The proposed adder integrates the carry-select and carry-lookahead principles, re-using the LUT for different inputs, which consumes less power and provides effective area utilization. The proposed adders were investigated on different FPGA architectures, on which power and area were analyzed. Comparison of the proposed adders with other adders such as carry-select adders (CSA), carry-lookahead adders (CLA), carry-skip adders and Kogge-Stone adders shows nearly a 50% reduction in area and a 45% reduction in power over these traditional adders.

Introduction

In recent years, there has been a dramatic increase in the use of neural networks and deep learning frameworks over traditional learning models in various fields such as image processing, vision, medical systems and communication. Deep learning algorithms such as Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN) have been proposed for different application research areas. With the advent of the above-mentioned algorithms, detection accuracy has increased from 78% to 95%, and classification has been further improved by extracting several features. In short, the capabilities of deep learning algorithms have made them highly suitable for artificial intelligence (AI) applications.

However, Deep Neural Network (DNN) structures face complexities in terms of storage and power. Since the size of a deep learning model is proportional to its input size, processing needs more FLOPS (floating-point operations per second), power and area. It is therefore important to optimize the computational model for deep neural networks. CPUs can perform 10-100 GFLOP/s, but their power efficiency is less than 1 GOP/J [1], which makes it very hard to achieve performance improvements together with low power and area. It also becomes a real challenge when implementing these models on embedded devices, where power and area play a mandatory role.

In addition, CPU, GPU and FPGA platforms are gradually becoming the new trend in the implementation of area-efficient deep learning neural networks [2]. FPGAs can accomplish high parallelism and simplify the logic of the NN calculation procedure with hardware tailored to a particular model. A few researchers have demonstrated the possibility of simplifying a neural network model in a hardware-friendly manner without affecting the accuracy of the model. Thus, FPGAs can achieve more efficiency than embedded CPUs.

A deep learning framework consists of five different layers, namely the convolutional layer, the fully connected layer, the pooling layer, the non-linear layer and the element-wise layer. Among these, the convolutional layer (CONV) uses a two-dimensional neuron process in which adders and multipliers are integrated to perform the operation. As detailed earlier, the number of these adders and multipliers increases exponentially as the input size grows.
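To make the cost concrete: each CONV-layer tap reduces to one multiply-accumulate (MAC) step, spending one multiplier and one adder per cycle. The following Verilog sketch of such a MAC step is purely illustrative; the module name, operand width and port list are assumptions, not taken from the paper.

    // Minimal sketch of one CONV-layer multiply-accumulate (MAC) step.
    // Names and widths are illustrative, not from the paper.
    module conv_mac #(
        parameter W = 8                  // operand width (assumed)
    ) (
        input  wire           clk,
        input  wire           rst,
        input  wire [W-1:0]   pixel,     // feature-map operand
        input  wire [W-1:0]   weight,    // kernel operand
        output reg  [2*W+3:0] acc        // running sum of products
    );
        always @(posedge clk) begin
            if (rst)
                acc <= 0;
            else
                acc <= acc + pixel * weight;  // one multiplier + one adder per tap
        end
    endmodule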

Hence, this paper proposes a novel collection of pipelined hybrid merged adders which integrate the carry-select and carry-lookahead principles, re-using the LUT for different inputs so as to consume less power with effective area utilization. The paper discusses the implementation of the proposed adders in the CONV layer of deep learning neural networks. The integration is based on the S3 principle, in which the LUT is shared among the operands, covering both signed and unsigned bits. The paper also discusses LUT optimization for packing the proposed adders into adder-tree structures suitable for a deep learning framework, which yields nearly 50% and 45% reductions in area and power, respectively; a generic sketch of the two combined adder principles is given below.
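Since the PHMAC netlist itself is not reproduced here, the following Verilog sketch only illustrates the two principles being merged: 4-bit carry-lookahead sub-adders wrapped in an 8-bit carry-select stage. All module and signal names are illustrative assumptions, and the shared-LUT optimization that distinguishes the proposed adder is not shown.

    // 4-bit carry-lookahead adder: carries computed from generate/propagate.
    module cla4 (
        input  wire [3:0] a, b,
        input  wire       cin,
        output wire [3:0] sum,
        output wire       cout
    );
        wire [3:0] g = a & b;   // generate
        wire [3:0] p = a ^ b;   // propagate
        wire [4:0] c;
        assign c[0] = cin;
        assign c[1] = g[0] | (p[0] & c[0]);
        assign c[2] = g[1] | (p[1] & c[1]);
        assign c[3] = g[2] | (p[2] & c[2]);
        assign c[4] = g[3] | (p[3] & c[3]);
        assign sum  = p ^ c[3:0];
        assign cout = c[4];
    endmodule

    // 8-bit carry-select stage built from the lookahead blocks: the upper
    // half is computed for both carry-in values and the real carry selects.
    module csel8_cla (
        input  wire [7:0] a, b,
        input  wire       cin,
        output wire [7:0] sum,
        output wire       cout
    );
        wire [3:0] hi0, hi1;
        wire       co_lo, co0, co1;
        cla4 lo  (.a(a[3:0]), .b(b[3:0]), .cin(cin),  .sum(sum[3:0]), .cout(co_lo));
        cla4 hiA (.a(a[7:4]), .b(b[7:4]), .cin(1'b0), .sum(hi0),      .cout(co0));
        cla4 hiB (.a(a[7:4]), .b(b[7:4]), .cin(1'b1), .sum(hi1),      .cout(co1));
        assign sum[7:4] = co_lo ? hi1 : hi0;  // carry-select multiplexing
        assign cout     = co_lo ? co1 : co0;
    endmodule

When cascading such stages to wider operands, a pipeline register would typically be placed between them, which is the pipelined aspect of the proposal.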

The paper is organized as follows:

  • (i) Related works are detailed in Section 2.

  • (ii) Background preliminaries on deep learning algorithms are presented in Section 3.

  • (iii) The architecture, workflow and implementation mechanism of the proposed pipelined hybrid merged adders are discussed in Section 4.

  • (iv) The experimental setup, simulation results and comparative analysis are detailed in Section 5.

  • (v) The conclusion, with an indication of future scope, is presented in Section 6.


Related Works

Teng Wang et al. examined neural network accelerators based on FPGAs. In particular, that work separately surveys accelerators designed for specific problems, specific algorithms, algorithm features, and general templates. The design and implementation of FPGA-based accelerators are compared across various devices and network models, and contrasted with CPU and GPU versions. Finally, the work presents a reference on the advantages and…

Background Preliminary View

Deep learning consolidates low-level features into high-level attribute representations in order to discover distributed feature representations. The idea was proposed by Hinton in 2006 [1]. An unsupervised greedy layer-by-layer training algorithm was proposed to settle deep-structure-related optimization issues. Subsequently, a deep structure of multi-layer auto-encoders was proposed. In addition, a convolutional…

Proposed Adders

The proposed adders consist of pipelined hybrid adders which handle both unsigned radix-2 input numbers and two's-complement numbers. Consider a redundant unsigned-digit (USD) number whose digits are drawn from the digit set D = {−β, …, −1, 0, 1, …, β}, where r is the radix (here r = 2) and β is the largest digit in the digit set. Let X and Y be redundant radix-2 USD numbers; every digit Xi of X comprises two bits Xi+ and Xi−, and is given as Xi = Xi+ − Xi−.
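Read in the standard redundant signed-digit way, each radix-2 digit Xi = Xi+ − Xi− takes a value in {−1, 0, 1}, so a given value has several encodings: for example, 3 can be written as the digit string (1, 1) = 2 + 1 or as (1, 0, −1) = 4 − 1. This redundancy is what lets carries be confined locally and LUT contents be shared across operands. A minimal Verilog decode of one digit pair, under that assumption:

    // Decode one redundant radix-2 digit carried as a (plus, minus) bit
    // pair, Xi = Xi+ - Xi-; illustrative only, not the paper's RTL.
    module sd_digit_value (
        input  wire              xp,   // Xi+ bit
        input  wire              xm,   // Xi- bit
        output wire signed [1:0] val   // -1, 0 or +1
    );
        assign val = {1'b0, xp} - {1'b0, xm};
    endmodule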

Experimental Setup

The proposed pipelined hybrid merged adders were developed in the Verilog hardware description language, then synthesized and simulated using the Xilinx Vivado tools to evaluate the parameters of interest, namely area and power. Different Xilinx devices were used for the experimentation, and the specifications used are shown in Table 2.

Meanwhile, the proposed adders were simulated using Vivado for 64-bit inputs suitable for image processing applications. The…
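A minimal self-checking testbench of the kind used for such an evaluation might look as follows. The module name phmac64 is a hypothetical placeholder for the 64-bit adder under test, since the paper does not give its RTL; the reference result is plain behavioral addition.

    // Self-checking testbench sketch for a 64-bit adder under test.
    // phmac64 is a placeholder name (assumed, not from the paper).
    module tb_adder64;
        reg  [63:0] a, b;
        wire [63:0] sum;
        wire        cout;
        integer     i;

        phmac64 dut (.a(a), .b(b), .cin(1'b0), .sum(sum), .cout(cout));

        initial begin
            for (i = 0; i < 1000; i = i + 1) begin
                a = {$random, $random};
                b = {$random, $random};
                #1;  // settle combinational logic
                if ({cout, sum} !== {1'b0, a} + {1'b0, b})
                    $display("mismatch: %h + %h -> %h", a, b, {cout, sum});
            end
            $finish;
        end
    endmodule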

Conclusion

In this investigation, pipelined hybrid merged adders suitable for a deep learning framework have been designed and implemented on Xilinx xc7s25csga324/xc6vlx760 FPGA devices. The major parameters, namely area utilization and dynamic power consumption, were computed and analyzed. The existing adder structures were also implemented on the same Xilinx FPGAs and compared with the proposed hybrid adders. Nearly a 50% reduction in area utilization and 45% lower power consumption were achieved over the existing adders.

Declaration of Competing Interest

The authors declare no conflict of interest.


References (12)

  • Ying Wang et al., "DeepBurning: automatic generation of FPGA-based learning accelerators for the neural network family."

  • Teng Wang, Chao Wang, Xuehai Zhou, Huaping Chen, "A survey of FPGA-based deep learning accelerators: challenges and opportunities."

  • Chen Zhang et al., "Optimizing FPGA-based accelerator design for deep convolutional neural networks."

  • Matthieu Courbariaux et al., "BinaryConnect: training deep neural networks with binary weights during propagations."

  • Chao Wang et al., "DLAU: a scalable deep learning accelerator unit on FPGA," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2017).

  • Stylianos I. Venieris et al., "Toolflows for mapping convolutional neural networks on FPGAs: a survey and future directions," ACM Comput. Surv. (2018).


Dr. T. Kowsalya received the B.E. degree in Electronics and Communication Engineering from VLB Janaki Ammal College of Engineering and Technology, Coimbatore, in 1996, and the M.E. degree in Communication Systems from the Government College of Technology, Coimbatore, in 2005. She completed the Ph.D. degree at Anna University, Chennai, in 2018. She has worked in various institutions as an Associate Professor and has 22 years of teaching experience. Currently she is working as an Associate Professor at Muthayammal Engineering College, Rasipuram, Namakkal (Dt), Tamilnadu, India. Her research interests are low-power VLSI and signal processing.
