Machine learning on-a-chip: A high-performance low-power reusable neuron architecture for artificial neural networks in ECG classifications

https://doi.org/10.1016/j.compbiomed.2012.04.007

Abstract

Artificial neural networks (ANNs) are a promising machine learning technique for classifying non-linear electrocardiogram (ECG) signals and recognizing abnormal patterns that suggest risks of cardiovascular disease (CVD). In this paper, we propose a new reusable neuron architecture (RNA) that enables a performance-efficient and cost-effective silicon implementation of ANNs. The RNA architecture consists of a single layer of physical RNA neurons, each designed to use minimal hardware resources (e.g., a single 2-input multiplier–accumulator computes the dot product of two vectors). By carefully applying the principle of time sharing, RNA multiplexes this single layer of physical neurons to efficiently execute both the feed-forward and back-propagation computations of an ANN while conserving silicon area and reducing power dissipation. A three-layer 51-30-12 ANN is implemented in RNA to perform ECG classification for CVD detection. The RNA hardware also supports on-chip automatic training updates. A quantitative design-space exploration of area, power dissipation, and execution speed across RNA and three other implementations representative of different reusable hardware strategies is presented and discussed. Compared with an equivalent software implementation in C executed on an embedded microprocessor, the RNA ASIC achieves three orders of magnitude improvement in both execution speed and energy efficiency.
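The serial multiplier–accumulator mentioned above can be illustrated in software. The sketch below is not the paper's RTL; it only shows the computational pattern of a neuron that evaluates its dot product one weight/input pair per cycle on a single 2-input MAC.

```python
# Illustrative sketch (not the paper's hardware design): a single 2-input
# multiplier-accumulator (MAC) evaluates one neuron's weighted sum serially,
# consuming one (weight, input) pair per "cycle".
def mac_neuron(weights, inputs, bias=0.0):
    """Accumulate w[i] * x[i] on one MAC, then add the bias."""
    acc = 0.0
    for w, x in zip(weights, inputs):
        acc += w * x  # one multiply-accumulate per cycle
    return acc + bias

# Example: a 3-input neuron
print(mac_neuron([0.5, -1.0, 2.0], [1.0, 2.0, 0.5]))  # 0.5 - 2.0 + 1.0 = -0.5
```

In hardware, this serialization is what lets each physical neuron get by with one multiplier and one adder instead of a full parallel multiplier array.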

Introduction

Artificial neural networks (ANNs), an established biologically inspired machine learning paradigm, mimic their biological counterparts in the human brain to provide effective learning capability for recognizing patterns in complex, non-linear signals [1], [2], [3]. In an ANN, the neuron is the basic computational cell. An ANN consists of layers of neurons responsible for carrying out feed-forward computation (classification mode) and back-propagation (training mode). Compared to other popular machine learning algorithms, such as Bayesian networks and support vector machines (SVMs), an ANN has many advantages: a simple and parallelizable computational structure, comparable classification performance, and adaptive learning parameters (e.g., the weights and biases of its interconnected neuron model), which broaden its generalizability to unforeseen input data [4], [5], [6]. As mobile devices (e.g., smartphones) have witnessed tremendous growth and emerged as a popular post-PC computing platform, it is reasonable to expect that many applications that ANNs solve effectively on PCs, such as voice recognition, biometric identification (e.g., fingerprints), and physiological signal classification (e.g., electrocardiography and electromyography), will be transplanted to these mobile devices to enable new use cases. One technical obstacle must be resolved before this vision can become reality: how to run a computation-intensive neural learning algorithm in a mobile form factor where resources are scarce and battery life determines usability. It is very challenging for a pure software implementation executed on a commodity embedded microprocessor to meet both the real-time deadlines and the energy-efficiency requirements imposed. A conventional ASIC-based ANN implementation can meet both the speed and energy requirements but suffers from high non-recurring engineering (NRE) costs and lacks the flexibility to adapt to algorithmic changes.

Prior hardware-based ANN implementations include using a finite state machine (FSM)-based controller plus a generic arithmetic logic unit (ALU) to implement neurons for the feed-forward computation, and reusing the same neurons across the hidden and output layers during feed-forward computation. However, few techniques have been proposed to design neurons efficiently according to the characteristics of the ANN algorithm, nor has any technique been proposed for efficiently implementing the back-propagation training algorithm on embedded mobile devices.

In this paper, we propose a cost-effective reusable neuron architecture (RNA) to perform artificial neural network-based machine learning on a single chip. Unlike conventional hardware-based ANN techniques, the neurons in RNA can be configured dynamically to efficiently perform both the feed-forward and the back-propagation computations. Moreover, RNA uses only one layer of neurons and one look-up table (LUT) to implement the full ANN algorithm. RNA reduces the hardware resource requirement by multiplexing the same physical layer of neurons so that a single neuron layer can act as different layers of the network at different algorithmic stages. A global controller dynamically reconfigures the network layer to execute the corresponding connections, while a local controller dynamically configures the neurons to perform all the computation patterns required by both feed-forward and back-propagation. A single look-up table supports fast evaluation of the non-linear activation function.

To demonstrate the efficiency and efficacy of the proposed RNA architecture, we conducted a thorough case study evaluating RNA on an important real-life application: classifying electrocardiogram (ECG) signals to detect cardiovascular diseases (CVDs). In this case study, RNA was implemented as an ASIC with dynamic reconfigurability in 45 nm CMOS technology and compared with three other hardware-multiplexing designs in area, power dissipation, and speed. Additionally, the RNA ASIC implementation was compared with an equivalent software implementation in C executed on a mainstream embedded microprocessor. A proof-of-concept prototype was constructed on a Xilinx Virtex-5 FPGA board to further validate the design and demonstrate its feasibility on real commodity hardware.

Section snippets

Related work

Because of the advantages of implementing ANNs in hardware, many prior works have been devoted to mapping neural networks onto hardware, both with and without hardware resource-reuse techniques. We group these works into three categories: Flat design, Light-weight Neuron design, and Layer Reused design.

Proposed design

Since it has been proven that a two-layer neural network (one hidden layer and one output layer), shown in Fig. 1, can approximate any arbitrary function [12], we focus on an RNA design with one input layer (p neurons), one hidden layer (q neurons), and one output layer (r neurons), with s = max(q, r).
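The two passes that the single physical layer must support can be sketched numerically. This is a minimal software model, not the RNA datapath: the sigmoid activation, squared-error loss, and plain gradient-descent update are assumptions for the example. The point is that one set of neuron resources serves first as the hidden layer and then as the output layer in each pass.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Feed-forward over one physical layer, multiplexed twice:
    it first acts as the hidden layer (q neurons), then is
    reconfigured to act as the output layer (r neurons)."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]   # pass 1: hidden layer
    y = [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b)
         for row, b in zip(W2, b2)]   # pass 2: same neurons as output layer
    return h, y

def backprop_step(x, t, W1, b1, W2, b2, lr=0.5):
    """One back-propagation update toward target t (squared-error loss)."""
    h, y = forward(x, W1, b1, W2, b2)
    # Output deltas; sigmoid derivative is y * (1 - y).
    d2 = [(yi - ti) * yi * (1 - yi) for yi, ti in zip(y, t)]
    # Hidden deltas, propagated back through W2.
    d1 = [hi * (1 - hi) * sum(W2[k][j] * d2[k] for k in range(len(d2)))
          for j, hi in enumerate(h)]
    for k in range(len(W2)):
        for j in range(len(h)):
            W2[k][j] -= lr * d2[k] * h[j]
        b2[k] -= lr * d2[k]
    for j in range(len(W1)):
        for i in range(len(x)):
            W1[j][i] -= lr * d1[j] * x[i]
        b1[j] -= lr * d1[j]
    return y
```

In hardware terms, the hidden-layer delta computation reuses the same multiply-accumulate pattern as the forward pass (a dot product over W2), which is what makes a shared physical neuron layer practical for both modes.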

Case study: ECG classification

In this case study, we review and discuss several generations of embedded mobile devices for cardiovascular disease (CVD) detection that can provide daily ECG information for diagnosis, and we compare the proposed RNA solution with other current solutions.

Conclusion and future work

In this paper, we proposed RNA, a cost-effective hardware-multiplexing architecture for artificial neural networks capable of performing both feed-forward and back-propagation computations. We implemented the proposed RNA design using a custom ASIC design flow in 45 nm CMOS technology. We also prototyped RNA on a Xilinx Virtex-5 FPGA for verification.

In an ECG classification case study, a three-layer ANN structure (51–30–12) was implemented as four different ASIC designs: RNA, Flat design,

Conflict of interest statement

None declared.

Yuwen Sun received the B.S. degree in information science and electrical engineering from Zhejiang University, Hangzhou, China, and the M.S. degree from the Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA. He is currently pursuing the Ph.D. degree at the University of California, Los Angeles. His current research interests include biomedical computing systems, computer architecture, very large-scale integration, and field-programmable gate array prototyping.

References (19)

  • G. Cottrell, P. Munro, D. Zipser, Learning internal representations of gray scale images: an example of extensional...
  • M. Bianchina et al., Learning in multilayered networks used as auto-associators, IEEE Trans. Neural Networks (1995)
  • H. Bourland et al., Auto-association by multilayer perceptrons and singular value decomposition, J. Biol. Cybern. (1988)
  • K. Gurney et al., An Introduction to Neural Networks (1997)
  • I. Aleksander et al., An Introduction to Neural Computing (1990)
  • S. Haykin, Neural Networks: A Comprehensive Foundation (1998)
  • P. Domingos et al., An efficient and scalable architecture for neural networks with backpropagation learning, Field Program. Logic Appl. (2005)
  • R. Gadea, J. Cerda, F. Ballester, A. Macholi, Artificial neural network implementation on a single FPGA of a pipelined...
  • D. Ferrer et al., NeuroFPGA—implementing artificial neural networks on programmable logic devices, Des. Autom. Test Eur. (2004)



Allen C. Cheng (M'98) received the Ph.D. degree in computer science and engineering from the University of Michigan, Ann Arbor. He was an Assistant Professor in the Departments of Electrical and Computer Engineering, Computer Science, Bioengineering, and Neurological Surgery, University of Pittsburgh, Pittsburgh, PA, where he also directed the Advanced Computing Technology Laboratory. He is currently a Senior Researcher at the Nokia Research Center, Nokia Inc. His research interests include the interdisciplinary confluence of computer engineering, computer science, neural engineering, biomedical engineering, and medicine. Dr. Cheng is a member of the Association for Computing Machinery and the American Association for the Advancement of Science.

1 Tel.: +1 415 216 8865.
