Neural Networks, Volume 18, Issue 2, March 2005, Pages 179-189

NEUROM: a ROM based RNS digital neuron

https://doi.org/10.1016/j.neunet.2004.11.006

Abstract

In this work, a fast digital device is defined, which is customized to implement an artificial neuron. Its high computational speed is obtained by mapping data from floating point to integer residue representation, and by computing neuron functions through residue arithmetic operations, with the use of table look-up techniques. Specifically, the logic design of a residue neuron is described, and complexity figures for the area occupancy and time consumption of the proposed device are derived. The approach was applied to the logic design of a residue neuron with 12 inputs and with a Residue Number System defined so as to attain an accuracy better than or equal to that of a 20-bit floating point system. The proposed design (NEUROM) exploits the RNS carry independence property to speed up computations; in addition, it is very well suited to look-up tables. The response time of our device is about 8×T_ACC, where T_ACC is the ROM access time. With a value of T_ACC close to the 10 ns allowed by current ROM technology, the proposed neuron responds within 80 ns; NEUROM therefore offers the highest throughput among the neuron devices proposed in the literature. Moreover, when a pipeline mode of operation is adopted, the pipeline delay can be as low as about 14 ns. In the case study considered, the total amount of ROM is about 5.55 Mbits. Thus, using current technology, it is possible to integrate several residue neurons into a single VLSI chip, thereby enhancing chip throughput. The paper also discusses how this amount of memory could be reduced, at the expense of response time.

Introduction

For many years, interest in Artificial Neural Networks (ANN) has been growing in a broad area of applications. A great deal of effort has been made to enhance ANN performance, especially in the framework of very complex applications with real time constraints, for example, real time image processing and artificial vision (Bello, 2000, Clark and Furth, 1999, Hammadou et al., 2000, Montufar-Chaveznava et al., 2001). As far as computing support for advanced ANN emulation is concerned, the approaches proposed in the literature range from general purpose superscalar computers to special parallel systems (Ming-Jung, Hau, & Vijayan, 2003), as well as to very fast special processors. There are several examples of the last approach in the literature. In (Clarkson, Ng, & Guan, 1993) a special VLSI chip is presented which implements a pRAM. In (Bolouri, Morgan, & Gurney, 1994) a neural system, HyperNet, is proposed, which is based on five custom VLSI ICs. In (El-Mousa & Clarkson, 1996) a neurocomputer board is presented, which incorporates the pRAM-256 VLSI neural processor. Another interesting example is a board containing four SIMD-type processors named NEURO4, which can efficiently support a general neural network (Komori, Arima, Kondo, Tsubota, Tanaka, & Kyuma, 1997). In (Lau, 1996) high performance chips/systems for processing neural networks are presented, among which there are two fully digital implementations, namely the floating point SNAP (SIMD Numerical Array Processor) and the fixed point CNAPS (Connectionist Neural Adaptive Processor System). In (Hasan & Ng, 1997) the SLiFBAM, a parallel VLSI processor based on bidirectional associative memories, is described. There are many research papers related to hardware implementation of neural networks; however, in order to make meaningful comparisons, only those proposals that closely relate to the goals of our work are considered in this paper.

We focus on the definition of a fast digital device that implements an artificial neuron. More specifically, with reference to a trained neural network, a neuron is considered in which inputs and weights are real numbers. Our approach enhances its computational speed by implementing it as a physical device based on residue arithmetic units. The proposed approach exploits both the RNS carry independence property, to speed up the computations, and the ability of the RNS to be implemented by means of lookup tables.

In Section 2 the fundamentals of a Residue Number System (RNS), which is the number representation system adopted in this work, are recalled.

In Section 3 the integer reduction problem in an artificial neuron is considered, and the aim is to find the integer range required to emulate neural computations that use floating point notation, without loss of accuracy.

In Section 4 the logic design of a table lookup structure for a residue neuron is described, while Section 5 outlines a case study concerning the design of a residue neuron that covers the requirements of many neural applications in terms of both the number of inputs and the ranges of the variables.

With regard to design issues, a 12-input neuron is considered, and a 20-bit floating point notation, with a 16-bit signed mantissa and a 4-bit two's-complement exponent, is assumed for all the variables modelling the neuron. With these values, several applications can achieve the required level of accuracy, and a ROM-based implementation (NEUROM) yields high time performance and practicable area occupancy. In fact, we evaluated the response time of this residue neuron to be less than 80 ns, while a total memory amount of about 5.55 Mbits is required. It will be shown that NEUROM has a better throughput than other high performance devices proposed in the literature. Moreover, the feasibility of integrating several NEUROMs on the same chip will be discussed.
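To make the assumed notation concrete, the Python sketch below decodes such a 20-bit word; reading the mantissa as a signed fraction in [−1,1) and the exponent as an integer in [−8,7] is our assumption for illustration, not a specification taken from the paper.

    # Hypothetical decoder for the assumed 20-bit format: a 16-bit two's-complement
    # mantissa read as a fraction in [-1, 1), and a 4-bit two's-complement exponent
    # in [-8, 7]. The exact field semantics are an assumption for illustration.
    def decode20(mantissa_bits: int, exponent_bits: int) -> float:
        m = mantissa_bits - (1 << 16) if mantissa_bits >= (1 << 15) else mantissa_bits
        e = exponent_bits - (1 << 4) if exponent_bits >= (1 << 3) else exponent_bits
        return (m / (1 << 15)) * (2.0 ** e)

    # Example: mantissa 0x4000 (= 0.5) with exponent -1 gives 0.25.
    assert decode20(0x4000, 0b1111) == 0.25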

The number of NEUROM inputs can be increased, provided that the overall number of ROMs is increased as well, without enlarging the ROM address space. On the other hand, the upper bound on the address space expansion limits the size of the RNS moduli.

Finally, when a pipeline mode of operation is adopted, we have evaluated that the pipeline delay can assume a value close to the time necessary to access a ROM.

Memory-based implementations in the field of Pattern Recognition were first proposed in (Bledsoe and Blisson, 1962, Bledsoe and Browning, 1959), and memory-based neurons theory was assessed in (Gurney, 1989) and (Gorse and Taylor, 1988, Gorse and Taylor, 1990a, Gorse and Taylor, 1990b, Gorse and Taylor, 1990c, Gorse and Taylor, 1991, Gorse et al., 1993, Gorse et al., 1997).

In the following, a nomenclature of the less common mathematical terms and symbols is given. However, symbols will generally also be defined in the text where they first occur.

    ⌊x⌋

    is the greatest integer less than or equal to the real number x

    ⌈x⌉

    is the least integer greater than or equal to the real number x

    abs(x)

    is the absolute value of x

    |n|_m

    with n and m integers, denotes the remainder of n/m, that is, n − m⌊n/m⌋

    [A,B]

    is the set of numbers between A and B, with A<B, A and B included

    [A,B)

    is the set of numbers between A and B, with A<B, A included, B not included

    {ei: P(ei)}

    is the set of elements ei that satisfy predicate P(ei)

    {ei}, i=1,…,n

    is a set of n elements ei

    {e1, e2, e3}

    is the set of elements e1, e2, e3

    ↔

    states a one-to-one correspondence between an integer and its RNS representation

    sign(x)

    is the sign of x

    a*

    is the real number corresponding to the floating point representation of the real number a

    ≈

    denotes approximate equality


Residue Number Systems

In a Residue Number System, a set of p integers {m_j: m_j>1}, called moduli, is defined as the representation base. In such a system, an integer N is represented by means of p integers n_j, where n_j = |N|_{m_j} = N − ⌊N/m_j⌋·m_j; in this equality, ⌊N/m_j⌋ denotes the largest integer not exceeding N/m_j. The integers n_j are called residue digits. If the moduli m_j are pairwise relatively prime, it can be shown (Szabo & Tanaka, 1967) that there is a unique representation for each number N in the range [0,M), with M = ∏_{j=1}^{p} m_j.
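As a concrete illustration of these definitions, the following Python sketch (ours, not the paper's) converts an integer to its residue digits over the case-study base {63, 55, 53, 47} and reconstructs it with the Chinese Remainder Theorem, which is what guarantees the uniqueness of the representation.

    # Minimal RNS conversion/reconstruction sketch over the paper's
    # case-study moduli; any pairwise-coprime base works the same way.
    from math import gcd, prod

    MODULI = [63, 55, 53, 47]           # pairwise relatively prime

    def to_rns(n, moduli=MODULI):
        """Residue digits n_j = |n|_{m_j}."""
        return [n % m for m in moduli]

    def from_rns(digits, moduli=MODULI):
        """CRT reconstruction; unique because the moduli are pairwise coprime."""
        M = prod(moduli)                # dynamic range [0, M)
        n = 0
        for d, m in zip(digits, moduli):
            Mj = M // m
            n += d * Mj * pow(Mj, -1, m)   # modular inverse (Python 3.8+)
        return n % M

    assert all(gcd(a, b) == 1
               for i, a in enumerate(MODULI) for b in MODULI[i + 1:])
    assert from_rns(to_rns(1234567)) == 1234567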

Integer arithmetic versus floating point arithmetic

Let us consider the artificial neuron operation. First, the neuron function u = F(x_1, x_2, …, x_n) = ∑_{i=1}^{n} x_i × w_i is computed, where (x_i, i=1,…,n) is the input pattern vector to the node, and (w_i, i=1,…,n) is the associated weight vector. We assume that inputs and weights are normalized to the range [−1,1]. Second, the activation function a = f(u) = f(F(x_1, x_2, …, x_n)) is performed. To keep the benefits of residue arithmetic, it is advisable to choose a linear function as an activation function: a = f(u) = a_0 + a_1·u
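As a rough illustration of how this computation maps onto residue arithmetic, the Python sketch below performs the multiply-accumulate digit-wise, independently for each modulus; the fixed-point scale 2^10 and the test vectors are ours, chosen only for illustration, and are not the paper's actual scaling.

    # Illustrative digit-wise MAC for u = sum_i x_i * w_i: each residue digit
    # is accumulated modulo its own modulus, with no carries between digits.
    from math import prod

    MODULI = [63, 55, 53, 47]

    def rns_mac(xs, ws, moduli=MODULI):
        acc = [0] * len(moduli)
        for x, w in zip(xs, ws):
            for j, m in enumerate(moduli):
                acc[j] = (acc[j] + (x % m) * (w % m)) % m
        return acc

    SCALE = 2 ** 10                     # illustrative fixed-point scale
    xs = [round(v * SCALE) for v in (0.5, -0.25, 0.75)]
    ws = [round(v * SCALE) for v in (0.1, 0.9, -0.3)]
    digits = rns_mac(xs, ws)

    # Each digit equals |u|_{m_j} for the exact integer u = sum(x*w);
    # negative u is read out as a residue above M/2 in the usual way.
    u = sum(x * w for x, w in zip(xs, ws))
    assert digits == [u % m for m in MODULI]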

Logic design of a residue neuron

In this section we describe the logic organization of an artificial neuron, in which residue arithmetic is implemented (residue neuron).

Consider a Residue Number System of p moduli m_j and range [0, M−1], where M = ∏_{j=1}^{p} m_j > 2H. The proposed logic organization exploits the RNS carry independence property to speed up the computations. In addition, it can be based on lookup tables, at least as long as the residue digit ranges can be kept sufficiently small. In this paper, this table approach is
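The following Python sketch illustrates the table approach in software terms: for each (small) modulus, complete modular addition and multiplication tables are precomputed, so a multiply-accumulate step reduces to a couple of table reads per digit, performed in parallel across moduli. The one-table-per-operation organization is a generic assumption for illustration, not necessarily NEUROM's exact ROM partitioning.

    # Generic ROM-based residue datapath sketch: each modular operation is a
    # precomputed m*m table, so one MAC step per digit costs two table reads,
    # and the moduli are processed in parallel with no inter-digit carries.
    MODULI = (63, 55, 53, 47)

    def make_table(op, m):
        return [[op(a, b) % m for b in range(m)] for a in range(m)]

    ROM_MUL = {m: make_table(lambda a, b: a * b, m) for m in MODULI}
    ROM_ADD = {m: make_table(lambda a, b: a + b, m) for m in MODULI}

    def mac_step(acc_j, x_j, w_j, m):
        """One accumulate step for the digit of modulus m: two ROM reads."""
        return ROM_ADD[m][acc_j][ROM_MUL[m][x_j][w_j]]

    # Sanity check against direct modular arithmetic:
    assert mac_step(5, 17, 23, 63) == (5 + 17 * 23) % 63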

A floating point equivalent residue neuron

In this section we present a case study, which aims to evaluate the performance, in terms of delay time and memory capacity, of a residue neuron featuring an integer range [−H,+H] with H = 2²². This range enables it to cover several neural applications, according to the considerations in Section 3.

Let us consider a neuron with 12 inputs (consequently, the weight range is about ±(1/3)×2²⁰), and assume an RNS with p=4 moduli, m_1=63, m_2=55, m_3=53, m_4=47, that is, M = 8,631,315 > 2²³, and with p′=4
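These figures are straightforward to verify; a quick Python check of the pairwise coprimality of the moduli and of the dynamic range follows.

    # Verifying the case-study base: the moduli are pairwise relatively prime
    # and their product M exceeds 2**23, so the range [-H, +H] with H = 2**22
    # fits within the unique-representation range [0, M).
    from math import gcd, prod

    moduli = [63, 55, 53, 47]
    assert all(gcd(a, b) == 1
               for i, a in enumerate(moduli) for b in moduli[i + 1:])
    M = prod(moduli)
    assert M == 8_631_315 and M > 2 ** 23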

Conclusions

We have described the logic design of a high speed table driven residue neuron. High speed is required in complex applications such as image processing and artificial vision. Unlike standard computer arithmetic, residue arithmetic complies with high speed requirements whenever the most frequent operations are additions and multiplications.

We have also considered the performance in terms of memory capacity and response time for a ROM based residue neuron (NEUROM) with 12 inputs, featuring an

References (28)

  • N. Clark et al., Intelligent vision systems using dynamic neural networks, Proceedings SPIE (1999)

  • T. Clarkson et al., The pRAM: an adaptive VLSI chip, IEEE Transactions on Neural Networks (1993)

  • A.H. El-Mousa et al., Multi-configurable pRAM based neuro-computer, Neural Network World (1996)

  • D. Gorse et al., Reinforcement training strategies for probabilistic RAMs
