VLSI architecture for parallel radix-4 CORDIC

https://doi.org/10.1016/j.micpro.2012.12.001

Abstract

The COordinate Rotation DIgital Computer (CORDIC) algorithm is an iterative method for the fast hardware implementation of elementary functions such as trigonometric, inverse trigonometric, logarithm, exponential, multiplication and division functions in a simple and elegant way. This paper presents a regular and scalable VLSI architecture for the implementation of the parallel radix-4 rotational CORDIC algorithm. A thorough comparison of the proposed architecture with the available architectures has been carried out to show the improvements in latency and hardware. Furthermore, the proposed architecture is coded for 16-bit precision in VHDL. The functionally simulated netlist has been synthesized with a 90 nm CMOS technology library, and the area-time measures are provided. The architecture is also implemented on a Xilinx Virtex device using the Xilinx ISE 7.1i software.

Introduction

Advances in VLSI technology have stimulated great interest in developing special-purpose parallel processor arrays to facilitate real-time signal processing. The basic arithmetic operations required in VLSI arrays are implemented with multiplication and accumulation (MAC) units. The reduction in hardware cost has also motivated the development of sophisticated Digital Signal Processing (DSP) algorithms to enhance the performance of modern DSP systems. Many of these algorithms require the evaluation of elementary functions such as trigonometric, exponential and logarithmic functions. The commonly used software solutions for the digital implementation of these functions are the table lookup method and polynomial expansion. These methods are preferred for applications requiring low resolution. For higher precision, the table lookup method is not feasible, because the table size grows exponentially with the word length, and polynomial approximation involves computationally intensive multiplications and additions/subtractions. These functions therefore cannot be evaluated efficiently with MAC-based arithmetic units, as doing so results in significant performance degradation. The COordinate Rotation DIgital Computer (CORDIC) algorithm was developed [1] as an alternative solution for computing these elementary functions. CORDIC offers a unified iterative solution that efficiently evaluates each of these functions by formulating every evaluation task as a rotation of a 2 × 1 vector in one of several coordinate systems. The same CORDIC processor can evaluate all of these elementary functions with the same hardware and in the same time by varying a few simple parameters.
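To make the "rotation of a 2 × 1 vector" formulation concrete, the standard circular-mode derivation is summarized below. A plane rotation is factored so that each elementary angle α_i satisfies tan α_i = 2^{-i}, which turns every microrotation into shifts and additions. The notation (σ_i, α_i, z_i, K) is ours and may differ from the paper's Section 2.

```latex
\begin{aligned}
\begin{bmatrix} x' \\ y' \end{bmatrix}
  &= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
     \begin{bmatrix} x \\ y \end{bmatrix}
   = \cos\theta \begin{bmatrix} 1 & -\tan\theta \\ \tan\theta & 1 \end{bmatrix}
     \begin{bmatrix} x \\ y \end{bmatrix}, \\[4pt]
\theta &\approx \sum_{i=0}^{n-1} \sigma_i \alpha_i, \qquad
  \alpha_i = \arctan\left(2^{-i}\right), \quad \sigma_i \in \{-1, +1\}, \\[4pt]
x_{i+1} &= x_i - \sigma_i \, 2^{-i} \, y_i, \\
y_{i+1} &= y_i + \sigma_i \, 2^{-i} \, x_i, \\
z_{i+1} &= z_i - \sigma_i \, \alpha_i, \qquad
  z_0 = \theta, \quad \sigma_i = \begin{cases} +1 & z_i \ge 0 \\ -1 & z_i < 0 \end{cases} \\[4pt]
x_n &\approx K \left( x_0 \cos\theta - y_0 \sin\theta \right), \qquad
y_n \approx K \left( y_0 \cos\theta + x_0 \sin\theta \right), \\[4pt]
K &= \prod_{i=0}^{n-1} \frac{1}{\cos\alpha_i}
   = \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}} \approx 1.6468 \quad (n \text{ large}).
\end{aligned}
```

Because σ_i = ±1 in every iteration, K does not depend on θ, so it can be compensated once, for example by starting from x_0 = 1/K and y_0 = 0, which gives x_n ≈ cos θ and y_n ≈ sin θ.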

In the conventional CORDIC [1], rotation of a two-dimensional vector by a target angle is achieved by decomposing the desired rotation angle into a weighted sum of predefined elementary rotation angles, chosen so that the rotation through each of them can be realized with simple shift-and-add operations. The CORDIC algorithm was subsequently extended [2] into a unified algorithm for computing rotations in the circular, linear, and hyperbolic coordinate systems, with the coordinate system embedded as a parameter. The functions computed in this way are at the core of many applications, such as digital signal processing [3], image processing [4], [5], and kinematic processing [6]. In [4], an FPGA realization of a 16-bit CORDIC-based 128-point FFT processor for biomedical signal processing is presented; using CORDIC to realize the basic FFT butterfly operation saves hardware compared to counterparts that employ other techniques. Advances in VLSI technology have recently extended the application of the CORDIC algorithm to fields such as biomedical signal processing [7], [8], neural networks [9], and wireless communications [10], [11]. In [8], a CORDIC-based unified VLSI architecture for implementing window functions for real-time spectral analysis is presented, which requires less hardware than ROM-based implementations of the window functions. In [10], a digit-pipelined Direct Digital Frequency Synthesis (DDFS) scheme based on differential CORDIC is presented. Unlike the ROM lookup-table realization of DDFS, in which all the required sine/cosine values are stored in a ROM, its hardware does not grow exponentially with precision.
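As a behavioral illustration of the unified algorithm, the sketch below models rotation-mode CORDIC in floating point, with a parameter m selecting the circular (m = 1), linear (m = 0) or hyperbolic (m = −1) coordinate system. It is a sketch under stated assumptions, not the paper's design. The function name and the iteration count are ours, and so is the use of the common convention of starting hyperbolic iterations at i = 1 and repeating i = 4, 13, 40, …. The returned gain has to be divided out by the caller.

```python
import math


def cordic_rotate(x, y, z, m, n_iter=24):
    """Illustrative floating-point model of unified CORDIC in rotation mode
    (hypothetical helper, not taken from the paper).

    m = 1 selects circular, m = 0 linear and m = -1 hyperbolic coordinates.
    Returns (x_n, y_n, z_n, gain); divide x_n and y_n by gain to undo the
    scale factor accumulated by the microrotations.
    """
    if m == -1:
        # Assumed convention: hyperbolic mode starts at i = 1 and repeats
        # i = 4, 13, 40, ... (next = 3k + 1) so the angles atanh(2^-i)
        # can still cancel any residual inside the convergence range.
        shifts, i, repeat = [], 1, 4
        while len(shifts) < n_iter:
            shifts.append(i)
            if i == repeat:
                shifts.append(i)
                repeat = 3 * repeat + 1
            i += 1
        shifts = shifts[:n_iter]
    else:
        shifts = list(range(n_iter))

    gain = 1.0
    for i in shifts:
        t = 2.0 ** -i  # multiplication by 2^-i is a right shift by i bits
        if m == 1:
            alpha = math.atan(t)
        elif m == 0:
            alpha = t
        else:
            alpha = math.atanh(t)
        sigma = 1 if z >= 0 else -1  # rotation mode: drive z toward zero
        x, y = x - m * sigma * t * y, y + sigma * t * x
        z -= sigma * alpha
        gain *= math.sqrt(1 + m * t * t)  # per-iteration gain; 1 when m = 0
    return x, y, z, gain


if __name__ == "__main__":
    x, y, _, k = cordic_rotate(1.0, 0.0, 0.7, m=1)
    print(x / k, y / k)  # close to cos(0.7), sin(0.7)
    _, y, _, _ = cordic_rotate(3.0, 0.0, 0.4, m=0)
    print(y)  # close to 3.0 * 0.4
    x, y, _, k = cordic_rotate(1.0, 0.0, 0.5, m=-1)
    print(x / k, y / k)  # close to cosh(0.5), sinh(0.5)
```

The same loop body serves all three systems: only m and the elementary-angle function change, which is the sense in which one CORDIC datapath evaluates many functions. Each mode converges only for a bounded input range, roughly |z| ≤ 1.74 rad (circular), |z| < 2 (linear) and |z| ≤ 1.11 (hyperbolic).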

The design of an architecture implementing the iterative CORDIC algorithm depends on design constraints such as silicon area, speed and power consumption. In general, such architectures can be broadly classified as folded or unfolded. Folding provides a means of trading area for time in signal processing architectures. Folded architectures are obtained by time-multiplexing all the iterations on a single functional unit. They can be further subdivided into bit-serial and word-serial architectures, depending on whether the functional unit implements the logic for one bit or for one word of each CORDIC iteration. The CORDIC algorithm has traditionally been implemented using a bit-serial architecture [1] and a word-serial architecture [2], [12]. The major drawback of these architectures is their high latency. This is overcome by unfolding the iteration process, so that each processing element always performs the same iteration. The unfolded architecture realizes the shift operations with hardwired shifts rather than time- and area-consuming barrel shifters, and it eliminates the ROM required for storing the elementary angles. The latency can be decreased further by employing redundant arithmetic in the implementation of the CORDIC algorithm.
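The folded/unfolded distinction can be made concrete with a small fixed-point model. The sketch below is behavioral Python, not hardware. `folded_cordic` reuses one stage with a variable shift amount (what a barrel shifter provides) and reads angles from a table standing in for the ROM. `unfolded_cordic` chains separate stages whose shift amount and angle constant are fixed when each stage is built, mirroring hardwired shifts. The 14-fractional-bit format, the stage count and all names are hypothetical choices for illustration.

```python
import math

FRAC_BITS = 14  # hypothetical fixed-point format: value = integer / 2**14
N_STAGES = 14   # hypothetical: one microrotation per fractional bit
ONE = 1 << FRAC_BITS


def angle_const(i):
    """atan(2^-i) quantized to the fixed-point format."""
    return round(math.atan(2.0 ** -i) * ONE)


# Folded design: one shared stage; angles come from a ROM-like table.
ANGLE_ROM = [angle_const(i) for i in range(N_STAGES)]


def folded_cordic(x, y, z):
    """The same stage is reused every cycle; the shift amount i changes
    each time, which in hardware requires a barrel shifter."""
    for i in range(N_STAGES):
        d = 1 if z >= 0 else -1
        x, y, z = x - d * (y >> i), y + d * (x >> i), z - d * ANGLE_ROM[i]
    return x, y, z


def make_stage(shift, angle):
    """Unfolded stage: shift amount and angle are constants of this stage,
    i.e. a hardwired shift and a hardwired angle instead of a ROM lookup."""
    def stage(x, y, z):
        d = 1 if z >= 0 else -1
        return x - d * (y >> shift), y + d * (x >> shift), z - d * angle
    return stage


PIPELINE = [make_stage(i, angle_const(i)) for i in range(N_STAGES)]


def unfolded_cordic(x, y, z):
    """Data flows through N_STAGES dedicated stages, one per iteration."""
    for stage in PIPELINE:
        x, y, z = stage(x, y, z)
    return x, y, z


if __name__ == "__main__":
    k = math.prod(math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(N_STAGES))
    x0, theta = round(ONE / k), round(0.5 * ONE)  # pre-scale x by 1/K
    x, y, z = unfolded_cordic(x0, 0, theta)
    print(folded_cordic(x0, 0, theta) == (x, y, z))  # True
    print(x / ONE, y / ONE)  # close to cos(0.5), sin(0.5)
```

Both versions compute identical results. What differs in silicon is the structure: one reused stage versus N_STAGES dedicated ones, which is the area-for-time trade described above. Python's `>>` on negative integers is an arithmetic (flooring) shift, matching a two's-complement right shift.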

A redundant radix-2 CORDIC using carry-save (CS) arithmetic was proposed [13] to reduce the time of each iteration of the conventional CORDIC. However, because its rotation directions can also take the value zero, it results in a variable scale factor. To reduce the cost of computing this scale factor and performing the scaling operation in redundant CORDIC, two constant-scale-factor redundant CORDIC algorithms using signed-digit (SD) arithmetic were proposed [14]. Since these methods lead to more complicated iterations or extra correcting rotations, two methods using CS arithmetic [15], [16] and two methods using SD arithmetic [16], [17] were proposed to reduce latency. These methods offer low-latency solutions at the cost of increased hardware. Both the latency and the hardware can be reduced by employing higher-radix schemes [18], [19]. However, both redundant and higher-radix CORDIC algorithms are still iterative in nature, which greatly restricts the speed of implementation. This prompted researchers to devise methods for reducing the delay of each iteration.
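The variable scale factor follows directly from the gain of each radix-2 microrotation, √(1 + σ_i² 2^{−2i}). When σ_i = 0 is allowed, a skipped iteration contributes no gain, so K depends on the angle being processed. The helper below and the direction sequences are illustrative only. They are not produced by any particular selection rule from [13] to [17].

```python
import math


def scale_factor(directions):
    """K = prod_i sqrt(1 + sigma_i^2 * 2^(-2i)) for radix-2 microrotations
    i = 0, 1, 2, ... (hypothetical helper name); an iteration with
    sigma_i = 0 rotates by nothing and contributes a factor of exactly 1."""
    return math.prod(
        math.sqrt(1 + (s * 2.0 ** -i) ** 2) for i, s in enumerate(directions)
    )


# Non-redundant digits sigma_i in {-1, +1}: K is the same for every angle.
k_a = scale_factor([+1, -1, +1, +1, -1, +1])
k_b = scale_factor([-1, -1, +1, -1, +1, +1])

# Redundant digits sigma_i in {-1, 0, +1} (illustrative sequence): K varies.
k_c = scale_factor([+1, 0, +1, -1, 0, +1])

print(k_a == k_b, k_c < k_a)  # True True
```

With σ_i = ±1 the sign disappears when squared, so every sequence of the same length gives the same K (about 1.6468 for long sequences). Any zero digit removes a factor, which is why redundant radix-2 CORDIC needs either scale-factor computation or the constant-scale-factor techniques cited above.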

The delay of each iteration has two components: the delay to predict the new rotation direction and the delay of computing the rotation itself. The total computation delay can be reduced by precomputing the directions of the microrotations [20], [21]. It can be reduced further by eliminating the iterative nature of the x/y path completely [22] or partially [23]. Based on a study of the available architectures and algorithms [24], we observe that the latency and hardware of radix-2 CORDIC can be reduced by employing redundant radix-4 arithmetic and by parallelizing the determination of the rotation directions [25]. In this paper, we present the design of an unfolded architecture that reduces the computation delay of CORDIC.
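The separation between the two delay components can be illustrated by splitting a radix-2 rotation into an angle path that fixes every σ_i up front and an x/y path that only applies them. In this minimal floating-point sketch the directions are still chosen sequentially. It therefore shows only the decoupling, not the faster direction-determination schemes of [20], [21], [25]. The function names are hypothetical.

```python
import math


def precompute_directions(theta, n):
    """Angle path only (hypothetical helper): fix every rotation direction
    sigma_i in {-1, +1} from theta alone; x and y are never consulted."""
    sigmas, z = [], theta
    for i in range(n):
        s = 1 if z >= 0 else -1
        sigmas.append(s)
        z -= s * math.atan(2.0 ** -i)
    return sigmas


def rotate_xy(x, y, sigmas):
    """x/y path only: apply the given directions as shift-and-add steps,
    with no sign decision between consecutive iterations."""
    for i, s in enumerate(sigmas):
        t = 2.0 ** -i  # a hardwired right shift by i bits in hardware
        x, y = x - s * t * y, y + s * t * x
    return x, y


def gain(n):
    """Constant gain K = prod sqrt(1 + 2^(-2i)) of n radix-2 microrotations."""
    return math.prod(math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(n))


if __name__ == "__main__":
    n, theta = 24, 0.6
    x, y = rotate_xy(1 / gain(n), 0.0, precompute_directions(theta, n))
    print(x, y)  # close to cos(0.6), sin(0.6)
```

Because `rotate_xy` never inspects its own data to make a decision, its iterations can be unrolled into a chain of adders with no sign-detection logic between them. The remaining question, which the cited methods address, is how to obtain all the σ_i quickly.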

The remainder of the paper is organized as follows. We present a brief review of a rotational radix-4 CORDIC algorithm in Section 2. The proposed architecture is discussed in Section 3. The latency and hardware complexity comparison of the proposed architecture with the unfolded parallel architectures available in the literature is presented along with the synthesis results in Section 4. Finally, conclusions are presented in Section 5.

Section snippets

Radix-4 CORDIC algorithm

The basis of the CORDIC algorithm is two-dimensional vector rotation in the xy-plane (see Fig. 1) [24]. This is accomplished by rotating a vector through a sequence of elementary angles whose algebraic sum approximates the desired rotation angle [1]. These elementary angles are selected such that the vector rotation through each of them can be computed easily with simple shift-and-add operations. The CORDIC method can be employed in two different modes, namely the rotation mode and the vectoring mode…
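The excerpt breaks off just as it introduces the two modes. For contrast with rotation mode, where the directions drive the residual angle z to zero, a minimal floating-point sketch of circular vectoring mode follows. Here the directions drive y to zero, so x converges to the scaled magnitude and z accumulates the angle. This is the textbook formulation, assumed here rather than taken from the paper, which concentrates on rotation mode. The function name and iteration count are ours.

```python
import math


def cordic_vectoring(x, y, n=24):
    """Illustrative circular CORDIC in vectoring mode (hypothetical helper).

    Each microrotation is chosen to push y toward zero, so for x > 0 the
    vector ends on the positive x-axis.  Returns (sqrt(x^2 + y^2), atan(y/x)).
    """
    z = 0.0
    for i in range(n):
        t = 2.0 ** -i
        s = -1 if y >= 0 else 1  # rotate clockwise while y is non-negative
        x, y = x - s * t * y, y + s * t * x
        z -= s * math.atan(t)  # accumulate the angle rotated away
    k = math.prod(math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(n))
    return x / k, z


if __name__ == "__main__":
    print(cordic_vectoring(3.0, 4.0))  # close to (5.0, atan2(4, 3))
```

The datapath is the same as in rotation mode; only the rule for choosing each direction changes. Since the elementary angles sum to about 1.74 rad, vectors far into the left half-plane would need a preliminary ±π/2 rotation.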

Proposed architecture

In this paper, an unfolded architecture is presented for the implementation of the parallel radix-4 rotational CORDIC algorithm to address latency. The latency of radix-2 CORDIC is reduced by using the radix-4 number system and redundant arithmetic [19]. Redundant radix-4 arithmetic requires more delay than redundant radix-2 arithmetic to determine the directions of rotation iteratively, since it takes more time to select from among the five rotation direction values, and to select an…
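A behavioral sketch of a radix-4 rotation with the five direction values σ_j ∈ {−2, −1, 0, 1, 2} follows, to show what the larger digit set implies. It is not the paper's architecture. The excerpt does not give the proposed digit-selection or scale-factor handling, so the following are all assumptions: the rounding-based selection, the start index j = 1, the input range |θ| ≤ 0.5 rad, and the final floating-point division by K. Each radix-4 iteration resolves about two bits of angle, so eight iterations here give roughly the precision of sixteen radix-2 iterations.

```python
import math


def radix4_directions(theta, n_iter):
    """Hypothetical selection rule: sigma_j in {-2, -1, 0, 1, 2} for
    j = 1..n_iter, obtained by rounding 4^j * z_j and clamping to the digit
    set.  Assumes |theta| <= 0.5 rad (illustrative range, not the paper's)."""
    sigmas, z = [], theta
    for j in range(1, n_iter + 1):
        s = max(-2, min(2, round(z * 4 ** j)))
        sigmas.append(s)
        z -= math.atan(s * 4.0 ** -j)  # elementary angle atan(sigma_j * 4^-j)
    return sigmas


def radix4_rotate(x, y, sigmas):
    """Apply x' = x - sigma_j 4^-j y and y' = y + sigma_j 4^-j x.  In hardware,
    sigma_j * 4^-j is a right shift by 2j bits (2j - 1 bits when
    |sigma_j| = 2) plus a sign.  Returns (x, y, K), where the gain
    K = prod sqrt(1 + sigma_j^2 4^-2j) depends on the chosen digits."""
    k = 1.0
    for j, s in enumerate(sigmas, start=1):
        t = s * 4.0 ** -j
        x, y = x - t * y, y + t * x
        k *= math.sqrt(1 + t * t)
    return x, y, k


if __name__ == "__main__":
    theta = 0.3
    sigmas = radix4_directions(theta, 8)  # 8 radix-4 digits, about 16 angle bits
    x, y, k = radix4_rotate(1.0, 0.0, sigmas)
    print(sigmas)
    print(x / k, y / k)  # close to cos(0.3), sin(0.3)
```

Because σ_j can be 0 or ±2, the gain K changes with the angle. Handling this variable scale factor, together with the slower five-way direction selection noted above, is part of the cost a radix-4 architecture must absorb in exchange for halving the iteration count.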

Evaluation

In this section, we compare the latency of the proposed architecture with that of various architectures available in the literature. A true comparison between different implementations is possible only if circuit-level simulations are provided. Since these are not always available in the literature, a first-order comparison is presented, based on the number of full-adder levels for delay and the number of full adders for hardware complexity. We consider the length of the data path as n for…
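The first-order cost model counts full adders for area and full-adder levels for delay. The sketch below applies that model to two basic building blocks, a ripple-carry adder and a carry-save (3:2) row, by tracking when each full-adder output becomes valid. It assumes ripple-carry for non-redundant addition and does not reconstruct the paper's per-architecture counts. All names are ours.

```python
def full_adder(a, b, c, level):
    """One full adder on bits a, b, c whose latest input arrives at `level`
    (in full-adder delays).  Returns (sum, carry, output level)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry, level + 1


def ripple_carry_add(a, b, n):
    """n-bit ripple-carry adder (hypothetical model).  Returns
    (sum mod 2^n, full adders, depth); each stage waits for the previous
    carry, so the depth grows to n full-adder levels."""
    total, carry, carry_level, depth = 0, 0, 0, 0
    for k in range(n):
        s, carry, carry_level = full_adder((a >> k) & 1, (b >> k) & 1, carry, carry_level)
        total |= s << k
        depth = max(depth, carry_level)
    return total, n, depth


def carry_save_add(a, b, c, n):
    """n-bit 3:2 carry-save row for a, b, c < 2^n.  Returns (sum word,
    carry word, full adders, depth) with a + b + c == sum word + carry word;
    the full adders are independent, so the depth is a single level."""
    sum_word, carry_word, depth = 0, 0, 0
    for k in range(n):
        s, cy, level = full_adder((a >> k) & 1, (b >> k) & 1, (c >> k) & 1, 0)
        sum_word |= s << k
        carry_word |= cy << (k + 1)  # each carry has weight 2^(k+1)
        depth = max(depth, level)
    return sum_word, carry_word, n, depth


if __name__ == "__main__":
    print(ripple_carry_add(40000, 30000, 16))  # 16 FAs, 16 levels
    print(carry_save_add(40000, 30000, 12345, 16))  # 16 FAs, 1 level
```

For n = 16 both blocks use 16 full adders, but the ripple-carry adder needs 16 levels while the carry-save row needs only 1. This difference in delay per addition is what redundant (carry-save) CORDIC exploits to shorten each iteration.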

Conclusions

In this paper, a new architecture is proposed to reduce the area and computation delay of rotational CORDIC. This reduction is achieved by halving the number of iterations and precomputing all the directions of rotation. The proposed architecture is fully scalable and can be extended to higher accuracy as well. The latency and area of the proposed architecture are computed in terms of full-adder delay and full-adder area, so that these architectures can be implemented…

References (35)

  • A.S. Dhar et al.

    An array architecture for fast computation of discrete Hartley transform

    IEEE Trans. Circ. Syst.

    (1991)
  • K.C. Ray et al.

    CORDIC-based unified VLSI architecture for implementing window functions for real time spectral analysis

    IEE Proc. – Circ. Dev. Syst.

    (2006)
  • A. Meyr et al.

    A parallel CORDIC architecture dedicated to compute the Gaussian potential function in neural networks

    Elsevier Eng. Appl. Artif. Intell.

    (2003)
  • C.Y. Kang et al.

    Digit-pipelined direct digital frequency synthesis based on differential CORDIC

    IEEE Trans. Circ. Syst.

    (2006)
  • M.D. Ercegovac et al.

    Digital Arithmetic

    (2004)
  • M.D. Ercegovac, T. Lang

    Fast cosine/sine implementation using on-line CORDIC

    in: Proc. 21st Asilomar Conf. Signals,...
  • N. Takagi et al.

    Redundant CORDIC methods with a constant scale factor for sine and cosine computation

    IEEE Trans. Comput.

    (1991)

B. Lakshmi received her Bachelor's degree in Electronics and Communication Engineering from Nagarjuna University, Andhra Pradesh, India. In 1990, she received her Master of Technology degree in Electronics and Instrumentation from the National Institute of Technology (NIT), Warangal. Since 1990, she has been a faculty member at NIT Warangal. She was a research scholar in the Department of Electronics and Electrical Communication Engineering of the Indian Institute of Technology, Kharagpur, during 2006–2009, and completed her PhD in 2010. Her research interests include digital system design, VLSI design and CORDIC architectures.

A.S. Dhar received his Bachelor's degree in Electronics and Telecommunication Engineering from Bengal Engineering College, Howrah, India, in 1987. In 1989, he received his M.Tech degree in Integrated Circuits and Systems Engineering from the Indian Institute of Technology, Kharagpur, India. He received his PhD degree from the same institute in 1994, where he is presently an Associate Professor in the Department of Electronics and Electrical Communication Engineering. His research interests include VLSI design, CORDIC, DSP architectures and VLSI for communication.
