# Ellora: Exploring Low-Power OFDM-based Radar Processors using Approximate Computing

Rajat Bhattacharjya\*, Alish Kanani<sup>†</sup>, A Anil Kumar<sup>‡</sup>, Manoj Nambiar<sup>‡</sup>, M Girish Chandra<sup>‡</sup>, and Rekha Singhal<sup>§</sup>

\*University of California, Irvine

<sup>†</sup>University of Wisconsin-Madison

<sup>‡</sup>TCS Research, India

<sup>§</sup>TCS Research, USA

rajatb1@uci.edu, ahkanani@wisc.edu, {achannaanil.kumar, m.nambiar, m.gchandra, rekha.singhal}@tcs.com

Abstract-In recent times, orthogonal frequency-division multiplexing (OFDM)-based radar has gained wide acceptance given its applicability in joint radar-communication systems. However, realizing such a system in hardware poses a significant area and power bottleneck given its complexity. Therefore, it has become increasingly important to explore low-power OFDM-based radar processors in order to realize energy-efficient joint radar-communication systems targeting edge devices. This paper addresses these challenges by exploiting approximations in hardware for early design space exploration (DSE) of trade-offs between accuracy, area and power. We present Ellora, a DSE framework for incorporating approximations in an OFDM radar processing pipeline. Ellora uses pairs of approximate adders and multipliers to explore design points realizing energy-efficient radar processors. In particular, we incorporate approximations into the block involving periodogram based estimation and report area, power and accuracy levels. Experimental results show that at an average accuracy loss of 0.063% in the positive SNR region, we save 22.9% of on-chip area and 26.2% of power. To obtain the area and power statistics, we design a fully parallel Inverse Fast Fourier Transform (IFFT) core, which acts as a part of periodogram based estimation, and approximate the addition and multiplication operations in it. These results show that Ellora can be used in an integrated way with various other optimization methods for generating low-power and energy-efficient radar processors.

*Index Terms*—OFDM radar, Approximate computing, Low-power design, Periodogram based estimation, IFFT.

## I. INTRODUCTION

Systems involving sensing and communication have historically been separate [1]. Recently, however, the domain of joint radar-communication (JRC) has emerged, where a radar and a communication system are co-located in a single system. Such a system helps in optimal utilization of spectral resources and benefits both sensing and signalling via cooperation between them [2]. JRC systems find applications in vehicle-to-vehicle (V2V) communication scenarios for enabling various safety functions, smart traffic applications and developing autonomous vehicles [3]. However, such systems suffer from high complexity and hardware overheads given the complex digital signal processing techniques employed in them [4]. Systems involving orthogonal frequency-division multiplexing (OFDM) waveforms offer great interoperability between radar and communication systems [1]. However, in order to target power-constrained edge devices, it becomes important to achieve solutions targeting low-power and energy-efficient hardware.

Fig. 1: OFDM Radar processing pipeline. Approximated block highlighted in red.

There have been some works targeting efficient hardware in JRC. One work focused on introducing zero-padding in OFDM-based radar as opposed to the usual cyclic prefix [5]. Another work focused on introducing hybrid precoding and dynamic selection of optimal RF chains [6]. Kaushik et al. proposed an energy-efficient RF chain and DAC bit selection procedure [7]. However, studies focusing specifically on low-power and energy-efficient systems for JRC are limited; thus, in this paper, we focus on addressing this problem.

In this paper, we propose a design space exploration (DSE) framework, *Ellora*, for generating low-power radar processors pertaining to OFDM waveforms using approximate computing [8]. Approximate computing is a computing paradigm that aims at achieving energy efficiency at the expense of tolerable inexactness in applications [8]-[10]. Arithmetic units such as adders and multipliers form the basic building blocks in all such approximate systems [11]–[14]. Ellora helps find the approximation target in the application pipeline and incorporates pairs of approximate adders and multipliers in the targeted block to achieve area and power savings with just a marginal loss of accuracy. The approximation target is found to be the block involving periodogram based estimation (shown in Fig. 1), since it involves the computationally-intensive Inverse Fast Fourier Transform (IFFT) and shows error resilience. Though there have been some works focusing on approximate FFT [15], [16], we showcase a DSE framework across an end-to-end application pipeline (OFDM radar for JRC) along with a fully parallel hardware design. Ellora explores optimal accuracy-power-area design points in order to generate low-power radar processors. Experiments show that at an average accuracy loss of 0.063% in the positive SNR region, Ellora helps save 22.9% of on-chip area and 26.2% of power.

## II. METHODOLOGY

Fig. 1 outlines the complete radar processing pipeline and highlights the block of approximation, i.e., the block involving periodogram based estimation. Towards achieving the goal of generating low-power OFDM-based radar processors, we follow the flow shown in Fig. 2. The process of generating low-power OFDM radar processors (*Ellora*) involves three main steps: Functional Validation at the Software Level, Hardware Implementation, and Design Space Exploration to generate and explore optimal design points. The steps are described below, where the circled numbers denote the blocks in Fig. 2.

Fig. 2: Methodology of Ellora

Paper accepted at IEEE-LASCAS 2024.

Authors' version of the work posted for personal use and not for redistribution. The definitive version will be available in the Proceedings of the IEEE-LASCAS 2024.

#### A. Functional Validation

We conduct Functional Validation (as shown in Fig. 2) at the Software Level using MATLAB so as to study the effects of approximation on the end application, i.e., target detection in OFDM radar.

First, we take an OFDM radar processing pipeline into consideration<sup>②</sup> (shown in Fig. 1). Then we conduct the error-resilience analysis<sup>③</sup> for the whole pipeline by injecting white Gaussian noise along with the respective inputs into the computationally-intensive digital signal processing blocks. This helps us investigate which blocks are resilient enough to handle the effects of approximations, and thereby we select an approximation target. We select the block involving periodogram based estimation as the approximation target since it is computationally intensive [17] and shows error resilience. The periodogram based estimation that we employ here performs an IFFT on the data obtained after the element-wise division of received data by transmitted data, resulting in normalized power, which is finally used to plot the range of the target (m) versus the normalized range profile (dB).
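The estimation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the MATLAB model used in this work: it assumes a single noise-free OFDM symbol, a rounded speed of light, and a zero-padded 512-point inverse transform, and the names `idft`, `range_profile`, `estimate_range` and `N_IFFT` are hypothetical helpers introduced only for this example.

```python
import cmath
import math
import random

C = 3.0e8        # speed of light in m/s (rounded, assumption)
N_SUB = 32       # number of subcarriers (Table I)
DELTA_F = 960e3  # subcarrier spacing in Hz (Table I)
N_IFFT = 512     # zero-padded transform length (assumption of this sketch)


def idft(x, n_out):
    """Naive inverse DFT of x, implicitly zero-padded to n_out points."""
    return [
        sum(v * cmath.exp(2j * math.pi * k * n / n_out) for k, v in enumerate(x)) / n_out
        for n in range(n_out)
    ]


def range_profile(tx, rx, n_out=N_IFFT):
    """Element-wise division, inverse transform, then normalized power in dB."""
    divided = [r / t for r, t in zip(rx, tx)]
    power = [abs(v) ** 2 for v in idft(divided, n_out)]
    peak = max(power)
    return [10.0 * math.log10(max(p / peak, 1e-12)) for p in power]


def estimate_range(profile_db, n_out=N_IFFT):
    """Convert the index of the strongest bin into a distance in metres."""
    peak_bin = max(range(n_out), key=lambda n: profile_db[n])
    return peak_bin * C / (2.0 * DELTA_F * n_out)


random.seed(0)
qam4 = [complex(a, b) / math.sqrt(2) for a in (-1, 1) for b in (-1, 1)]
tx = [random.choice(qam4) for _ in range(N_SUB)]

# A target at 50 m delays the echo by the round-trip time, which shows up as a
# linear phase ramp across the subcarriers.
tau = 2.0 * 50.0 / C
rx = [t * cmath.exp(-2j * math.pi * k * DELTA_F * tau) for k, t in enumerate(tx)]

estimated_range_m = estimate_range(range_profile(tx, rx))
```

Dividing by the transmitted symbols removes the 4-QAM data, so only the delay-induced phase ramp remains and the inverse transform concentrates it into one peak. Adding noise to `rx` before the division would be the natural way to extend this sketch towards an error-resilience study.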

Then we approximate<sup>④</sup> the addition and multiplication operations in the periodogram based estimation block (in the IFFT) by employing one adder-multiplier<sup>①</sup> pair at a time from the EvoApprox Library [18]. The adder-multiplier pairs are selected based on their individual error metrics [18], and various combinations are tried, including corner cases (best and worst individual error metrics), so that a representative design space can be explored. The IFFT operation that we perform is a radix-2 decimation in time, and we employ approximate adders and multipliers in the IFFT core. With an approximated IFFT core, and thereby an approximated OFDM radar processing pipeline, we obtain the target's range profile. If the deviation of the target's range is not at an acceptable level, we discard the adder-multiplier<sup>⑥</sup> pair in consideration. Otherwise, we move to the hardware implementation part, where we design and implement our IFFT core and introduce hardware models of the approximate adder-multiplier pairs.
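One way to picture this step is a radix-2 decimation-in-time IFFT whose scalar additions and multiplications are passed in as functions, so that an exact pair and an approximate pair can be swapped freely. The sketch below is illustrative only and rests on several assumptions: the truncating operators from `make_truncating_ops` are hypothetical stand-ins and not circuits from the EvoApprox Library, the fixed-point format `FRAC` is chosen for convenience, 16-bit overflow is not modelled, the transform is left unnormalized, and the twiddle sign follows the usual textbook inverse-transform convention.

```python
import math

FRAC = 14  # fractional bits of the fixed-point format (assumption)


def exact_add(a, b):
    return a + b


def exact_mul(a, b):
    return (a * b) >> FRAC


def make_truncating_ops(k):
    """Hypothetical approximate adder/multiplier that ignore the k LSBs of each operand."""
    mask = ~((1 << k) - 1)

    def add(a, b):
        return (a & mask) + (b & mask)

    def mul(a, b):
        return ((a & mask) * (b & mask)) >> FRAC

    return add, mul


def ifft_dit(x, add, mul):
    """Unnormalized radix-2 decimation-in-time IFFT on (re, im) integer pairs."""
    n = len(x)
    if n == 1:
        return list(x)
    even = ifft_dit(x[0::2], add, mul)
    odd = ifft_dit(x[1::2], add, mul)
    out = [None] * n
    for k in range(n // 2):
        ang = 2.0 * math.pi * k / n
        wr = round(math.cos(ang) * (1 << FRAC))
        wi = round(math.sin(ang) * (1 << FRAC))
        xr, xi = odd[k]
        # Complex product W * odd[k]: four multiplications, two additions.
        tr = add(mul(wr, xr), -mul(wi, xi))
        ti = add(mul(wr, xi), mul(wi, xr))
        er, ei = even[k]
        # Butterfly outputs: four more additions.
        out[k] = (add(er, tr), add(ei, ti))
        out[k + n // 2] = (add(er, -tr), add(ei, -ti))
    return out


def peak_bin(spectrum):
    return max(range(len(spectrum)), key=lambda i: spectrum[i][0] ** 2 + spectrum[i][1] ** 2)


def keep_pair(x, add, mul, max_bin_error=0):
    """Accept the pair only if its peak stays within max_bin_error bins of the exact peak."""
    reference = peak_bin(ifft_dit(x, exact_add, exact_mul))
    candidate = peak_bin(ifft_dit(x, add, mul))
    return abs(candidate - reference) <= max_bin_error


# Test input: a phase ramp whose inverse transform peaks at bin 10 of 32.
N = 32
x = [
    (round(math.cos(-2.0 * math.pi * 10 * k / N) * (1 << FRAC)),
     round(math.sin(-2.0 * math.pi * 10 * k / N) * (1 << FRAC)))
    for k in range(N)
]
add4, mul4 = make_truncating_ops(4)
accepted = keep_pair(x, add4, mul4)
```

Keeping the arithmetic behind two function arguments means that a behavioural model of any candidate circuit can be dropped in without touching the transform itself. The acceptance threshold `max_bin_error` is a placeholder for whatever range deviation the user considers acceptable.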



Fig. 3: IFFT Core Microarchitecture

The use of approximate circuits can have varying impacts on the accuracy of an application. Therefore, within this methodology, it is crucial to conduct a comprehensive exploration of the design space to identify suitable circuits for the application and understand the underlying reasons. Although circuits may possess diverse accuracy metrics such as Error Percentage (EP), Mean Absolute Error (MAE), Worst-Case Absolute Error (WCE), Mean Relative Error (MRE) etc. as defined in [19], it is essential to examine their end effects on applications and end-to-end pipelines, as the outcomes may differ from what is expected.
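For readers unfamiliar with these metrics, the sketch below computes them exhaustively for a small circuit. It is an illustration under stated assumptions: the 8-bit truncating adder `approx_add` is a hypothetical example rather than a library circuit, the values are reported as absolute quantities (libraries often normalize MAE and WCE to the output range), and the relative error divides by `max(1, |exact|)` to avoid division by zero, which is an assumption about the convention and should be checked against [19].

```python
def error_metrics(approx, exact, operand_pairs):
    """Return EP (%), MAE, WCE and MRE of `approx` against `exact`."""
    total = 0
    erroneous = 0
    sum_abs = 0
    sum_rel = 0.0
    worst = 0
    for a, b in operand_pairs:
        reference = exact(a, b)
        err = abs(approx(a, b) - reference)
        total += 1
        erroneous += err != 0
        sum_abs += err
        sum_rel += err / max(1, abs(reference))
        worst = max(worst, err)
    return {
        "EP": 100.0 * erroneous / total,  # share of inputs with a wrong result
        "MAE": sum_abs / total,           # mean absolute error
        "WCE": worst,                     # worst-case absolute error
        "MRE": sum_rel / total,           # mean relative error
    }


def exact_add(a, b):
    return a + b


def approx_add(a, b):
    """Hypothetical 8-bit adder that ignores the two LSBs of each operand."""
    return (a & ~3) + (b & ~3)


pairs = [(a, b) for a in range(256) for b in range(256)]
metrics = error_metrics(approx_add, exact_add, pairs)
```

This toy adder is wrong for most inputs, yet every individual error is small. That is the same pattern as a circuit with a high EP but a low MAE, and it is one reason why circuit-level metrics alone may not predict application-level accuracy.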

#### B. Hardware Implementation

For Hardware Implementation, we first design our own IFFT core and describe it using SystemVerilog. Its microarchitecture is shown in Fig. 3. It is a fully parallel IFFT core which calculates the IFFT of a given sequence of N points in $\log_2(N)$ clock cycles.

The central computation unit of this IFFT core is the radix-2 butterfly structure. This module calculates the complex multiply-accumulation in a single cycle, as described in Eq. 1, where Xa and Xb are complex inputs and W is the *twiddle factor*. For an N-point IFFT, $W_N$ is $e^{-i2\pi/N}$, which can also be represented in terms of *sine* and *cosine* values.

$$\begin{aligned} Ya &= Xa + W \cdot Xb \\ Yb &= Xa - W \cdot Xb \end{aligned} \tag{1}$$

This complex multiply-accumulate operation (as shown in Eq. 1) uses 4 multipliers and 6 adders. The outputs Ya and Yb are stored in flip-flops, which are then used as inputs to the next IFFT stage. Since all the butterfly units for a particular IFFT stage are independent, they can all be computed in parallel. In an N-point IFFT calculation, each stage consists of N/2 butterfly structures. The outputs of the N/2 butterfly units are then fed back to the reshuffle module, which reshuffles Ya and Yb as inputs to the next stage and selects the *twiddle factor* using a counter that counts up to $\log_2(N)$. All possible *sine* and *cosine* values (*twiddle factors*) are pre-calculated and stored in a Read-Only Memory (ROM). When the counter reaches $\log_2(N)$, the outputs of the N/2 butterfly units are combined as the final output.

TABLE I: System parameters corresponding to Fig. 1

| System Parameter/Block          | Value/Property                        |
|---------------------------------|---------------------------------------|
| Carrier frequency               | 30 GHz                                |
| Number of subcarriers           | 32                                    |
| Number of symbols               | 16                                    |
| Subcarrier spacing              | 960 kHz                               |
| Elementary OFDM symbol duration | 1.04 µs                               |
| Cyclic prefix duration          | 0.26 µs                               |
| Total symbol duration           | 1.3 µs                                |
| Modulation                      | 4-QAM                                 |
| Target position                 | 50 m                                  |
| Target velocity                 | 20 m/s                                |
| Channel                         | Additive White Gaussian Noise Channel |

The butterfly module uses 16-bit signed adder and multiplier units, which are approximated [18] in this work. This aligns with the functional validation, as 16-bit signed numbers were used for the OFDM radar processing pipeline. The approximated adders and multipliers are highlighted as broken circles in the microarchitecture (Fig. 3).
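The operator counts and the degree of parallelism described above can be checked with a small behavioural model. The sketch below is not the SystemVerilog core: `OpCounter`, `butterfly` and `core_resources` are hypothetical names, the Q1.15 scaling of the multiplier is an assumption, and the resource totals simply multiply the per-butterfly counts by the N/2 butterfly units, ignoring the reshuffle logic, the twiddle ROM and the flip-flops.

```python
def wrap16(v):
    """Wrap an integer into the signed 16-bit range, as a 16-bit register would."""
    return ((v + 0x8000) & 0xFFFF) - 0x8000


class OpCounter:
    """16-bit signed arithmetic that counts how many adders/multipliers are exercised."""

    def __init__(self, frac_bits=15):
        self.frac_bits = frac_bits  # Q1.15 scaling (assumption)
        self.adds = 0
        self.muls = 0

    def add(self, a, b):
        self.adds += 1
        return wrap16(a + b)

    def sub(self, a, b):
        self.adds += 1  # a subtractor is counted as an adder
        return wrap16(a - b)

    def mul(self, a, b):
        self.muls += 1
        return wrap16((a * b) >> self.frac_bits)


def butterfly(xa, xb, w, ops):
    """Eq. 1 on (re, im) pairs: Ya = Xa + W*Xb, Yb = Xa - W*Xb."""
    pr = ops.sub(ops.mul(w[0], xb[0]), ops.mul(w[1], xb[1]))
    pi = ops.add(ops.mul(w[0], xb[1]), ops.mul(w[1], xb[0]))
    ya = (ops.add(xa[0], pr), ops.add(xa[1], pi))
    yb = (ops.sub(xa[0], pr), ops.sub(xa[1], pi))
    return ya, yb


def core_resources(n_points, clock_hz, muls_per_bfly=4, adds_per_bfly=6):
    """Arithmetic units and latency of a core with N/2 parallel butterflies."""
    stages = n_points.bit_length() - 1  # log2(N) for a power-of-two N
    butterflies = n_points // 2
    return {
        "cycles": stages,
        "butterflies": butterflies,
        "multipliers": butterflies * muls_per_bfly,
        "adders": butterflies * adds_per_bfly,
        "latency_ns": stages * 1e9 / clock_hz,
    }


ops = OpCounter()
ya, yb = butterfly((1000, -2000), (3000, 4000), (16384, 0), ops)  # W = 0.5 in Q1.15
resources = core_resources(512, 100e6)
```

For a 512-point transform this gives 256 butterfly units, hence 1024 multipliers and 1536 adders, which suggests why replacing each unit with a cheaper approximate version can have a visible effect on total area and power.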

Now, as per the flow in Fig. 2, once we have shortlisted adders and multipliers<sup>⑦</sup> from the Functional Validation part, we proceed to incorporate those circuits into our IFFT core and obtain area and power statistics<sup>⑧</sup>. To obtain the area and power statistics, we synthesize our accurate and approximate IFFT cores using the 45 nm NanGate Open Cell Library at a frequency of 100 MHz in Synopsys Design Compiler (DC). Next, we conduct the design space exploration by varying the adder-multiplier pairs.

#### C. Design Space Exploration

After functional validation and hardware evaluation, if we find that the trade-off between accuracy, area, and power<sup>⑨</sup> is satisfactory, the adder-multiplier pair corresponding to the design point is used to generate a radar processor<sup>⑩</sup>. If the design point fails to satisfy user-defined quality constraints, we discard the corresponding adder-multiplier pair and do not generate a radar processor<sup>⑪</sup>. The selection in ⑥ differs from the one in ⑪: ⑥ enables early selection just after functional validation at the software level, whereas reaching ⑪ requires functional validation and hardware implementation, followed by design space exploration.
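The final filtering step can be sketched as a constraint check over a list of design points, optionally followed by a Pareto filter. Everything in the example is illustrative: `DesignPoint`, `select` and `pareto_front` are hypothetical names, and the numbers attached to `pair_A` through `pair_D` are made-up placeholders, not measurements from this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    pair: str           # adder-multiplier pair label
    deviation_m: float  # deviation from the actual range
    power_mw: float
    area_mm2: float


def select(points, max_deviation_m, max_power_mw, max_area_mm2=float("inf")):
    """Keep the design points that satisfy every user-defined constraint."""
    return [
        p for p in points
        if p.deviation_m < max_deviation_m
        and p.power_mw < max_power_mw
        and p.area_mm2 < max_area_mm2
    ]


def pareto_front(points):
    """Keep the points that no other point beats on all three metrics."""
    def key(p):
        return (p.deviation_m, p.power_mw, p.area_mm2)

    def dominates(p, q):
        return all(a <= b for a, b in zip(key(p), key(q))) and key(p) != key(q)

    return [q for q in points if not any(dominates(p, q) for p in points)]


# Placeholder values for illustration only.
points = [
    DesignPoint("pair_A", 1.0, 400.0, 2.0),
    DesignPoint("pair_B", 2.0, 250.0, 1.5),
    DesignPoint("pair_C", 3.0, 280.0, 1.8),
    DesignPoint("pair_D", 2.5, 260.0, 1.6),
]
feasible = select(points, max_deviation_m=2.3, max_power_mw=300.0)
front = pareto_front(points)
```

The constraint check mirrors the accept or discard decision at the end of the flow, while the Pareto filter is an optional extra that removes points which are worse than some other point on every axis.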

## III. EVALUATION

This section presents a comprehensive evaluation of the flow presented in Fig. 2. The system properties for the pipeline presented in Fig. 1 are shown in Table I.
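As a rough cross-check of the Table I parameters, the standard OFDM radar relations for range resolution and maximum unambiguous range can be evaluated. This is a sketch that assumes $c \approx 3 \times 10^{8}$ m/s; the symbols $N_c$ (number of subcarriers), $\Delta f$ (subcarrier spacing), $\Delta R$ and $R_{\max}$ are introduced here for illustration only.

$$\Delta R = \frac{c}{2 N_c \Delta f} = \frac{3 \times 10^{8}}{2 \cdot 32 \cdot 960 \times 10^{3}} \approx 4.88~\text{m}, \qquad R_{\max} = \frac{c}{2 \Delta f} = \frac{3 \times 10^{8}}{2 \cdot 960 \times 10^{3}} \approx 156.25~\text{m}$$

Under these assumptions, the 50 m target lies well inside the unambiguous range, and the elementary symbol duration $1/\Delta f \approx 1.04$ µs agrees with Table I.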

#### A. Accuracy Analysis

To obtain accuracy statistics, we approximate the addition and multiplication operations in the butterfly unit of the IFFT core (periodogram based estimation) and then obtain results for the target's range across an end-to-end pipeline as shown in Fig. 1. We also apply Zadoff-Chu precoding during transmission to improve the correlation properties of the transmitted signal.



Fig. 4: Accuracy statistics of various adder-multiplier (accurate and approximate) pairs. Range averaged for 100 runs per SNR for each adder-multiplier pair.

Fig. 4 shows the averaged target range (the actual target is at a distance of 50 m) over an SNR range of -5 to 10 dB using various adder and multiplier pairs, including accurate and approximate circuits [18]. The range values obtained per SNR are averaged across 100 runs. Fig. 5 shows the target's range profile using the accurate adder and multiplier, and using the approximate pair add16se\_3BD-mul16s\_HFB. We can see that the highest peaks (estimated target range) for the accurate and approximate circuit pairs lie close to each other.

Thus, we make four major observations. First, Fig. 4 shows that in the negative SNR region, the accurate adder and multiplier perform best. Second, at SNR = 0 dB, the pair add16se\_3BD-mul16s\_HFB performs slightly better (deviation = 0.19%) than the accurate adder and multiplier pair (deviation = 0.63%), as the range obtained is closer to 50 m (the target's actual range). Third, in the negative SNR region, all approximate circuit pairs have high deviation, and moving gradually towards positive SNR, the approximate circuits tend to give results close to those of the accurate ones. Fourth, the pair add16se\_3BD-mul16s\_HFB performs well for estimating the target's range. This is interesting because add16se\_3BD has an MAE of 0.046%, EP of 99.02%, and MRE of 0.96%, which is the highest among all approximate adders in consideration [18]. Similarly, the multiplier mul16s\_HFB has an MAE of 0.002%, EP of 98.43%, and MRE of 0.22%, which is also quite high [18]. However, the overall effect of selecting such a pair gives good results in the positive SNR region. This illustrates why a method like Ellora is necessary: dynamic interactions between system modules can yield unexpected results due to factors such as input data distribution and overall functional design.

Finally, it is important to note that for practical applications, considering positive SNR region, approximate circuits seem to be a viable option for generating radar processors.

#### B. Hardware Evaluation

As stated in Section II-B, we implement our 512-point IFFT core (since number of subcarriers (32) $\times$ number of symbols (16) = 512) and incorporate various adder-multiplier pairs in it. The comprehensive area and power statistics are shown in Fig. 6. We select the Carry-Lookahead Adder (CLA)-Booth Encoded Wallace Multiplier (BEWM) as the accurate circuit pair so as to meet the frequency requirement of 100 MHz, which is not fulfilled by designs involving long carry chains. From Fig. 6, we can see that the combination of CLA-BEWM takes up the most area and power. On average, considering all approximate adder-multiplier pairs, the area and power savings compared to the accurate case are 22.9% and 26.2%, respectively.

Fig. 5: Target's range profile at SNR = 5 dB with accurate adder and multiplier; and add16se\_3BD-mul16s\_HFB

Fig. 6: Area and Power Statistics of 512-point IFFT core using various adder-multiplier pairs

The combination add16se\_3BD-mul16s\_HFB provides the highest power savings, i.e., 44.4%. It also provides area savings of 28.83% compared to the accurate case. This gives a favourable relationship between accuracy and hardware statistics: Section III-A shows that, among all approximate circuit pairs, add16se\_3BD-mul16s\_HFB performs best in terms of accuracy, and the same pair also gives good results in hardware. This motivates exploring the design space for optimal points that meet user-defined quality constraints, enabling decisions on low-power radar processors for these points.

#### C. Design Space Exploration

We study the relationship between accuracy and hardware statistics so as to explore the design space and generate low-power radar processors. Fig. 7 and Fig. 8 show the relationship between deviation from the actual range (in metres, averaged across SNR = -5 to 10 dB) and power (mW) and area (mm<sup>2</sup>), respectively. The deviation from the actual range is obtained after using various adder-multiplier pairs in the periodogram based estimation block (IFFT core) across an end-to-end pipeline as shown in Fig. 1. From Fig. 6, 7, and 8, we can make various decisions based on user-defined quality constraints. For example, referring to Fig. 7, for a power budget of < 300 mW, two pairs of approximate adders and multipliers satisfy it, namely add16se\_3BD-mul16s\_GV3 and add16se\_3BD-mul16s\_HFB. If we additionally require the deviation from the actual range to be < 2.3 m, then only the design point corresponding to add16se\_3BD-mul16s\_HFB remains. Similarly, we can make many observations and report design points pertaining to various accuracy-area-power combinations for generating low-power radar processors. These results apply exclusively to the models and parameters shown in Fig. 1, Fig. 3, and Table I. When the models and parameters change, Ellora can help find the corresponding optimal design points.

Fig. 7: Deviation from actual range (m) vs. Power (mW)

Fig. 8: Deviation from actual range (m) vs. Area (mm<sup>2</sup>)

## IV. CONCLUSION AND FUTURE WORK

We presented Ellora, a DSE framework (across an end-to-end pipeline) that leverages approximate computing and helps explore optimal design points for generating low-power OFDM-based radar processors. As a part of exploration of design points, we also propose a radix-2 fully parallel IFFT core for various use cases. Ellora exploits pairs of approximate adders and multipliers inside the compute-intensive IFFT core (a part of the periodogram based estimation block) in the OFDM radar processing pipeline. Experimental results show that at a marginal loss of accuracy, Ellora is able to discover optimal design points satisfying user-defined quality constraints while saving both on-chip area and power.

It is also seen that Ellora discovers interesting design points that give unexpected results, which shows the usefulness of such a DSE framework. This is particularly so since every application exhibits a different level of sensitivity to approximations. In the future, we plan on integrating Ellora with various other optimization methods, e.g., [20], [21] and study the effects on end-to-end pipelines as a result of such integration.

## REFERENCES

- [1] Klaus Martin Braun. Ofdm radar algorithms in mobile communication networks. PhD Dissertation, 2014.
- [2] Fan Liu, Christos Masouros, Athina P. Petropulu, Hugh Griffiths, and Lajos Hanzo. Joint radar and communication design: Applications, state-of-the-art, and the road ahead. *IEEE Transactions on Communications*, 68(6):3834–3862, 2020.
- [3] Sana Mazahir, Sajid Ahmed, and Mohamed-Slim Alouini. A survey on joint communication-radar systems. *Frontiers in Communications and Networks*, 1, 2021.
- [4] Kumar Vijay Mishra, M.R. Bhavani Shankar, Visa Koivunen, Bjorn Ottersten, and Sergiy A. Vorobyov. Toward millimeter-wave joint radar communications: A signal processing perspective. *IEEE Signal Processing Magazine*, 36(5):100–114, 2019.
- [5] Sayed Hossein Dokhanchi, André Noll Barreto, and Gerhard Fettweis. A half-duplex joint communications and sensing system using zp-ofdm. In *2022 2nd IEEE International Symposium on Joint Communications & Sensing (JC&S)*, pages 1–6, 2022.
- [6] Aryan Kaushik, Christos Masouros, and Fan Liu. Hardware efficient joint radar-communications with hybrid precoding and rf chain optimization. In *ICC 2021 - IEEE International Conference on Communications*, pages 1–6, 2021.
- [7] Aryan Kaushik, Evangelos Vlachos, Christos Masouros, Christos Tsinos, and John Thompson. Green joint radar-communications: Rf selection with low resolution dacs and hybrid precoding. In *ICC 2022 - IEEE International Conference on Communications*, pages 3160–3165, 2022.
- [8] Gennaro Rodrigues, Fernanda Lima Kastensmidt, and Alberto Bosio. Survey on approximate computing and its intrinsic fault tolerance. *Electronics*, 9(4), 2020.
- [9] Shobhit Belwal, Rajat Bhattacharjya, Kaustav Goswami, and Dip Sankar Banerjee. Acla: An approximate carry-lookahead adder with intelligent carry judgement and correction. In *2021 22nd International Symposium on Quality Electronic Design (ISQED)*, pages 1–7, 2021.
- [10] Renira Soares, Matheus Isquierdo, Felipe Sampaio, Amir Rahmani, Nikil Dutt, Guilherme Correa, Daniel Palomino, and Bruno Zatt. Error resilience evaluation of approximate storage in the motion compensation of vvc decoders. In 2023 IEEE 14th Latin America Symposium on Circuits and Systems (LASCAS), pages 1–4. IEEE, 2023.
- [11] Rajat Bhattacharjya, Biswadip Maity, and Nikil Dutt. Locate: Low-power viterbi decoder exploration using approximate adders. In *Proceedings of the Great Lakes Symposium on VLSI 2023*, GLSVLSI '23, page 409–413, New York, NY, USA, 2023. Association for Computing Machinery.
- [12] Rajat Bhattacharjya, Vishesh Mishra, Saurabh Singh, Kaustav Goswami, and Dip Sankar Banerjee. An approximate carry estimating simultaneous adder with rectification. In *Proceedings of the 2020 on Great Lakes Symposium on VLSI*, GLSVLSI '20, page 139–144, New York, NY, USA, 2020. Association for Computing Machinery.
- [13] Alish Kanani, Rajat Bhattacharjya, and Dip Sankar Banerjee. Approxbiowear: Approximating additions for efficient biomedical wearable computing at the edge. In *2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)*, pages 7566–7569, 2021.
- [14] Prashanth H C, Soujanya S R, Bindu G Gowda, and Madhav Rao. Design and evaluation of in-exact compressor based approximate multipliers. In *Proceedings of the Great Lakes Symposium on VLSI 2022*, GLSVLSI '22, page 431–436, New York, NY, USA, 2022. Association for Computing Machinery.
- [15] Pedro Tauã Lopes Pereira, Patrícia Ücker Leleu da Costa, Guilherme da Costa Ferreira, Brunno Alves de Abreu, Guilherme Paim, Eduardo Antônio César da Costa, and Sergio Bampi. Energy-quality scalable design space exploration of approximate fft hardware architectures. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 69(11):4524–4534, 2022.
- [16] Weiqiang Liu, Qicong Liao, Fei Qiao, Weijie Xia, Chenghua Wang, and Fabrizio Lombardi. Approximate designs for fast fourier transform (fft) with application to speech recognition. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 66(12):4727–4739, 2019.
- [17] Christian Sturm and Werner Wiesbeck. Waveform design and signal processing aspects for fusion of wireless communications and radar sensing. *Proceedings of the IEEE*, 99(7):1236–1259, 2011.
- [18] Vojtech Mrazek, Lukas Sekanina, and Zdenek Vasicek. Libraries of approximate circuits: Automated design and application in cnn accelerators. *IEEE Journal on Emerging and Selected Topics in Circuits and Systems*, 10(4):406–418, 2020.

- [19] Zdenek Vasicek. Formal methods for exact analysis of approximate circuits. *IEEE Access*, 7:177309–177331, 2019.
- [20] Jianguo Li, Sining An, Jianping An, Herbert Zirath, and Zhongxia Simon He. Ofdm radar range accuracy enhancement using fractional fourier transformation and phase analysis techniques. *IEEE Sensors Journal*, 20(2):1011–1018, 2020.
- [21] Zenghui Zhang, Zhen Du, and Wenxian Yu. Mutual-information-based ofdm waveform design for integrated radar-communication system in gaussian mixture clutter. *IEEE Sensors Letters*, 4(1):1–4, 2020.