ABSTRACT
A systolic array (SA) is an architecture, conceptually similar to an arithmetic pipeline, that is formed by uniformly connecting a group of identical data processing elements (PEs). Approximate computing offers hardware and performance benefits but incurs accuracy loss, which limits it to error-resilient applications. The majority of inexact multipliers exhibit a one-sided error distribution (ErD), and an SA built from such multipliers accumulates large errors. This paper investigates SA architectures with various arrangements of approximate multipliers (AMs) with dissimilar ErDs for image smoothing and outline extraction applications. Among all the patterns, the Ring arrangement, comprising AMs with opposite-sided ErDs placed in nested loops of the SA, was found to accelerate performance by 22.31% and enhance image quality metrics by 18.15%. For FPGA implementation, the alternate arrangement, with equal numbers of AMs of opposite-sided ErD in the SA, offered 12.14% LUT savings and comparable flip-flop usage when compared with one-sided AMs in the SA.
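The error-balancing idea can be illustrated with a minimal Python sketch. It is not the paper's design: `mul_under` and `mul_over` are hypothetical stand-ins for AMs with negative-only and positive-only ErD, the parameter `K` is an arbitrary choice, and the sketch assumes a mapping in which each product of a dot product is computed by a different PE, so that errors from neighbouring multipliers meet in the same accumulated sum.

```python
import random

K = 4  # number of low product bits affected (hypothetical choice)
LOW_MASK = (1 << K) - 1

def mul_under(a, b):
    """Hypothetical AM with negative-only error: clears the low K product bits."""
    return (a * b) & ~LOW_MASK

def mul_over(a, b):
    """Hypothetical AM with positive-only error: sets the low K product bits."""
    return (a * b) | LOW_MASK

def approx_dot(xs, ws, multipliers):
    """Accumulate one dot product, using one multiplier per PE position."""
    return sum(m(x, w) for x, w, m in zip(xs, ws, multipliers))

def mean_signed_error(multipliers, trials=2000, seed=0):
    """Average (approximate - exact) over random 8-bit operand vectors."""
    rng = random.Random(seed)
    n = len(multipliers)
    total = 0
    for _ in range(trials):
        xs = [rng.randrange(256) for _ in range(n)]
        ws = [rng.randrange(256) for _ in range(n)]
        exact = sum(x * w for x, w in zip(xs, ws))
        total += approx_dot(xs, ws, multipliers) - exact
    return total / trials

# Eight PEs along one accumulation path (assumed size).
one_sided = [mul_under] * 8            # every PE errs in the same direction
alternate = [mul_under, mul_over] * 4  # equal numbers of opposite-sided AMs

print("one-sided mean error:", mean_signed_error(one_sided))
print("alternate mean error:", mean_signed_error(alternate))
```

With one-sided multipliers every product errs in the same direction, so the bias grows with the length of the accumulation. In the alternating arrangement the positive and negative errors partly cancel, which is the effect the arrangements studied in the paper aim to exploit.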
Index Terms
- EBASA: Error Balanced Approximate Systolic Array Architecture Design