Abstract
QR-decomposition (QRD) accelerators are attractive SoC components for many applications with a wide range of specifications. This paper proposes a new family of modular, highly area- and energy-efficient two-way linear-array QRD architectures based on the Givens algorithm and CORDIC rotations. The template architecture supports real- and complex-valued as well as integer and floating-point QRDs. An accurate algebraic cost model enables cross-layer optimization at the architecture, micro-architecture, and circuit levels over a rich set of parameters. Quantitative results for exemplary applications implemented in 40-nm CMOS demonstrate a significant improvement in efficiency.
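The core technique named above is the Givens QRD evaluated with CORDIC rotations. The following is a minimal floating-point sketch for illustration only, not the authors' linear-array architecture. It assumes real-valued data, a plain sequential schedule, a fixed iteration count, and the classic split used in Givens arrays: a vectoring-mode CORDIC on the pivot pair produces the micro-rotation directions, and rotation-mode CORDICs reuse those directions on the rest of the row pair. All function names and the `n_iter` parameter are hypothetical.

```python
import math


def cordic_vectoring(x, y, n_iter):
    """Hypothetical helper: CORDIC vectoring mode, assuming x >= 0.

    Rotates (x, y) towards the x-axis and returns the micro-rotation
    directions d_i in {-1, +1}. Each step needs only shifts (2**-i) and adds.
    """
    dirs = []
    for i in range(n_iter):
        d = -1 if y > 0 else 1  # choose the direction that moves y towards 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        dirs.append(d)
    return dirs


def cordic_rotate(x, y, dirs):
    """Hypothetical helper: CORDIC rotation mode driven by given directions."""
    for i, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    return x, y


def cordic_givens_qr(A, n_iter=32):
    """Return R of A = QR via CORDIC-based Givens rotations (Q is not formed)."""
    m, n = len(A), len(A[0])
    R = [[float(v) for v in row] for row in A]
    # Every CORDIC pass scales vectors by the same gain K; precompute 1/K.
    inv_gain = 1.0
    for i in range(n_iter):
        inv_gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    for j in range(min(m - 1, n)):  # column being triangularized
        for k in range(j + 1, m):  # row whose entry R[k][j] is annihilated
            if R[j][j] < 0:  # pre-rotate the row pair by pi so that x >= 0
                R[j] = [-v for v in R[j]]
                R[k] = [-v for v in R[k]]
            dirs = cordic_vectoring(R[j][j], R[k][j], n_iter)
            for c in range(j, n):  # same micro-rotations on the whole row pair
                x, y = cordic_rotate(R[j][c], R[k][c], dirs)
                R[j][c], R[k][c] = x * inv_gain, y * inv_gain
            R[k][j] = 0.0  # residual is O(2**-n_iter); store an exact zero
    return R


if __name__ == "__main__":
    A = [[3, 1, 2], [4, 2, 0], [0, 5, 1]]
    for row in cordic_givens_qr(A):
        print(["%.6f" % v for v in row])
```

Because the CORDIC gain K depends only on the iteration count and not on the chosen directions, a single constant multiply by 1/K compensates every rotation in this sketch. Since Q is orthogonal, the result can be checked via R^T R = A^T A.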
Acknowledgments
The authors would like to thank their colleague Jos Huisken for many discussions and helpful comments, as well as Eqbal Maraqa for his highly valuable contributions to the validation of the cost model.
About this article
Cite this article
Vishnoi, U., Meixner, M. & Noll, T.G. A Family of Modular QRD-Accelerator Architectures and Circuits Cross-Layer Optimized for High Area- and Energy-Efficiency. J Sign Process Syst 83, 329–356 (2016). https://doi.org/10.1007/s11265-015-0976-6
DOI: https://doi.org/10.1007/s11265-015-0976-6