Abstract
Large-scale matrix inversion plays an important role in many applications. However, to the best of our knowledge, there is no FPGA-based implementation. In this paper, we explore the possibility of accelerating large-scale matrix inversion on an FPGA. To exploit the computational potential of the FPGA, we introduce a fine-grained parallel algorithm for matrix inversion. A scalable linear array of processing elements (PEs), which is the core component of the FPGA accelerator, is proposed to implement this algorithm. A total of 12 PEs can be integrated into an Altera Stratix II EP2S130F1020C5 FPGA on our self-designed board. Experimental results show that a speedup of 2.6 and a maximum power-performance of 41 can be achieved compared to a Pentium Dual CPU running two SSE threads.
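The abstract does not say which inversion algorithm the PEs execute or how data is mapped onto them. As a rough illustration only, the sketch below assumes Gauss-Jordan elimination with partial pivoting and a round-robin assignment of the columns of the augmented matrix [A | I] to a linear array of PEs. The function name `invert_with_pe_array`, the column mapping, and the singularity threshold are all hypothetical and are not taken from the paper. The PEs are simulated serially in software.

```python
from typing import List

Matrix = List[List[float]]


def invert_with_pe_array(a: Matrix, num_pes: int = 12) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination.

    The work is organised as if the columns of [A | I] were dealt
    round-robin to `num_pes` processing elements (an assumed mapping).
    """
    n = len(a)
    # Build the augmented matrix [A | I]; the input is left untouched.
    aug = [
        [float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]
    # Hypothetical mapping: PE p owns columns p, p + num_pes, p + 2*num_pes, ...
    owned = [list(range(p, 2 * n, num_pes)) for p in range(num_pes)]

    for k in range(n):
        # Partial pivoting: pick the row with the largest entry in column k.
        pivot_row = max(range(k, n), key=lambda r: abs(aug[r][k]))
        if abs(aug[pivot_row][k]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[k], aug[pivot_row] = aug[pivot_row], aug[k]

        pivot = aug[k][k]
        # The multipliers come from column k and would be broadcast to all PEs.
        factors = [aug[i][k] for i in range(n)]

        # Each PE touches only its own columns, so the iterations of this
        # outer loop are independent (run serially here for simplicity).
        for cols in owned:
            for j in cols:
                scaled = aug[k][j] / pivot
                for i in range(n):
                    if i != k:
                        aug[i][j] -= factors[i] * scaled
                aug[k][j] = scaled

    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]
```

Because every column update depends only on the broadcast multipliers and on that column's own pivot-row entry, the per-PE work has no cross-column dependencies within one elimination step. That independence is the property a column-partitioned array would rely on. Calling `invert_with_pe_array(m, num_pes=12)` with any value of `num_pes` should give the same result up to floating-point rounding.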
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhou, J., Dou, Y., Zhao, J., Xia, F., Lei, Y., Tang, Y. (2009). A Fine-Grained Pipelined Implementation for Large-Scale Matrix Inversion on FPGA. In: Dou, Y., Gruber, R., Joller, J.M. (eds) Advanced Parallel Processing Technologies. APPT 2009. Lecture Notes in Computer Science, vol 5737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03644-6_9
DOI: https://doi.org/10.1007/978-3-642-03644-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03643-9
Online ISBN: 978-3-642-03644-6
eBook Packages: Computer Science (R0)