
A Fine-Grained Pipelined Implementation for Large-Scale Matrix Inversion on FPGA

  • Conference paper
Advanced Parallel Processing Technologies (APPT 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5737)



Large-scale matrix inversion plays an important role in many applications. However, to the best of our knowledge, there is no FPGA-based implementation. In this paper, we explore the possibility of accelerating large-scale matrix inversion on FPGA. To exploit the computational potential of the FPGA, we introduce a fine-grained parallel algorithm for matrix inversion. A scalable linear array of processing elements (PEs), which is the core component of the FPGA accelerator, is proposed to implement this algorithm. A total of 12 PEs can be integrated into an Altera Stratix II EP2S130F1020C5 FPGA on our self-designed board. Experimental results show that a speedup of 2.6 and a maximum power-performance of 41 can be achieved compared to a Pentium Dual CPU running two SSE threads.
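The abstract does not say which inversion algorithm the accelerator uses. Purely as an illustration of where fine-grained parallelism can come from, the sketch below inverts a small matrix by Gauss-Jordan elimination with partial pivoting, a technique chosen here as an assumption and not taken from the paper. The function name `invert_gauss_jordan` and the tolerance `eps` are hypothetical. For a given pivot column, each row update is independent of the others, which is the kind of work that could in principle be spread across a linear array of PEs.

```python
from typing import List

Matrix = List[List[float]]


def invert_gauss_jordan(a: Matrix, eps: float = 1e-12) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting.

    Illustrative only: this is NOT the algorithm from the paper, whose
    details are not given in the abstract.
    """
    n = len(a)
    if any(len(row) != n for row in a):
        raise ValueError("matrix must be square")

    # Build the augmented matrix [A | I] on a copy, leaving the input untouched.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]

    for col in range(n):
        # Partial pivoting: choose the row with the largest magnitude in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < eps:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]

        # Scale the pivot row so the pivot element becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]

        # Clear this column in every other row. These row updates do not
        # depend on one another, so they could run in parallel.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                if f != 0.0:
                    aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]

    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


if __name__ == "__main__":
    for row in invert_gauss_jordan([[4.0, 7.0], [2.0, 6.0]]):
        print(row)
```

In a sequential program the inner loop over `r` runs one row at a time. A hardware pipeline would more plausibly stream rows through the PEs, but how the paper maps the work onto its 12 PEs is not stated here.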





Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhou, J., Dou, Y., Zhao, J., Xia, F., Lei, Y., Tang, Y. (2009). A Fine-Grained Pipelined Implementation for Large-Scale Matrix Inversion on FPGA. In: Dou, Y., Gruber, R., Joller, J.M. (eds) Advanced Parallel Processing Technologies. APPT 2009. Lecture Notes in Computer Science, vol 5737. Springer, Berlin, Heidelberg.



  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03643-9

  • Online ISBN: 978-3-642-03644-6

  • eBook Packages: Computer Science, Computer Science (R0)
