DOI: 10.1145/3665283.3665299
research-article
Open access

A Hardware Solver for Simultaneous Linear Equations with Multistage Interconnection Network

Published: 19 June 2024

Abstract

Many scientific and technological computations reduce to solving systems of linear equations, and much research has therefore been devoted to accelerating this process through parallel computing in hardware. When the coefficient matrix is dense, or when its non-zero elements are regularly distributed, as in finite element methods, efficient parallelization can be achieved through vector operations or GPU acceleration. However, when the number of non-zero elements per row varies significantly, as in rigid body physics simulations, parallelization with conventional methods becomes challenging. In this paper, we propose an iterative algorithm that allows flexible parallelization even for coefficient matrices with irregular distributions of non-zero elements, and we validate its convergence rate. Because of its memory access patterns, the proposed algorithm struggles to exploit the performance of general-purpose CPU or GPU architectures, which necessitates the design of a special RAM based on multistage interconnection networks. Using this RAM, we design a hardware solver for the proposed algorithm and implement it on an FPGA, confirming that it converges to the solutions of systems of linear equations with diagonally dominant coefficient matrices.
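The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of the classical Jacobi iteration, the textbook method for diagonally dominant systems, and not the paper's method. It illustrates the setting described above: each row stores a different number of non-zero entries, so the work per row is uneven. The names `jacobi`, `is_strictly_diagonally_dominant`, `rows`, and `diag`, the sparse row layout, and the example matrix are all assumptions made for illustration.

```python
# Classical Jacobi iteration on a sparse matrix with irregular rows.
# Assumption: this is a textbook baseline, NOT the algorithm proposed in the paper.

def is_strictly_diagonally_dominant(rows, diag):
    """True if |a_ii| > sum of |a_ij| over j != i, for every row i."""
    return all(
        abs(d) > sum(abs(v) for v in row.values())
        for row, d in zip(rows, diag)
    )

def jacobi(rows, diag, b, max_iterations=200, tol=1e-12):
    """Solve A x = b, where A = diag + off-diagonal entries.

    rows[i] maps column index j -> a_ij for the off-diagonal non-zeros of row i;
    diag[i] holds a_ii. Every x_new[i] depends only on the previous iterate,
    so all rows could be updated in parallel.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iterations):
        x_new = [
            (b[i] - sum(v * x[j] for j, v in rows[i].items())) / diag[i]
            for i in range(n)
        ]
        # Stop once successive iterates agree to within tol.
        if max(abs(new - old) for new, old in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x

# Hypothetical 4x4 example: row 0 has three off-diagonal non-zeros,
# the other rows have one each (an irregular distribution).
diag = [10.0, 5.0, 8.0, 4.0]
rows = [
    {1: 1.0, 2: 2.0, 3: 1.0},
    {0: 1.0},
    {0: 2.0},
    {0: 1.0},
]
b = [22.0, 11.0, 26.0, 17.0]  # chosen so that the exact solution is [1, 2, 3, 4]

x = jacobi(rows, diag, b)
```

In this sketch, row 0 needs three multiply-accumulate steps per iteration while the other rows need one. That is the kind of per-row imbalance that makes a fixed-width vector or GPU mapping awkward when the non-zero counts vary widely.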

Published In

HEART '24: Proceedings of the 14th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies
June 2024
147 pages
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGA
  2. HPC
  3. interconnection network
  4. linear equations

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

HEART 2024

Acceptance Rates

Overall Acceptance Rate 22 of 50 submissions, 44%
