A Hardware Solver for Simultaneous Linear Equations with Multistage Interconnection Network

Authors:

Tetsu Narumi

HEART '24: Proceedings of the 14th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies

Pages 81 - 89

https://doi.org/10.1145/3665283.3665299

Published: 19 June 2024

Abstract

Many scientific and technological computations reduce to solving systems of linear equations, so much research has been dedicated to accelerating this step through parallel computing in hardware. When the coefficient matrix is dense, or its non-zero elements are distributed regularly as in finite element methods, vector operations or GPU acceleration parallelize the computation efficiently. However, when the number of non-zero elements per row varies significantly, as in rigid body physics simulations, conventional methods are difficult to parallelize. In this paper, we propose an iterative algorithm that allows flexible parallelization even for coefficient matrices with irregular distributions of non-zero elements, and we validate its convergence rate. Because of its memory access patterns, the proposed algorithm cannot fully exploit general CPU or GPU architectures, which necessitates a special RAM built from multistage interconnection networks. Using this RAM, we design a hardware solver for the proposed algorithm and implement it on an FPGA, confirming that it converges to the solution for systems of linear equations with diagonally dominant coefficient matrices.
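
To make the setting concrete, the following is a minimal sketch of an iterative solve on a strictly diagonally dominant matrix whose rows hold different numbers of non-zero elements. It uses the textbook Jacobi method as a stand-in and is not the algorithm proposed in the paper, which the abstract does not specify. The function name `jacobi_sparse`, the row-wise storage layout, and the example matrix are all hypothetical choices made for illustration.

```python
def jacobi_sparse(rows, diag, b, iters=100, tol=1e-10):
    """Textbook Jacobi iteration on a row-wise sparse matrix (illustrative only).

    rows[i] is a list of (column, value) pairs holding the off-diagonal
    non-zeros of row i; diag[i] is the diagonal entry of row i.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x_new = []
        for i in range(n):
            # The work per row depends on len(rows[i]), which is the
            # irregularity that makes lock-step parallelization awkward.
            s = sum(v * x[j] for j, v in rows[i])
            x_new.append((b[i] - s) / diag[i])
        change = max(abs(x_new[i] - x[i]) for i in range(n))
        x = x_new
        if change < tol:
            break
    return x


# Hypothetical 4x4 system: strictly diagonally dominant, with 3, 1, 2 and 0
# off-diagonal non-zeros per row.
diag = [10.0, 4.0, 5.0, 3.0]
rows = [
    [(1, 1.0), (2, 2.0), (3, 3.0)],
    [(0, 1.0)],
    [(0, 2.0), (3, 1.0)],
    [],
]
x_true = [1.0, 2.0, 3.0, 4.0]
# Build the right-hand side from a known solution so the result can be checked.
b = [diag[i] * x_true[i] + sum(v * x_true[j] for j, v in rows[i])
     for i in range(4)]
x = jacobi_sparse(rows, diag, b)
```

Each row update reads only values from the previous iterate, so the rows can in principle be processed in parallel. The uneven row lengths mean the processing units would finish at different times and read scattered memory locations, which is the kind of access pattern the abstract says motivates a dedicated memory design.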

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multicore architectures
2. Hardware
  1. Integrated circuits
    1. Reconfigurable logic and FPGAs
      1. Hardware accelerators

Published In

HEART '24: Proceedings of the 14th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies

June 2024

147 pages

ISBN:9798400717277

DOI:10.1145/3665283

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

HEART 2024: 14th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies

June 19 - 21, 2024

Porto, Portugal

Acceptance Rates

Overall Acceptance Rate 22 of 50 submissions, 44%
