Abstract
Solving large-scale sparse linear systems over GF(2) plays a key role in fluid mechanics, simulation and design of materials, petroleum seismic data processing, numerical weather prediction, computational electromagnetics, and numerical simulation of nuclear explosions. Developing efficient algorithms for this problem is therefore a significant research topic. In this paper, we propose a hyper-scale custom supercomputer architecture tailored to the specific data features of the key procedure of the block Wiedemann algorithm, together with a parallel algorithm for that procedure on the custom machine. To improve computation, communication, and storage performance, four optimization strategies are proposed. We build a performance model to evaluate the execution performance and power consumption of the custom machine. The model shows that the optimization strategies yield a considerable speedup, up to three times faster than the fastest supercomputer, Tianhe-2 (TH-2), while consuming less power.
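The key procedure accelerated here is iterative sparse matrix-vector multiplication over GF(2), where a block of vectors is packed into machine words so that GF(2) addition becomes a bitwise XOR. The following is a minimal illustrative sketch of that kernel, not the paper's actual implementation; the data layout and function name are assumptions for exposition.

```python
def spmv_gf2_block(rows, x):
    """Compute y = A * x over GF(2) for a block of vectors.

    rows: sparse matrix A, one list of column indices per row
          (only positions holding a 1 are stored).
    x:    block vector, one bitmask per column; bit k of x[j] is
          entry (j, k) of an n-by-N block of N vectors.
    Over GF(2), addition is XOR and multiplication is AND, so each
    row-vector product reduces to XOR-ing the selected bitmasks.
    """
    y = []
    for cols in rows:
        acc = 0
        for j in cols:
            acc ^= x[j]  # GF(2) addition of packed vector entries
        y.append(acc)
    return y

# Tiny example: 3x3 matrix, block width N = 2 (two bits per mask).
A = [[0, 2],  # row 0 has ones in columns 0 and 2
     [1],     # row 1 has a one in column 1
     [0, 1]]  # row 2 has ones in columns 0 and 1
x = [0b01, 0b11, 0b10]
print(spmv_gf2_block(A, x))  # [3, 3, 2]
```

The block Wiedemann method applies this kernel repeatedly (y, A*y, A^2*y, ...) to build a matrix sequence from which linear generators are extracted, which is why SpMV throughput dominates the overall cost.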

Acknowledgments
This work was funded by the National Natural Science Foundation of China (Grant No. 61303070). We thank the TH-1A supercomputing system service for supporting our simulations, and the reviewers for their helpful comments.
Cite this article
Zhou, T., Jiang, J. Performance modeling of hyper-scale custom machine for the principal steps in block Wiedemann algorithm. J Supercomput 72, 4181–4203 (2016). https://doi.org/10.1007/s11227-016-1767-y