Abstract
Array operations are used in many scientific codes. In recent years, several applications, such as geological analysis and medical image processing, have relied on array operations over three-dimensional (abbreviated as “3D”) sparse arrays. Because these operations require a huge amount of computation time, it is necessary to compress 3D sparse arrays and to use parallel computing technologies to speed up sparse array operations. Compressing sparse arrays efficiently is therefore an important task for practical applications. In this paper, two strategies, inter- and intra-task parallelization (abbreviated as “ETP” and “RTP”, respectively), are presented for compressing 3D sparse arrays. Each strategy was designed and implemented on both Intel Xeon and Intel Xeon Phi. In the experiments, the ETP strategy achieves speedup ratios of 17.5\(\times\) and 18.2\(\times\) on the Intel Xeon E5-2670 v2 and the Intel Xeon Phi SE10X, respectively, while the RTP strategy achieves a speedup ratio of 4.5\(\times\) on both platforms.






Acknowledgments
Part of this work was supported by the Ministry of Science and Technology under Grants MOST104-2221-E-182-050, MOST104-2221-E-182-051 and MOST103-2221-E-126-013. The authors thank Professor Che-Rung Lee of the Department of Computer Science at National Tsing Hua University for hardware support. The authors also thank the other experts who discussed this work with us.
Cite this article
Lin, CY., Yen, H.T. & Hung, CL. Compressing three-dimensional sparse arrays using inter- and intra-task parallelization strategies on Intel Xeon and Xeon Phi. J Supercomput 73, 3391–3410 (2017). https://doi.org/10.1007/s11227-016-1820-x
DOI: https://doi.org/10.1007/s11227-016-1820-x