
Compressing three-dimensional sparse arrays using inter- and intra-task parallelization strategies on Intel Xeon and Xeon Phi

The Journal of Supercomputing

Abstract

Array operations are widely used in scientific codes. In recent years, several applications, such as geological analysis and medical image processing, have been built on array operations over three-dimensional (3D) sparse arrays. Because the computation time is large, it is necessary to compress 3D sparse arrays and to use parallel computing technologies to speed up sparse array operations. Compressing sparse arrays efficiently is therefore an important task for practical applications. Hence, in this paper, two strategies, inter- and intra-task parallelization (abbreviated as ETP and RTP, respectively), are presented to compress 3D sparse arrays. Each strategy was designed and implemented on both Intel Xeon and Intel Xeon Phi. Experimental results show that the ETP strategy achieves speedups of 17.5× on an Intel Xeon E5-2670 v2 and 18.2× on an Intel Xeon Phi SE10X, while the RTP strategy achieves a 4.5× speedup on each of these two platforms.
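The abstract does not include code; as a rough illustration of the kind of operation being parallelized, the sketch below compresses a dense 3D array into a coordinate-list representation, splitting the work across x-slices with OpenMP. The flat buffer layout, the slice-wise task decomposition, and the Entry structure are assumptions made only for this example and do not reproduce the paper's ETP or RTP strategies.

```c
/* Hypothetical sketch: compress a dense 3D array A[NX][NY][NZ] into a
 * coordinate-list (COO-style) form, parallelizing over x-slices with OpenMP.
 * This is NOT the paper's ETP/RTP scheme, only a generic illustration. */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NX 64
#define NY 64
#define NZ 64

typedef struct { int x, y, z; double v; } Entry;

/* Pack the non-zeros of one x-slice into out[]; return how many were found. */
static int compress_slice(const double *A, int x, Entry *out) {
    int n = 0;
    for (int y = 0; y < NY; y++)
        for (int z = 0; z < NZ; z++) {
            double v = A[((size_t)x * NY + y) * NZ + z];
            if (v != 0.0) { out[n].x = x; out[n].y = y; out[n].z = z; out[n].v = v; n++; }
        }
    return n;
}

int main(void) {
    double *A = calloc((size_t)NX * NY * NZ, sizeof *A);   /* zero-initialized */
    A[(3 * NY + 5) * NZ + 7]  = 1.0;                        /* a few sample non-zeros */
    A[(10 * NY + 0) * NZ + 2] = 2.5;

    /* One private output region per slice, sized for the worst case (NY*NZ). */
    Entry *buf = malloc((size_t)NX * NY * NZ * sizeof *buf);
    int counts[NX];

    #pragma omp parallel for schedule(dynamic)   /* each slice is an independent task */
    for (int x = 0; x < NX; x++)
        counts[x] = compress_slice(A, x, buf + (size_t)x * NY * NZ);

    long total = 0;
    for (int x = 0; x < NX; x++) total += counts[x];
    printf("non-zeros kept: %ld\n", total);

    free(buf); free(A);
    return 0;
}
```

Compiled with, for example, `icc -qopenmp` or `gcc -fopenmp`, each slice is compressed by whichever thread picks it up, which loosely mirrors the idea of distributing independent compression tasks across the cores of a Xeon or Xeon Phi.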




Acknowledgments

Part of this work was supported by the Ministry of Science and Technology under Grants MOST104-2221-E-182-050, MOST104-2221-E-182-051 and MOST103-2221-E-126-013. The authors would like to thank Professor Che-Rung Lee of the Department of Computer Science, National Tsing Hua University, for providing hardware support, and the other experts who discussed this work with them.

Author information


Corresponding author

Correspondence to Che-Lun Hung.


About this article


Cite this article

Lin, C.Y., Yen, H.T. & Hung, C.L. Compressing three-dimensional sparse arrays using inter- and intra-task parallelization strategies on Intel Xeon and Xeon Phi. J Supercomput 73, 3391–3410 (2017). https://doi.org/10.1007/s11227-016-1820-x
