ZAKI: A Smart Method and Tool for Automatic Performance Optimization of Parallel SpMV Computations on Distributed Memory Machines

Mobile Networks and Applications

Abstract

Sparse matrix-vector multiplication (SpMV) is a vital computing operation in many scientific, engineering, economic, and social applications, and is increasingly used to develop timely intelligence for the design and management of smart societies. Several factors affect the performance of SpMV computations, including matrix characteristics, storage formats, and software and hardware platforms. The complexity of computer systems is rising with the increasing number of cores per processor, levels of cache, processors per node, and high-speed interconnects. There is an ever-growing need for new optimization techniques and efficient ways of exploiting parallelism. In this paper, we propose ZAKI, a data-driven machine learning approach and tool, to predict the optimal number of processes for SpMV computations of an arbitrary sparse matrix on a distributed memory machine. The aim is to allow application scientists to automatically obtain the best configuration, and hence the best performance, for the execution of SpMV computations. We train and test the tool using nearly 2000 real-world matrices obtained from 45 application domains, including computational fluid dynamics (CFD), computer vision, and robotics. The tool uses three machine learning methods (decision trees, random forest, and gradient boosting) and is evaluated in depth. We also discuss the applicability of the proposed tool to energy efficiency optimization of SpMV computations. This is the first work in which the sparsity structure of matrices has been exploited to predict the optimal number of processes for a given matrix in distributed memory environments using different base and ensemble machine learning methods.
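
To make the setting concrete, the following is a minimal sketch, not the authors' implementation, of an SpMV kernel together with a few sparsity-structure features of the kind a model could take as input when predicting a process count. The compressed sparse row (CSR) layout, the function names `spmv_csr` and `sparsity_features`, and the particular features are illustrative assumptions; the abstract does not state which storage format or features ZAKI uses.

```python
from statistics import mean, pstdev


def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def sparsity_features(row_ptr, n_cols):
    """Summarise sparsity structure; the feature set is hypothetical."""
    n_rows = len(row_ptr) - 1
    nnz_per_row = [row_ptr[i + 1] - row_ptr[i] for i in range(n_rows)]
    nnz = row_ptr[-1]
    return {
        "rows": n_rows,
        "cols": n_cols,
        "nnz": nnz,
        "density": nnz / (n_rows * n_cols),
        "mean_nnz_per_row": mean(nnz_per_row),
        "std_nnz_per_row": pstdev(nnz_per_row),
        "max_nnz_per_row": max(nnz_per_row),
    }


# Toy 3x3 matrix: [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]

y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])  # [7.0, 4.0, 18.0]
features = sparsity_features(row_ptr, n_cols=3)
```

In a pipeline of the kind the abstract describes, feature rows like `features` would be paired with the measured best process count for each matrix and used to train a tree-based model; that training step is omitted here because its details are not given on this page.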

Acknowledgements

This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant number RG-11-611-40. The authors therefore gratefully acknowledge DSR's technical and financial support. The experiments in this paper were executed on the Aziz supercomputer, managed by the HPC Center at King Abdulaziz University.

Author information

Corresponding author

Correspondence to Sardar Usman.

Cite this article

Usman, S., Mehmood, R., Katib, I. et al. ZAKI: A Smart Method and Tool for Automatic Performance Optimization of Parallel SpMV Computations on Distributed Memory Machines. Mobile Netw Appl 28, 744–763 (2023). https://doi.org/10.1007/s11036-019-01318-3

  • DOI: https://doi.org/10.1007/s11036-019-01318-3
