Abstract
Sparse matrix-vector multiplication (SpMV) is a vital computing operation in many scientific, engineering, economic and social applications, and is increasingly used to develop timely intelligence for the design and management of smart societies. Several factors affect the performance of SpMV computations, such as matrix characteristics, storage formats, and software and hardware platforms. The complexity of computer systems is rising with the increasing number of cores per processor, levels of cache, processors per node and high-speed interconnects. There is an ever-growing need for new optimization techniques and efficient ways of exploiting parallelism. In this paper, we propose ZAKI, a data-driven, machine-learning approach and tool, to predict the optimal number of processes for SpMV computations of an arbitrary sparse matrix on a distributed memory machine. The aim is to allow application scientists to automatically obtain the best configuration, and hence the best performance, for the execution of SpMV computations. We train and test the tool using nearly 2000 real-world matrices obtained from 45 application domains including computational fluid dynamics (CFD), computer vision, and robotics. The tool uses three machine learning methods (decision trees, random forest, and gradient boosting) and is evaluated in depth. We also discuss the applicability of the proposed tool to energy efficiency optimization of SpMV computations. This is the first work in which the sparsity structure of matrices has been exploited to predict the optimal number of processes for a given matrix in distributed memory environments using different base and ensemble machine learning methods.
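As a rough illustration of the operation the abstract refers to, the sketch below computes an SpMV for a matrix held in compressed sparse row (CSR) form and derives a few structural statistics from it. The CSR layout is one common storage format; the function names and the chosen statistics are assumptions made for this example only and are not taken from ZAKI or its feature set.

```python
from statistics import mean, pstdev


def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        # Non-zeros of this row occupy data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def sparsity_features(indptr, n_cols):
    """Illustrative structural statistics (hypothetical, not the paper's features)."""
    n_rows = len(indptr) - 1
    row_nnz = [indptr[i + 1] - indptr[i] for i in range(n_rows)]
    nnz = indptr[-1]
    return {
        "n_rows": n_rows,
        "n_cols": n_cols,
        "nnz": nnz,
        "density": nnz / (n_rows * n_cols),
        "mean_row_nnz": mean(row_nnz),
        "std_row_nnz": pstdev(row_nnz),
    }


# 3x3 example matrix: [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [4.0, 1.0, 2.0, 3.0, 5.0]

y = csr_spmv(indptr, indices, data, [1.0, 2.0, 3.0])
features = sparsity_features(indptr, n_cols=3)
```

Statistics of this kind are the sort of input a classifier such as a decision tree could take when predicting a process count, although the features actually used by the tool are not stated in the abstract.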
Acknowledgements
This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant number RG-11-611-40. The authors, therefore, acknowledge with thanks DSR for technical and financial support. The experiments reported in this paper were executed on the Aziz supercomputer, managed by the HPC Center at King Abdulaziz University.
Cite this article
Usman, S., Mehmood, R., Katib, I. et al. ZAKI: A Smart Method and Tool for Automatic Performance Optimization of Parallel SpMV Computations on Distributed Memory Machines. Mobile Netw Appl 28, 744–763 (2023). https://doi.org/10.1007/s11036-019-01318-3
DOI: https://doi.org/10.1007/s11036-019-01318-3