Abstract
Many classical methods and algorithms developed when single-core CPUs dominated the parallel computing landscape are still widely used in today's multicore world. Two prominent examples are load balancing, which has been one of the main techniques for minimizing the computation time of parallel applications since the beginning of parallel computing, and model-based power/energy measurement techniques using performance events. In this paper, we show that in the multicore era, load balancing is no longer synonymous with optimization, and we present recent methods and algorithms for optimizing parallel applications for performance and energy on modern HPC platforms that do not rely on load balancing and often return imbalanced but optimal solutions.
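As a minimal illustration of why a load-balanced distribution need not be optimal, the sketch below (not the authors' algorithm) splits n equal-sized units of work among processors so as to minimize the maximum execution time, given a discrete time function for each processor. It uses a simple exhaustive dynamic program. The function name `optimal_partition` and the time table `t` are hypothetical; `t` is chosen to mimic a performance dip at one problem size, such as might arise from resource contention on a multicore processor.

```python
from functools import lru_cache
from typing import Sequence, Tuple


def optimal_partition(times: Sequence[Sequence[float]], n: int) -> Tuple[float, Tuple[int, ...]]:
    """Split n work units among len(times) processors, minimizing the makespan.

    times[i][x] is the (hypothetical, e.g. measured) execution time of
    processor i on x units; each times[i] must cover x = 0..n.
    Returns (makespan, distribution).
    """
    p = len(times)

    @lru_cache(maxsize=None)
    def best(i: int, remaining: int) -> Tuple[float, Tuple[int, ...]]:
        # The last processor takes all remaining units.
        if i == p - 1:
            return times[i][remaining], (remaining,)
        result = None
        # Try every amount x for processor i; recurse on the rest.
        for x in range(remaining + 1):
            rest_time, rest_dist = best(i + 1, remaining - x)
            makespan = max(times[i][x], rest_time)
            if result is None or makespan < result[0]:
                result = (makespan, (x,) + rest_dist)
        return result

    return best(0, n)


if __name__ == "__main__":
    # Hypothetical time function with a performance dip at x = 5.
    t = [0, 1, 2, 3, 4, 9, 5, 6, 7, 8, 10]
    makespan, dist = optimal_partition([t, t], 10)
    print(makespan, dist)   # 5 (4, 6)
    print(max(t[5], t[5]))  # 9, the load-balanced split (5, 5)
```

With this hypothetical table, the equal split (5, 5) takes 9 time units, whereas the optimal distribution (4, 6) takes 5, with the two processors finishing at times 4 and 5: an imbalanced but faster solution.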
We also show that some fundamental assumptions about performance events, which must hold for model-based power/energy measurement tools to be accurate, are increasingly difficult to satisfy as the number of CPU cores increases. Therefore, energy-aware computing methods relying on these tools will be increasingly difficult to verify.
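One property a performance event can be checked for is additivity: when two applications are executed one after the other as a single compound application, the event count should be approximately the sum of the counts for the two applications run separately. Intuitively, a linear energy model that sums per-event contributions cannot be consistent for compound applications if its input events violate this. The sketch below tests the property on given counts; the relative-error formula, the 5% tolerance, and the example counts are assumptions for illustration, not values from the paper.

```python
def additivity_error(count_a: float, count_b: float, count_compound: float) -> float:
    """Relative deviation of the compound application's event count
    from the sum of the counts of its two base applications."""
    expected = count_a + count_b
    if expected == 0:
        raise ValueError("base counts must not both be zero")
    return abs(count_compound - expected) / expected


def is_additive(count_a: float, count_b: float, count_compound: float,
                tolerance: float = 0.05) -> bool:
    """Treat the event as additive if the relative error is within the
    tolerance (the 5% default is an illustrative assumption)."""
    return additivity_error(count_a, count_b, count_compound) <= tolerance


if __name__ == "__main__":
    # Hypothetical counts for base apps A, B and their serial composition AB.
    print(is_additive(1_000, 2_000, 3_050))  # True  (error ~1.7%)
    print(is_additive(1_000, 2_000, 4_000))  # False (error ~33%)
```

In practice the counts would come from repeated measurements, so a real check would also need to account for run-to-run variation before declaring an event non-additive.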
Acknowledgement
This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number 14/IA/2474. This work is partially supported by the EU under the COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Lastovetsky, A. et al. (2018). How Pre-multicore Methods and Algorithms Perform in Multicore Era. In: Yokota, R., Weiland, M., Shalf, J., Alam, S. (eds) High Performance Computing. ISC High Performance 2018. Lecture Notes in Computer Science, vol 11203. Springer, Cham. https://doi.org/10.1007/978-3-030-02465-9_37
DOI: https://doi.org/10.1007/978-3-030-02465-9_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02464-2
Online ISBN: 978-3-030-02465-9
eBook Packages: Computer Science, Computer Science (R0)