Abstract
Optimization of data-parallel applications for modern HPC platforms requires partitioning the computations between the heterogeneous computing devices in proportion to their speed. Heterogeneous data partitioning algorithms are based on computation performance models of the executing platforms. Their implementation is not trivial as it requires: accurate and efficient benchmarking of computing devices, which may share resources and/or execute different codes; appropriate interpolation methods to predict performance; and advanced mathematical methods to solve the data partitioning problem. In this paper, we present FuPerMod, a software tool that addresses these implementation issues and automates the development of data partitioning code in data-parallel applications for heterogeneous HPC platforms.
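To make the core idea of partitioning "in proportion to their speed" concrete, the following is a minimal sketch that assumes the simplest possible performance model: each device has a single constant speed, independent of problem size. It does not use the FuPerMod API; the function name `partition_proportional` and its parameters are hypothetical and chosen only for illustration. The functional performance models referred to above generalize this by letting speed depend on problem size, which requires the interpolation and solver machinery the abstract mentions.

```python
def partition_proportional(total_units, speeds):
    """Split total_units into integer shares proportional to device speeds.

    Assumption: each speed is a positive constant (units of work per second),
    i.e. a constant performance model rather than a functional one.
    """
    if total_units < 0:
        raise ValueError("total_units must be non-negative")
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be a non-empty list of positive numbers")

    total_speed = sum(speeds)
    # Ideal (possibly fractional) share of each device.
    ideal = [total_units * s / total_speed for s in speeds]
    # Round down first, then hand out the units lost to rounding.
    shares = [int(x) for x in ideal]
    leftover = total_units - sum(shares)
    # Devices with the largest fractional remainder receive one extra unit.
    by_remainder = sorted(
        range(len(speeds)),
        key=lambda i: ideal[i] - shares[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical speeds: one slow CPU core and one accelerator three times faster.
    print(partition_proportional(100, [1.0, 3.0]))
```

With shares chosen this way, every device is expected to finish at roughly the same time under the constant-speed assumption, which is the balance condition that the more elaborate models also aim for.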
Additional information
This research is supported by Science Foundation Ireland (Grant 08/IN.1/I2054).
About this article
Cite this article
Clarke, D., Zhong, Z., Rychkov, V. et al. FuPerMod: a software tool for the optimization of data-parallel applications on heterogeneous platforms. J Supercomput 69, 61–69 (2014). https://doi.org/10.1007/s11227-014-1207-9
DOI: https://doi.org/10.1007/s11227-014-1207-9