Abstract
Widely adumbrated as patterns of parallel computation and communication, algorithmic skeletons introduce a viable solution for efficiently programming modern heterogeneous multi-core architectures equipped not only with traditional multi-core CPUs, but also with one or more programmable Graphics Processing Units (GPUs). By systematically applying algorithmic skeletons to address complex programming tasks, it is arguably possible to separate the coordination from the computation in a parallel program, and therefore subdivide a complex program into building blocks (modules, skids, or components) that can be independently created and then used in different systems to drive multiple functionalities. By exploiting such systematic division, it is feasible to automate coordination by addressing extra-functional and non-functional features such as application performance, portability, and resource utilisation from the component level in heterogeneous multi-core architectures. In this paper, we introduce a novel approach to exploit the inherent features of skeleton-based applications in order to automatically coordinate them over heterogeneous (CPU/GPU) multi-core architectures and improve their performance. Our systematic evaluation demonstrates up to one order of magnitude speed-up on heterogeneous multi-core architectures.
Similar content being viewed by others
Notes
It is noted that PEI is an open source framework located at https://github.com/mehdi-goli/MC-FastFlow-PEI.
References
Aldinucci, M., Danelutto, M., Kilpatrick, P.: Management in distributed systems: a semi-formal approach. In: Euro-Par 2007, LNCS, vol 4641, Springer, Rennes, pp 651–661 (2007)
Aldinucci, M., Campa, S., Danelutto, M., Vanneschi, M.: Behavioural skeletons in GCM: autonomic management of grid components. In: PDP 2008, IEEE, Toulouse, pp 54–63 (2008)
Aldinucci, M., Danelutto, M., Zoppi, G., Kilpatrick, P.: Advances in autonomic components & services. In: From grids to service and pervasive computing, Springer, pp 3–17 (2008b)
Aldinucci, M., Danelutto, M., Kilpatrick, P.: Towards hierarchical management of autonomic components: a case study. In: PDP 2009, IEEE, Weimar, pp 3–10 (2009)
Aldinucci, M., Danelutto, M., Kilpatrick, P., Torquati, M.: FastFlow: high-level and efficient streaming on multi-core. In: Pllana, S., Xhafa, F. (eds), Programming Multi-core and Many-core Computing Systems, no. 66 in Wiley Series on Parallel and Distributed Computing, Wiley (2014)
Bharadwaj, V., Ghose, D., Robertazzi, T.G.: Divisible load theory: a new paradigm for load scheduling in distributed systems. Cluster Comput. 6(1), 7–17 (2003)
Campa, S., Danelutto, M., Goli, M., González-Vélez, H., Popescu, A.M., Torquati, M.: Parallel patterns for heterogeneous CPU/GPU architectures: structured parallelism from cluster to cloud. Future Gener. Comput. Syst. 37, 354–366 (2014)
Clint Whaley, R., Petitet, A., Dongarra, J.J.: Automated empirical optimizations of software and the atlas project. Parallel Comput. 27(1), 3–35 (2001)
Cole, M.: Algorithmic skeletons: structured management of parallel computation. Research monographs in parallel & distributed computing. Pitman/MIT Press, London (1989)
Danelutto, M., Torquati, M.: Structured parallel programming with “core” FastFlow. In: CEFP2013 5th Summer School (revised selected papers), Springer, Cluj-Napoca, LNCS, vol 8606, pp 29–75 (2013)
Danelutto, M., Zoppi, G.: Behavioural skeletons meeting services. In: ICCS 2008, LNCS, vol 5101, Springer, Kraków, pp 146–153 (2008)
Donadio, S., Brodman, J., Roeder, T., Yotov, K., Barthou, D., Cohen, A., Garzarán, M.J., Padua, D., Pingali, K.: A language for the compact representation of multiple programversions. In: LCPC 2005, LNCS, vol 4339, Springer, Hawthorne, pp 136–151 (2006)
Enmyren, J., Kessler, C.W.: SkePU: a multi-backend skeleton programming library for multi-GPU systems. In: Proceedings of the fourth international workshop on High-level parallel programming and applications, ACM, pp 5–14 (2010)
Ernsting, S., Kuchen, H.: Algorithmic skeletons for multi-core, multi-GPU systems and clusters. Int. J. High Perform. Comput. Network. 7(2), 129–138 (2012)
Fialka, O., Cadik, M.: FFT and convolution performance in image filtering on GPU. CIV2006, pp. 609–614. IEEE, London (2006)
Frigo, M., Johnson, S.G.: FFTW: An adaptive software architecture for the FFT. In: ICSSP’98, IEEE, Seattle, vol 3, pp 1381–1384 (1998)
Goli, M.: Autonomic behavioural framework for structural parallelism over heterogeneous multi-core systems. PhD thesis, Robert Gordon University, Aberdeen, UK, http://hdl.handle.net/10059/1373 (2015)
Goli, M., González-Vélez, H.: Heterogeneous Algorithmic Skeletons for Fast Flowwith Seamless Coordination over Hybrid Architectures. In: PDP 2013, IEEE, Belfast,pp 148–156 (2013)
Goli, M., González-Vélez, H.: N-body computations using skeletal frameworks on multicore cpu/graphics processing unit architectures: an empirical performance evaluation. Concurr. Comput. 26(4), 972–986 (2014)
Goli, M., McCall, J., Brown, C., Janjic, V., Hammond, K.: Mapping parallel programs to heterogeneous CPU/GPU architectures using a Monte Carlo tree search. In: CEC2013, IEEE, Cancun, pp 2932–2939 (2013)
González-Vélez, H., Leyton, M.: A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers. Softw. Pract. Exp. 40(12), 1135–1160 (2010)
Hammond, K., Aldinucci, M., Brown, C., Cesarini, F., Danelutto, M.,González-Vélez, H., Kilpatrick, P., Keller, R., Rossbory, M., Shainer, G.: The ParaPhrase project: Parallel patterns for adaptive heterogeneous multicore systems. In: FMCO 2011- Revised Selected Papers, Springer, Turin, LNCS, vol 7542, pp 218–236 (2011)
Hintjens, P.: ZeroMQ: Messaging for Many Applications. O’Reilly Media, Inc, Sebastopol (2013)
Hwu, WmW: GPU Computing Gems Jade Edition. Morgan Kaufmann, (2011)
Katagiri, T., Kise, K., Honda, H., Yuba, T.: Fiber: A generalized framework for auto-tuning software. In: High Performance Computing, Springer, pp 146–159 (2003)
Kephart, J.O., Chess, D.M.: The vision of autonomic computing. Computer 36(1), 41–50 (2003)
Owens, J.D., Luebke, D., Govindaraju, N., Harris, M., Krger, J., Lefohn, A.E., Purcell, T.J.: A survey of general-purpose computation on graphics hardware. Comput. Graph. Forum 26(1), 80–113 (2007)
Pancake, C.M.: Performance tools for today’s HPC: Are we addressing the right issues? Parallel Comput. 27(11), 1403–1415 (2001)
Puschel, M., Moura, J.M., Johnson, J.R., Padua, D., Veloso, M.M., Singer, B.W., Xiong, J., Franchetti, F., Gacic, A., Voronenko, Y., et al.: SPIRAL: Code generation for DSP transforms. Proc. IEEE 93(2), 232–275 (2005)
Schaefer, C.A., Pankratius, V., Tichy, W.F.: Atune-il: An instrumentation language for auto-tuning parallel applications. In: EuroPar 2009, Springer, Delft, LNCS, vol 5704, pp 9–20 (2009)
Steuwer, M., Kegel, P., Gorlatch, S.: SkelCL-a portable skeleton library for high-level GPU programming. In: IPDPS 2011, IEEE, Anchorage, pp 1176–1182 (2011)
Author information
Authors and Affiliations
Corresponding author
Additional information
Research supported by the European Commission FP7 Project “ParaPhrase: Parallel Patterns for Adaptive Heterogeneous Multicore Systems” Under Contract No.: 288570.
Rights and permissions
About this article
Cite this article
Goli, M., González–Vélez, H. Autonomic Coordination of Skeleton-Based Applications Over CPU/GPU Multi-Core Architectures. Int J Parallel Prog 45, 203–224 (2017). https://doi.org/10.1007/s10766-016-0419-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10766-016-0419-4