Autonomic Coordination of Skeleton-Based Applications Over CPU/GPU Multi-Core Architectures

International Journal of Parallel Programming

Abstract

Widely described as patterns of parallel computation and communication, algorithmic skeletons offer a viable solution for efficiently programming modern heterogeneous multi-core architectures equipped not only with traditional multi-core CPUs, but also with one or more programmable Graphics Processing Units (GPUs). By systematically applying algorithmic skeletons to complex programming tasks, it is arguably possible to separate the coordination from the computation in a parallel program, and therefore to subdivide a complex program into building blocks (modules, skids, or components) that can be independently created and then reused in different systems to drive multiple functionalities. Such a systematic division makes it feasible to automate coordination by addressing extra-functional and non-functional features, such as application performance, portability, and resource utilisation, at the component level in heterogeneous multi-core architectures. In this paper, we introduce a novel approach that exploits the inherent features of skeleton-based applications to coordinate them automatically over heterogeneous (CPU/GPU) multi-core architectures and improve their performance. Our systematic evaluation demonstrates speed-ups of up to one order of magnitude on heterogeneous multi-core architectures.
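The separation of coordination from computation mentioned in the abstract can be illustrated with a minimal sketch. It is not the framework described in the paper and does not use FastFlow or any GPU back end; the names `farm`, `pipeline`, `square`, and `increment` are hypothetical and chosen only for illustration. The skeletons own the coordination (worker pool, stage ordering), while the plain functions passed to them carry the computation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List


def farm(worker: Callable[[Any], Any], n_workers: int = 4) -> Callable[[Iterable[Any]], List[Any]]:
    """Task-farm skeleton (illustrative): apply `worker` to every item using a thread pool."""
    def run(items: Iterable[Any]) -> List[Any]:
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            # Executor.map returns results in input order.
            return list(pool.map(worker, items))
    return run


def pipeline(*stages: Callable[[Iterable[Any]], List[Any]]) -> Callable[[Iterable[Any]], List[Any]]:
    """Pipeline skeleton (simplified): run stages one after another over the whole batch.

    A real streaming pipeline would overlap stages; this sketch does not.
    """
    def run(items: Iterable[Any]) -> List[Any]:
        data = list(items)
        for stage in stages:
            data = stage(data)
        return data
    return run


# Computation: ordinary sequential functions with no knowledge of parallelism.
def square(x: int) -> int:
    return x * x


def increment(x: int) -> int:
    return x + 1


# Coordination: the structure of the application is declared separately.
app = pipeline(farm(square, n_workers=2), farm(increment, n_workers=2))
result = app(range(5))
print(result)
```

Because the structure of `app` is explicit, a coordinator could in principle change `n_workers` or remap a stage to another device without touching `square` or `increment`, which is the kind of component-level adaptation the abstract refers to.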

Notes

  1. PEI is an open-source framework available at https://github.com/mehdi-goli/MC-FastFlow-PEI.

Author information

Correspondence to Horacio González–Vélez.

Additional information

Research supported by the European Commission FP7 Project “ParaPhrase: Parallel Patterns for Adaptive Heterogeneous Multicore Systems” under Contract No. 288570.

Cite this article

Goli, M., González–Vélez, H. Autonomic Coordination of Skeleton-Based Applications Over CPU/GPU Multi-Core Architectures. Int J Parallel Prog 45, 203–224 (2017). https://doi.org/10.1007/s10766-016-0419-4

  • DOI: https://doi.org/10.1007/s10766-016-0419-4
