Stencil computations on heterogeneous platforms for the Jacobi method: GPUs versus Cell BE

Cecilia, José M.; Abellán, José L.; Fernández, Juan; Acacio, Manuel E.; García, José M.; Ujaldón, Manuel

doi:10.1007/s11227-012-0749-y


Published: 15 February 2012

The Journal of Supercomputing, volume 62, pages 787–803 (2012)



Abstract

Heterogeneous computing is consolidating within parallel computing, with architectures such as the Cell Broadband Engine (Cell BE) and Graphics Processing Units (GPUs) now present in a myriad of high-performance computing developments. Each of these platforms provides a Software Development Kit (SDK) that maximizes performance at the expense of exposing complex, low-level architectural details, which makes software development a daunting task. This paper explores stencil computations in several heterogeneous programming models (Cell SDK, CellSs, ALF and CUDA) to optimize the Jacobi method for solving Laplace’s differential equation. We describe the programming techniques used to extract maximum performance on the Cell BE and the GPU, and compare their computing paradigms. Experimental results are reported on two Nvidia Teslas and one IBM BladeCenter QS20 blade, which incorporates two 3.2 GHz Cell BEs v 5.1. The speed-up factor for our set of GPU optimizations reaches 3–4×, and GPU execution times beat those of the Cell BE by an order of magnitude, while also scaling well when moving towards newer GPU generations and/or more demanding problem sizes.
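For readers unfamiliar with the kernel the abstract refers to, the following is a minimal sequential sketch of a Jacobi sweep for Laplace's equation. It is not the authors' Cell BE or CUDA code. It assumes a 2D square grid, fixed (Dirichlet) boundary values, and the standard 5-point stencil in which each interior point becomes the average of its four neighbours. The names `make_grid` and `jacobi_laplace`, the tolerance, and the iteration cap are all hypothetical choices made for illustration.

```python
def make_grid(n, top=100.0):
    """Build an n x n grid of zeros whose top row is held at `top` (assumed boundary)."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [top] * n
    return grid


def jacobi_laplace(grid, tol=1e-8, max_iters=10_000):
    """Run Jacobi sweeps until the largest point-wise change drops below `tol`.

    Boundary cells are never written, so they act as fixed boundary conditions.
    Returns the final grid and the number of sweeps performed.
    """
    rows, cols = len(grid), len(grid[0])
    # Two buffers: Jacobi reads only the previous iterate and writes a new one.
    old = [row[:] for row in grid]
    new = [row[:] for row in grid]
    for sweep in range(1, max_iters + 1):
        max_change = 0.0
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                # 5-point stencil: average of the north, south, west and east neighbours.
                value = 0.25 * (old[i - 1][j] + old[i + 1][j]
                                + old[i][j - 1] + old[i][j + 1])
                max_change = max(max_change, abs(value - old[i][j]))
                new[i][j] = value
        # Swap buffers so `old` always holds the most recent iterate.
        old, new = new, old
        if max_change < tol:
            return old, sweep
    return old, max_iters
```

Because every interior point depends only on the previous iterate, all updates within a sweep are independent. That independence is what makes the stencil a natural fit for data-parallel platforms such as those compared in the paper, where each point or tile can be assigned to a separate thread or processing element.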



Acknowledgements

This work has been jointly supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under projects 00001/CS/2007 and 15290/PI/2010 and under the fellowship 12461/FPI/09, and by the Spanish MICINN and European Commission FEDER funds under projects Consolider Ingenio-2010 CSD2006-00046 and TIN2009-14475-C04. We also thank NVIDIA for hardware donations under the Professor Partnership 2008–2010 and the CUDA Teaching Center Award 2011–2012.

Author information

Authors and Affiliations

Dept. of Computer Science, Catholic University of Murcia, Murcia, Spain
José M. Cecilia
Dept. of Computer Engineering, University of Murcia, Murcia, Spain
José L. Abellán, Manuel E. Acacio & José M. García
Intel Barcelona Research Center, Intel Labs, Universitat Politècnica de Catalunya, Barcelona, Spain
Juan Fernández
Computer Architecture Department, University of Malaga, Malaga, Spain
Manuel Ujaldón


Corresponding author

Correspondence to José M. Cecilia.


About this article

Cite this article

Cecilia, J.M., Abellán, J.L., Fernández, J. et al. Stencil computations on heterogeneous platforms for the Jacobi method: GPUs versus Cell BE. J Supercomput 62, 787–803 (2012). https://doi.org/10.1007/s11227-012-0749-y


Published: 15 February 2012
Issue Date: November 2012
DOI: https://doi.org/10.1007/s11227-012-0749-y
