Analysis of Task Offloading for Accelerators

Ferrer, Roger; Beltran, Vicenç; Gonzàlez, Marc; Martorell, Xavier; Ayguadé, Eduard

doi:10.1007/978-3-642-11515-8_24

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5952)

Included in the following conference series:

International Conference on High-Performance Embedded Architectures and Compilers

Abstract

In response to the forthcoming heterogeneous multicore and accelerator-based architectures, we have proposed syntactic extensions to C in the form of pragmas, based on OpenMP, that make it easier for programmers to offload parts of their applications to the auxiliary processors. Offloaded tasks can be made more profitable using a simple blocking strategy, and the runtime system is used to better support the overlap of computation and communication while moving data to and from accelerators.
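The blocking and overlap ideas mentioned above can be illustrated with a minimal sketch in plain C. It does not use the proposed pragmas, whose syntax is not given in this abstract. Everything named here (`BLOCK`, `stage_in`, `stage_out`, `scale_block`, `offload_scale`) is hypothetical, and the "transfers" are ordinary `memcpy` calls standing in for asynchronous DMA, so the code shows only the control structure of a blocked, double-buffered task, not real overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK 256  /* hypothetical block size, small enough for a local store */

/* Stand-in for a DMA "get": copy one block into accelerator-local memory.
   On real hardware this would be asynchronous; here it is a plain memcpy. */
static void stage_in(double *local, const double *global, size_t n) {
    memcpy(local, global, n * sizeof *local);
}

/* Stand-in for a DMA "put": copy one block of results back. */
static void stage_out(double *global, const double *local, size_t n) {
    memcpy(global, local, n * sizeof *local);
}

/* Offloaded task body: touches only local buffers. */
static void scale_block(double *out, const double *in, double q, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = q * in[i];
}

/* Blocked, double-buffered driver: block b is computed from in_buf[cur]
   after block b+1 has been requested into in_buf[1 - cur]. */
void offload_scale(double *dst, const double *src, double q, size_t n) {
    static double in_buf[2][BLOCK], out_buf[BLOCK];  /* not reentrant */
    size_t nblocks = (n + BLOCK - 1) / BLOCK;
    int cur = 0;

    if (n == 0) return;
    assert(dst != NULL && src != NULL);

    stage_in(in_buf[cur], src, n < BLOCK ? n : BLOCK);  /* prime the pipeline */
    for (size_t b = 0; b < nblocks; b++) {
        size_t off = b * BLOCK;
        size_t len = (n - off < BLOCK) ? n - off : BLOCK;
        if (b + 1 < nblocks) {
            size_t noff = off + BLOCK;
            size_t nlen = (n - noff < BLOCK) ? n - noff : BLOCK;
            stage_in(in_buf[1 - cur], src + noff, nlen);  /* prefetch next block */
        }
        scale_block(out_buf, in_buf[cur], q, len);
        stage_out(dst + off, out_buf, len);
        cur = 1 - cur;  /* swap buffers */
    }
}
```

With a truly asynchronous transfer primitive, the prefetch of block b+1 would proceed while `scale_block` runs on block b, which is where the benefit of overlapping computation and communication would come from.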

To demonstrate the feasibility and usefulness of our proposal, we have targeted the IBM Cell architecture. The performance of the whole system has been evaluated using HPCC STREAM Triad and several NAS benchmarks. We present this evaluation together with a detailed performance breakdown at the level of parallel regions. We also classify the parallel regions according to their suitability for execution on accelerators. Overall, our results improve on those obtained with the IBM compiler for the Cell processor.
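For readers unfamiliar with the STREAM Triad kernel mentioned above, a minimal sketch follows. The kernel itself, a[i] = b[i] + q * c[i], and the convention of counting two reads and one write per element are those of the STREAM benchmark. The function names (`triad`, `triad_bytes`, `triad_gbps`) are hypothetical, and the code is not taken from the paper's evaluation setup.

```c
#include <assert.h>
#include <stddef.h>

/* STREAM Triad kernel: a[i] = b[i] + q * c[i]. */
void triad(double *a, const double *b, const double *c, double q, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL && c != NULL));
    for (size_t i = 0; i < n; i++) a[i] = b[i] + q * c[i];
}

/* Bytes counted for one Triad pass: two reads and one write per element. */
size_t triad_bytes(size_t n) {
    return 3 * sizeof(double) * n;
}

/* Bandwidth in GB/s (10^9 bytes per second) for one pass that took
   `seconds`; how the time is measured is left to the caller. */
double triad_gbps(size_t n, double seconds) {
    assert(seconds > 0.0);
    return (double)triad_bytes(n) / seconds / 1e9;
}
```

Because Triad performs only two floating-point operations per three memory accesses, its reported figure mostly reflects how well data movement is sustained, which makes it a natural stress test for accelerator data transfers.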

Author information

Authors and Affiliations

Barcelona Supercomputing Center, Jordi Girona, 29
Roger Ferrer, Vicenç Beltran, Marc Gonzàlez, Xavier Martorell & Eduard Ayguadé
Departament d’Arquitectura de Computadors, Univ. Politècnica de Catalunya, Jordi Girona, 1–3, Barcelona, Spain
Marc Gonzàlez, Xavier Martorell & Eduard Ayguadé

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, The University of Texas at Austin, 1 University Station C0803, TX 78712-0240, Austin, USA
Yale N. Patt
Dipartimento di Ingegneria della Informazione, Università di Pisa, Via Diotisalvi 2, 56100, Pisa, Italy
Pierfrancesco Foglia
IBM T.J.Watson Research Center, 19 Skyline Drive, NY 10532, Hawthorne, USA
Evelyn Duesterwald
Hewlett-Packard, Cami de Can Graells 1-21, Sant Cugat del Vallés, 08174, Barcelona, Spain
Paolo Faraboschi
Computer Architecture Department, Technical University of Catalunya (UPC), c/Jordi Girona 1-3, 08034, Barcelona, Spain
Xavier Martorell

About this paper

Cite this paper

Ferrer, R., Beltran, V., Gonzàlez, M., Martorell, X., Ayguadé, E. (2010). Analysis of Task Offloading for Accelerators. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2010. Lecture Notes in Computer Science, vol 5952. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11515-8_24

DOI: https://doi.org/10.1007/978-3-642-11515-8_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11514-1
Online ISBN: 978-3-642-11515-8
eBook Packages: Computer Science, Computer Science (R0)
