Abstract
Heterogeneous multi-core processors invest the largest share of their transistor budget in customized "accelerator" cores, while using a small number of conventional low-end cores to supply work to the accelerators. To maximize performance on heterogeneous multi-core processors, programs must expose multiple dimensions of parallelism simultaneously. Unfortunately, programming with multiple dimensions of parallelism remains an ad hoc process, relying heavily on the intuition and skill of programmers. Formal techniques are needed to optimize multi-dimensional parallel program designs. We present a model of multi-dimensional parallel computation for steering the parallelization process on heterogeneous multi-core processors. The model predicts with high accuracy the execution time and scalability of a program that uses conventional processors and accelerators simultaneously. More specifically, the model reveals the optimal degrees of task-level and data-level concurrency needed to maximize performance across cores. We use the model to derive mappings of two full computational phylogenetics applications onto a multi-processor based on the IBM Cell Broadband Engine.
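The abstract describes a model that predicts execution time as a function of the degrees of task-level and data-level concurrency, and uses those predictions to pick the best mapping onto accelerator cores. The sketch below illustrates the general idea under invented assumptions; the cost terms, constants, and function names are hypothetical and are not the paper's actual model, which is derived from measured application and hardware parameters.

```python
# Hypothetical sketch of model-driven selection of multigrain parallelism
# degrees, in the spirit of the approach described in the abstract.
# All cost terms and constants here are illustrative assumptions, not the
# paper's model: 't' is the degree of task-level parallelism, 'd' the
# degree of data-level parallelism within each task.

def predicted_time(t, d, work=100.0, overhead_task=0.5, overhead_dma=0.2):
    """Toy execution-time estimate for t concurrent tasks, each
    spreading its data-parallel work across d accelerator cores."""
    compute = work / (t * d)       # parallel work split across t*d cores
    offload = overhead_task * t    # per-task offload/scheduling cost
    comm = overhead_dma * d        # per-task data-distribution cost
    return compute + offload + comm

def best_mapping(num_accelerators=8):
    """Exhaustively search all (t, d) pairs that fit on the available
    accelerator cores (e.g. 8 SPEs on a Cell BE) for the minimum
    predicted execution time."""
    candidates = [(t, d)
                  for t in range(1, num_accelerators + 1)
                  for d in range(1, num_accelerators + 1)
                  if t * d <= num_accelerators]
    return min(candidates, key=lambda td: predicted_time(*td))

print(best_mapping())
```

With these toy constants, the search trades off the per-task offload cost (which grows with t) against the per-task communication cost (which grows with d), rather than simply maximizing either dimension alone, which is the qualitative behavior the abstract attributes to the model.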
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Blagojevic, F., Feng, X., Cameron, K.W., Nikolopoulos, D.S. (2008). Modeling Multigrain Parallelism on Heterogeneous Multi-core Processors: A Case Study of the Cell BE. In: Stenström, P., Dubois, M., Katevenis, M., Gupta, R., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2008. Lecture Notes in Computer Science, vol 4917. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77560-7_4
DOI: https://doi.org/10.1007/978-3-540-77560-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77559-1
Online ISBN: 978-3-540-77560-7