
Modeling Multigrain Parallelism on Heterogeneous Multi-core Processors: A Case Study of the Cell BE

  • Conference paper
High Performance Embedded Architectures and Compilers (HiPEAC 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4917)

Abstract

Heterogeneous multi-core processors invest the largest share of their transistor budget in customized “accelerator” cores, while using a small number of conventional low-end cores to supply computation to the accelerators. To maximize performance on heterogeneous multi-core processors, programs need to expose multiple dimensions of parallelism simultaneously. Unfortunately, programming with multiple dimensions of parallelism is, to date, an ad hoc process that relies heavily on the intuition and skill of programmers. Formal techniques are needed to optimize multi-dimensional parallel program designs. We present a model of multi-dimensional parallel computation for steering the parallelization process on heterogeneous multi-core processors. The model predicts with high accuracy the execution time and scalability of a program that uses conventional processors and accelerators simultaneously. More specifically, the model reveals the optimal degrees of multi-dimensional (task-level and data-level) concurrency that maximize performance across cores. We use the model to derive mappings of two full computational phylogenetics applications on a multi-processor based on the IBM Cell Broadband Engine.
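To make the task-level versus data-level trade-off concrete, here is a minimal toy sketch. It is not the authors' model. It assumes a hypothetical cost function: t tasks run concurrently, each task is split across d accelerator cores with t × d no greater than the available cores, each task has an Amdahl-style serial fraction, and every extra core assigned to one task adds a linear synchronization cost. All names and parameters (Workload, predicted_time, best_config, serial_fraction, sync_cost) are illustrative assumptions, and the 8-core budget merely mirrors the eight SPE accelerator cores of a single Cell BE.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Workload:
    """Hypothetical description of an application phase (illustrative only)."""
    tasks: int              # independent tasks available (task-level parallelism)
    task_time: float        # time of one task on a single accelerator core
    serial_fraction: float  # share of each task that cannot be split across cores
    sync_cost: float        # assumed linear cost per extra core working on one task


def predicted_time(w: Workload, t: int, d: int) -> float:
    """Toy estimate: t tasks run at once, each split across d cores."""
    waves = -(-w.tasks // t)  # ceiling division: rounds of concurrent tasks
    per_task = w.task_time * (w.serial_fraction + (1 - w.serial_fraction) / d)
    per_task += w.sync_cost * (d - 1)  # assumed overhead of data-level splitting
    return waves * per_task


def best_config(w: Workload, cores: int) -> tuple[int, int]:
    """Exhaustively pick (task-level, data-level) degrees with t * d <= cores."""
    candidates = [
        (t, d)
        for t, d in product(range(1, cores + 1), repeat=2)
        if t * d <= cores
    ]
    return min(candidates, key=lambda td: predicted_time(w, *td))


if __name__ == "__main__":
    few = Workload(tasks=4, task_time=1.0, serial_fraction=0.1, sync_cost=0.01)
    many = Workload(tasks=16, task_time=1.0, serial_fraction=0.1, sync_cost=0.01)
    print(best_config(few, cores=8))   # few tasks: combine task- and data-level
    print(best_config(many, cores=8))  # many tasks: task-level parallelism only
```

With only 4 tasks, this toy cost function favors running all 4 tasks at once and splitting each across 2 cores. With 16 tasks, it favors pure task-level parallelism on all 8 cores. A real prediction would need measured costs for computation and communication, not these made-up parameters.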




Editor information

Per Stenström, Michel Dubois, Manolis Katevenis, Rajiv Gupta, Theo Ungerer


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Blagojevic, F., Feng, X., Cameron, K.W., Nikolopoulos, D.S. (2008). Modeling Multigrain Parallelism on Heterogeneous Multi-core Processors: A Case Study of the Cell BE. In: Stenström, P., Dubois, M., Katevenis, M., Gupta, R., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2008. Lecture Notes in Computer Science, vol 4917. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77560-7_4


  • DOI: https://doi.org/10.1007/978-3-540-77560-7_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77559-1

  • Online ISBN: 978-3-540-77560-7

  • eBook Packages: Computer Science, Computer Science (R0)
