Abstract
Heterogeneous multi-core processors invest the largest share of their transistor budget in customized "accelerator" cores, while using a small number of conventional low-end cores to supply work to the accelerators. To maximize performance on heterogeneous multi-core processors, programs must expose multiple dimensions of parallelism simultaneously. Unfortunately, programming with multiple dimensions of parallelism remains an ad hoc process, relying heavily on the intuition and skill of programmers. Formal techniques are needed to optimize multi-dimensional parallel program designs. We present a model of multi-dimensional parallel computation for steering the parallelization process on heterogeneous multi-core processors. The model predicts with high accuracy the execution time and scalability of a program that uses conventional processors and accelerators simultaneously. More specifically, the model reveals the optimal degrees of task-level and data-level concurrency needed to maximize performance across cores. We use the model to derive mappings of two full computational phylogenetics applications onto a multi-processor based on the IBM Cell Broadband Engine.
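The abstract describes a model that predicts execution time as a function of the degrees of task-level and data-level concurrency, and uses those predictions to pick the best mapping onto accelerator cores. The sketch below illustrates the general idea under invented assumptions; the cost terms, constants, and function names are hypothetical and are not the paper's actual model, which is derived from measured application and hardware parameters.

```python
# Hypothetical sketch of model-driven selection of multigrain parallelism
# degrees, in the spirit of the approach described in the abstract.
# All cost terms and constants here are illustrative assumptions, not the
# paper's model: 't' is the degree of task-level parallelism, 'd' the
# degree of data-level parallelism within each task.

def predicted_time(t, d, work=100.0, overhead_task=0.5, overhead_dma=0.2):
    """Toy execution-time estimate for t concurrent tasks, each
    spreading its data-parallel work across d accelerator cores."""
    compute = work / (t * d)       # parallel work split across t*d cores
    offload = overhead_task * t    # per-task offload/scheduling cost
    comm = overhead_dma * d        # per-task data-distribution cost
    return compute + offload + comm

def best_mapping(num_accelerators=8):
    """Exhaustively search all (t, d) pairs that fit on the available
    accelerator cores (e.g. 8 SPEs on a Cell BE) for the minimum
    predicted execution time."""
    candidates = [(t, d)
                  for t in range(1, num_accelerators + 1)
                  for d in range(1, num_accelerators + 1)
                  if t * d <= num_accelerators]
    return min(candidates, key=lambda td: predicted_time(*td))

print(best_mapping())
```

With these toy constants, the search trades off the per-task offload cost (which grows with t) against the per-task communication cost (which grows with d), rather than simply maximizing either dimension alone, which is the qualitative behavior the abstract attributes to the model.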
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Blagojevic, F., Feng, X., Cameron, K.W., Nikolopoulos, D.S. (2008). Modeling Multigrain Parallelism on Heterogeneous Multi-core Processors: A Case Study of the Cell BE. In: Stenström, P., Dubois, M., Katevenis, M., Gupta, R., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2008. Lecture Notes in Computer Science, vol 4917. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77560-7_4
DOI: https://doi.org/10.1007/978-3-540-77560-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77559-1
Online ISBN: 978-3-540-77560-7