DOI: 10.1145/1654059.1654114

Towards a framework for abstracting accelerators in parallel applications: experience with Cell

Published: 14 November 2009

ABSTRACT

While accelerators have become more prevalent in recent years, they are still considered hard to program. In this work, we extend a framework for parallel programming so that programmers can take advantage of the Cell processor's Synergistic Processing Elements (SPEs) as seamlessly as possible. Using this framework, the same application code can be compiled and executed on multiple platforms, including x86-based and Cell-based clusters. Furthermore, our model allows independently developed libraries to efficiently time-share one or more SPEs by interleaving work from multiple libraries. To demonstrate the framework, we present performance data for an example molecular dynamics (MD) application. When compared to a single Xeon core utilizing streaming SIMD extensions (SSE), the MD program achieves a speedup of 5.74 on a single Cell chip (with 8 SPEs). In comparison, a similar speedup of 5.89 is achieved using six Xeon (x86) cores.
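
The time-sharing behavior mentioned in the abstract can be pictured with a small sketch. The C++ example below is purely illustrative and does not reproduce the paper's actual framework or Offload API: the WorkRequest and AcceleratorScheduler names, the per-library queues, and the round-robin policy are all assumptions chosen only to show how work submitted by independently developed libraries might be interleaved onto a single shared accelerator.

// Hypothetical illustration (invented names, not the paper's Offload API):
// two independently developed "libraries" submit chunks of work to a shared
// scheduler, which interleaves them round-robin onto one accelerator, so
// neither library needs exclusive ownership of an SPE.
#include <cstdio>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// A work request: the chunk of computation to offload plus a completion callback.
struct WorkRequest {
    std::string owner;                 // which library issued the request
    std::function<double()> kernel;    // computation to run on the "accelerator"
    std::function<void(double)> done;  // invoked on the host once the chunk finishes
};

// One queue per library; the scheduler drains the queues in an interleaved
// fashion instead of running one library's backlog to completion first.
class AcceleratorScheduler {
public:
    void submit(WorkRequest wr) {
        std::deque<WorkRequest>& q = queues_[wr.owner];  // find this library's queue
        q.push_back(std::move(wr));
    }

    // Round-robin over libraries: each pass takes at most one chunk per library.
    void run() {
        bool progressed = true;
        while (progressed) {
            progressed = false;
            for (auto& entry : queues_) {
                std::deque<WorkRequest>& q = entry.second;
                if (q.empty()) continue;
                WorkRequest wr = std::move(q.front());
                q.pop_front();
                double result = wr.kernel();  // stand-in for an SPE offload
                wr.done(result);              // host-side completion callback
                progressed = true;
            }
        }
    }

private:
    std::map<std::string, std::deque<WorkRequest>> queues_;
};

int main() {
    AcceleratorScheduler sched;
    // "libA" and "libB" each enqueue several chunks without knowing about one another.
    for (int i = 0; i < 3; ++i) {
        sched.submit({"libA", [i] { return i * 2.0; },
                      [i](double r) { std::printf("libA chunk %d -> %.1f\n", i, r); }});
        sched.submit({"libB", [i] { return i + 0.5; },
                      [i](double r) { std::printf("libB chunk %d -> %.1f\n", i, r); }});
    }
    sched.run();  // completions from libA and libB alternate
    return 0;
}

A real runtime would additionally overlap DMA transfers with SPE computation and drive the callbacks asynchronously; the queue-per-client structure above is simply one plausible way to realize the interleaving the abstract describes.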


Published in

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
November 2009, 778 pages
ISBN: 9781605587448
DOI: 10.1145/1654059

Copyright © 2009 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 14 November 2009


Qualifiers

research-article

Acceptance Rates

SC '09 Paper Acceptance Rate: 59 of 261 submissions, 23%. Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%.
