DOI: 10.1145/1188455.1188543

Sequoia: programming the memory hierarchy

Published: 11 November 2006

ABSTRACT

We present Sequoia, a programming language designed to facilitate the development of memory-hierarchy-aware parallel programs that remain portable across modern machines featuring different memory hierarchy configurations. Sequoia abstractly exposes hierarchical memory in the programming model and provides language mechanisms to describe communication vertically through the machine and to localize computation to particular memory locations within it. We have implemented a complete programming system, including a compiler and runtime systems for Cell processor-based blade systems and distributed memory clusters, and demonstrate efficient performance running Sequoia programs on both of these platforms.
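The two mechanisms the abstract names, moving data vertically between memory levels and confining computation to a single level, can be illustrated with a small sketch. The Python below is not Sequoia code and does not reproduce Sequoia's syntax, compiler, or runtime; it assumes a hypothetical machine described only by a list of per-level block sizes (`block_sizes`), and every function name in it is illustrative. Inner tasks split a matrix multiply into blocks, copy each block into a fresh child-level buffer, recurse one level down, and copy the result back up; the leaf task computes only on data it has already been handed.

```python
# Hypothetical sketch of hierarchical task decomposition (NOT Sequoia syntax).
# Assumption: each machine level is modeled only by a positive block size.
from typing import List

Matrix = List[List[float]]


def copy_block(src: Matrix, r0: int, c0: int, rows: int, cols: int) -> Matrix:
    """'Vertical' transfer down: copy a sub-block into a new child-level buffer."""
    return [row[c0:c0 + cols] for row in src[r0:r0 + rows]]


def add_block_back(dst: Matrix, blk: Matrix, r0: int, c0: int) -> None:
    """'Vertical' transfer up: accumulate a child's result block into the parent."""
    for i, row in enumerate(blk):
        for j, v in enumerate(row):
            dst[r0 + i][c0 + j] += v


def matmul_leaf(a: Matrix, b: Matrix, c: Matrix) -> None:
    """Leaf task: C += A * B using only the buffers passed in (its 'local' memory)."""
    n, k, m = len(a), len(b), len(b[0])
    for i in range(n):
        for p in range(k):
            aip = a[i][p]
            for j in range(m):
                c[i][j] += aip * b[p][j]


def matmul_task(a: Matrix, b: Matrix, c: Matrix,
                block_sizes: List[int], level: int = 0) -> None:
    """Inner task: partition C += A * B into blocks for the next level and recurse."""
    if level == len(block_sizes):
        matmul_leaf(a, b, c)
        return
    t = block_sizes[level]
    n, k, m = len(a), len(b), len(b[0])
    for i in range(0, n, t):
        for j in range(0, m, t):
            ti, tj = min(t, n - i), min(t, m - j)
            c_local = [[0.0] * tj for _ in range(ti)]  # child-level result buffer
            for p in range(0, k, t):
                tp = min(t, k - p)
                a_local = copy_block(a, i, p, ti, tp)
                b_local = copy_block(b, p, j, tp, tj)
                matmul_task(a_local, b_local, c_local, block_sizes, level + 1)
            add_block_back(c, c_local, i, j)


if __name__ == "__main__":
    c = [[0.0, 0.0], [0.0, 0.0]]
    matmul_task([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], c, [1])
    print(c)  # [[19.0, 22.0], [43.0, 50.0]]
```

Because the decomposition is driven entirely by `block_sizes`, the same task code can be retargeted to a deeper or shallower hierarchy by changing only that list. This is a loose analogue of the portability goal stated in the abstract, not a description of how Sequoia maps tasks to real machines.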


Published in

SC '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing
November 2006
746 pages
ISBN: 0769527000
DOI: 10.1145/1188455

Copyright © 2006 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

SC '06 paper acceptance rate: 54 of 239 submissions (23%). Overall acceptance rate: 1,516 of 6,373 submissions (24%).
