DOI: 10.1145/1188455.1188558
Article

Hypergraph partitioning for automatic memory hierarchy management

Published: 11 November 2006

Abstract

In this paper, we present a mechanism for automatic management of the memory hierarchy, including secondary storage, in the context of a global address space parallel programming framework. The programmer specifies the parallelism and locality in the computation. The scheduling of the computation into stages, together with the movement of the associated data between secondary storage and global memory, and between global memory and local memory, is automatically managed. A novel formulation of hypergraph partitioning is used to model the optimization problem of minimizing disk I/O. Experimental evaluation of the proposed approach using a sub-computation from the quantum chemistry domain shows a reduction in the disk I/O cost by up to a factor of 11, and a reduction in turnaround time by up to 49%, as compared to alternative approaches used in state-of-the-art quantum chemistry codes.
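The abstract does not spell out the paper's novel formulation, so the following is only a generic sketch of how a hypergraph partition can stand in for disk I/O cost. It assumes the standard connectivity-minus-one metric from the hypergraph partitioning literature: tasks are vertices, each data block is a net (hyperedge) joining the tasks that use it, and a block touched by k stages is charged k - 1 extra reads. The names `connectivity_cost`, `nets`, `stage_of`, and `size`, along with the toy data, are hypothetical and not taken from the paper.

```python
def connectivity_cost(nets, stage_of, size=None):
    """Connectivity-minus-one cost of an assignment of tasks to stages.

    nets     -- dict: data block -> list of tasks that access it
    stage_of -- dict: task -> stage (part) index
    size     -- optional dict: data block -> weight (e.g. block size)

    A block whose tasks all fall in one stage costs nothing extra;
    a block spread over k stages is charged (k - 1) times its weight.
    This is an assumed stand-in for I/O cost, not the paper's model.
    """
    total = 0
    for block, tasks in nets.items():
        stages = {stage_of[t] for t in tasks}
        weight = 1 if size is None else size[block]
        total += weight * (len(stages) - 1)
    return total


# Hypothetical toy instance: four tasks sharing four data blocks.
nets = {
    "A": ["t0", "t1"],
    "B": ["t1", "t2"],
    "C": ["t2", "t3"],
    "D": ["t0", "t3"],
}
size = {"A": 4, "B": 1, "C": 4, "D": 1}

# Two candidate two-stage schedules.
grouped = {"t0": 0, "t1": 0, "t2": 1, "t3": 1}
interleaved = {"t0": 0, "t2": 0, "t1": 1, "t3": 1}

print(connectivity_cost(nets, grouped, size))      # 2
print(connectivity_cost(nets, interleaved, size))  # 10
```

In this toy case, the schedule that keeps the tasks sharing the large blocks A and C in the same stage incurs a cost of 2, against 10 for the interleaved schedule. A partitioner searches for assignments that minimize such a cost, subject to a balance constraint such as each stage's data fitting in memory.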

Published In

SC '06: Proceedings of the 2006 ACM/IEEE conference on Supercomputing
November 2006
746 pages
ISBN:0769527000
DOI:10.1145/1188455

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SC '06
Acceptance Rates

SC '06 Paper Acceptance Rate 54 of 239 submissions, 23%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Cited By

  • (2015) Hypergraph Partitioning for Parallel Sparse Matrix-Matrix Multiplication. Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures, 86-88. DOI: 10.1145/2755573.2755613. Online publication date: 13-Jun-2015
  • (2013) Inspector/executor load balancing algorithms for block-sparse tensor contractions. Proceedings of the 27th international ACM conference on International conference on supercomputing, 483-484. DOI: 10.1145/2464996.2467282. Online publication date: 10-Jun-2013
  • (2011) Modeling Network Transition Constraints with Hypergraphs. Transportation Science 45:1, 81-97. DOI: 10.5555/1953109.1953115. Online publication date: 1-Feb-2011
  • (2011) Fault oblivious eXascale whitepaper. Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers, 17-24. DOI: 10.1145/1988796.1988800. Online publication date: 31-May-2011
  • (2009) Scalable work stealing. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 1-11. DOI: 10.1145/1654059.1654113. Online publication date: 14-Nov-2009
  • (2007) Data exploration of turbulence simulations using a database cluster. Proceedings of the 2007 ACM/IEEE conference on Supercomputing, 1-11. DOI: 10.1145/1362622.1362654. Online publication date: 16-Nov-2007
