Abstract
Software developers for distributed memory multiprocessors often complain about the lack of libraries and tools for developing and performance tuning their applications. While some tools exist for regular array-based computations, support for applications with pointer-based data structures, asynchronous communication patterns, or unpredictable computational costs is seriously lacking. In this paper we describe our experience with six irregular applications from CAD, Robotics, Genetics, Physics, and Computer Science, and offer them as application challenges for other systems that support irregular applications. The applications vary in the amount and kind of irregularity. We characterize their irregularity profiles and the implementation problems that arise from those profiles. Beyond raw performance, one of our goals is to provide implementations that run efficiently across machine platforms with minimal tuning, and this desire for performance portability influences our designs. Each of our applications is organized around one or two distributed data structures, which are part of the Multipol data structure library. We describe these data structures, give an overview of some key features in our underlying runtime support, and present performance results for the applications on three platforms.
This work was supported in part by the Advanced Research Projects Agency of the Department of Defense under contracts DABT63-92-C-0026 and F30602-95-C-0136, by the Department of Energy grant DE-FG03-94ER25206, and by the National Science Foundation grants CCR-9210260, CDA-8722788, and CDA-9401156. The information presented here does not necessarily reflect the position or the policy of the Government and no official endorsement should be inferred.
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yelick, K., Wen, CP., Chakrabarti, S., Deprit, E., Jones, J., Krishnamurthy, A. (1996). Portable parallel irregular applications. In: Ito, T., Halstead, R.H., Queinnec, C. (eds) Parallel Symbolic Languages and Systems. PSLS 1995. Lecture Notes in Computer Science, vol 1068. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0023060
DOI: https://doi.org/10.1007/BFb0023060
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61143-1
Online ISBN: 978-3-540-68332-2
eBook Packages: Springer Book Archive