Abstract
We propose a simulation technique for very large-scale data-parallel programs. In our method, a data-parallel program is divided into computation and communication sections. When the control flow of the parallel program does not depend on the contents of network messages, the computation time on each processor can be calculated independently. An instrumentation tool called EXCIT is used to calculate the execution time on the target architecture and to generate message traces. The communication time is then calculated from the message traces by a network simulator, which is generated by INSPIRE, a network simulator generating system. With our tool set, the behavior of parallel programs on thousands of processors can be estimated within a practical time span. We demonstrate our method by analyzing the class B problems of the LU and MG programs of the NAS Parallel Benchmarks while varying parameters such as cache size and network bandwidth. We found that communication overhead affects the total execution time considerably, while the cache effect is small.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kubota, K., Itakura, K., Sato, M., Boku, T. (1998). Practical simulation of large-scale parallel programs and its performance analysis of the NAS Parallel Benchmarks. In: Pritchard, D., Reeve, J. (eds) Euro-Par’98 Parallel Processing. Euro-Par 1998. Lecture Notes in Computer Science, vol 1470. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0057859
DOI: https://doi.org/10.1007/BFb0057859
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64952-6
Online ISBN: 978-3-540-49920-6
eBook Packages: Springer Book Archive