Abstract
Building on the Bulk Synchronous Parallel (BSP) framework, this paper introduces a Hierarchical Bulk Synchronous Parallel (HBSP) performance model. The model captures the performance optimization problem at the various stages of parallel program development, and it predicts the performance of a parallel program accurately by accounting for the factors that cause variance in local computation and in global communication. The methodology has been applied to several real applications, and the results show that HBSP is a suitable model for optimizing parallel programs.
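Since HBSP builds on the BSP framework, it may help to recall the classic BSP superstep cost, which HBSP refines. The sketch below implements only that standard cost, max_i w_i + g * max_i h_i + l, and not the HBSP extension itself, whose formulas are not reproduced here. The function names and the values of g (cost per communicated word) and l (barrier latency) are hypothetical, chosen for illustration. Because each term takes the maximum over processors, variance in either local work or communication volume raises the predicted cost even when the totals are unchanged.

```python
"""Minimal sketch of the classic BSP superstep cost (not the paper's HBSP model)."""


def superstep_cost(work: list[float], h: list[int], g: float, l: float) -> float:
    """Cost of one BSP superstep: max_i w_i + g * max_i h_i + l.

    work[i] -- local computation time of processor i in this superstep
    h[i]    -- max(words sent, words received) by processor i
    g       -- communication cost per word (hypothetical machine parameter)
    l       -- barrier synchronization latency (hypothetical machine parameter)
    """
    if len(work) != len(h) or not work:
        raise ValueError("work and h must be non-empty and of equal length")
    # All processors wait at the barrier for the slowest one, so the
    # maximum (not the mean) of each term determines the superstep cost.
    return max(work) + g * max(h) + l


def program_cost(supersteps: list[tuple[list[float], list[int]]],
                 g: float, l: float) -> float:
    """Total predicted cost: the sum of the costs of all supersteps."""
    return sum(superstep_cost(w, h, g, l) for w, h in supersteps)


if __name__ == "__main__":
    g, l = 4.0, 50.0  # hypothetical machine parameters
    balanced = ([100.0, 100.0, 100.0, 100.0], [10, 10, 10, 10])
    skewed = ([40.0, 160.0, 100.0, 100.0], [10, 10, 10, 10])  # same total work
    print(superstep_cost(*balanced, g, l))  # 190.0
    print(superstep_cost(*skewed, g, l))    # 250.0
```

With the same total work of 400 units, shifting load onto one processor raises the predicted superstep cost from 190 to 250, which is the kind of variance effect the model is designed to expose.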
Author information
This work was partially supported by the National Natural Science Foundation of China.
HUANG Linpeng was born in 1964. He is currently an Associate Professor at Shanghai Jiao Tong University. He received his Ph.D. degree from Shanghai Jiao Tong University in 1992. His research interests include artificial intelligence, parallel computation models and the theory of programming languages.
SUN Yongqiang is a Professor at Shanghai Jiao Tong University. His research interests include functional programming languages, rewriting systems and parallel programming models.
YUAN Wei was born in 1971. He received his Ph.D. degree from Shanghai Jiao Tong University in 1995. His research areas include parallel computation models, parallel programming environments and program performance optimization.
About this article
Cite this article
Huang, L., Sun, Y. & Yuan, W. Hierarchical bulk synchronous parallel model and performance optimization. J. of Comput. Sci. & Technol. 14, 224–233 (1999). https://doi.org/10.1007/BF02948510
DOI: https://doi.org/10.1007/BF02948510