Performance portability on EARTH: a case study across several parallel architectures

Zhu, Weirong; Niu, Yanwei; Gao, Guang R.

doi:10.1007/s10586-007-0011-1

Published: 15 March 2007

Volume 10, pages 115–126 (2007)
Cluster Computing

Weirong Zhu¹,
Yanwei Niu¹ &
Guang R. Gao¹

Abstract

As parallel architectures grow more diverse and the development time for parallel applications increases, performance portability has become a major consideration in designing the next generation of parallel program execution models, APIs, and runtime system software. This paper analyzes both the code portability and the performance portability of parallel programs for fine-grained multi-threaded execution and architecture models. We concentrate on one particular event-driven, fine-grained multi-threaded execution model, EARTH, and discuss several design considerations of the EARTH model and runtime system that contribute to the performance portability of parallel applications. We believe that these are important issues for the design of future high-end computing system software. Four representative benchmarks were run on several different parallel architectures, including two clusters listed in the 23rd TOP500 supercomputer list. The results demonstrate that EARTH-based programs can achieve robust performance portability across the selected hardware platforms without any code modification or tuning.

Author information

Authors and Affiliations

Department of Electrical & Computer Engineering, University of Delaware, Newark, Delaware, 19716, USA
Weirong Zhu, Yanwei Niu & Guang R. Gao

Corresponding author

Correspondence to Weirong Zhu.

About this article

Cite this article

Zhu, W., Niu, Y. & Gao, G. Performance portability on EARTH: a case study across several parallel architectures. Cluster Comput 10, 115–126 (2007). https://doi.org/10.1007/s10586-007-0011-1

Published: 15 March 2007
Issue Date: June 2007
DOI: https://doi.org/10.1007/s10586-007-0011-1
