
Performance portability on EARTH: a case study across several parallel architectures

Cluster Computing

Abstract

As parallel architectures grow more diverse and the development time of parallel applications increases, performance portability has become a major consideration in the design of the next generation of parallel program execution models, APIs, and runtime system software. This paper analyzes both the code portability and the performance portability of parallel programs for fine-grained multi-threaded execution and architecture models. We concentrate on one particular event-driven, fine-grained multi-threaded execution model, EARTH, and discuss several design considerations of the EARTH model and runtime system that contribute to the performance portability of parallel applications. We believe that these are important issues for the design of future high-end computing system software. Four representative benchmarks were run on several different parallel architectures, including two clusters listed in the 23rd TOP500 supercomputer list. The results demonstrate that EARTH-based programs can achieve robust performance portability across the selected hardware platforms without any code modification or tuning.




Corresponding author

Correspondence to Weirong Zhu.


About this article

Cite this article

Zhu, W., Niu, Y. & Gao, G. Performance portability on EARTH: a case study across several parallel architectures. Cluster Comput 10, 115–126 (2007). https://doi.org/10.1007/s10586-007-0011-1


  • DOI: https://doi.org/10.1007/s10586-007-0011-1
