
Performance-based parallel application toolkit for high-performance clusters


Abstract

Advances in computer technology, together with the rapid emergence of multicore processors, have made many-core personal computers widely available and affordable. Networks of workstations and clusters of many-core SMPs have become attractive platforms for high-performance computing, delivering computational power equal or superior to that of supercomputers or mainframes at an affordable cost using commodity components. Finding ways to extract unused and idle computing power from these resources to improve overall performance, and to fully utilize the underlying new hardware platforms, are major topics in this field of research. This paper introduces the design rationale and implementation of an effective toolkit for performance measurement and analysis of parallel applications in cluster environments; it not only generates a timing-graph representation of a parallel application, but also provides performance data charts for its execution. The goal in developing this toolkit is to give application developers a better understanding of an application's behavior on the computing nodes selected for a particular execution. Additionally, results from multiple executions of an application under development can be combined and overlapped, permitting developers to perform "what-if" analysis, i.e., to understand more deeply the utilization of allocated computational resources. Experiments with this toolkit have shown its effectiveness in the development and performance tuning of parallel applications, and it has further been used in teaching message-passing and shared-memory parallel programming courses.
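The "what-if" analysis described above can be illustrated with a minimal sketch: timing data from several runs of the same application are overlaid per node so a developer can compare how well each run utilized its allocated resources. The record format, field names, and the `utilization_by_node` helper below are assumptions for illustration only, not the toolkit's actual API.

```python
# Hypothetical sketch: overlaying per-node timing data from multiple runs
# of a parallel application to compare resource utilization side by side.
from collections import defaultdict

# Each record: (run_id, node, busy_seconds, wall_seconds) -- assumed format.
records = [
    ("run1", "node01", 42.0, 50.0),
    ("run1", "node02", 30.0, 50.0),
    ("run2", "node01", 45.0, 48.0),
    ("run2", "node02", 40.0, 48.0),
]

def utilization_by_node(records):
    """Return {node: {run_id: busy/wall}} so runs can be compared per node."""
    table = defaultdict(dict)
    for run_id, node, busy, wall in records:
        table[node][run_id] = busy / wall
    return dict(table)

# Print an overlaid comparison: one line per node, one column per run.
overlay = utilization_by_node(records)
for node in sorted(overlay):
    runs = ", ".join(f"{r}: {u:.0%}" for r, u in sorted(overlay[node].items()))
    print(f"{node} -> {runs}")
```

A real measurement toolkit would collect such records from instrumented MPI or OpenMP executions; the point of the sketch is only the overlap step, where results of distinct runs are merged keyed by node.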




Corresponding author

Correspondence to Kuan-Ching Li.


Cite this article

Li, KC., Weng, TH. Performance-based parallel application toolkit for high-performance clusters. J Supercomput 48, 43–65 (2009). https://doi.org/10.1007/s11227-008-0204-2
