Performance-based parallel application toolkit for high-performance clusters

Li, Kuan-Ching; Weng, Tien-Hsiung

doi:10.1007/s11227-008-0204-2

Published: 09 April 2008

Volume 48, pages 43–65, (2009)

The Journal of Supercomputing

Kuan-Ching Li¹ &
Tien-Hsiung Weng¹


Abstract

Advances in computer technology, together with the rapid emergence of multicore processors, have made many-core personal computers widely available and more affordable. Networks of workstations and clusters of many-core SMPs have become an attractive solution for high-performance computing, providing computational power equal or superior to that of supercomputers or mainframes at an affordable cost using commodity components. Finding alternative ways to extract unused and idle computing power from these resources in order to improve overall performance, and to fully utilize the underlying new hardware platforms, are major topics in this field of research. This paper introduces the design rationale and implementation of a toolkit for performance measurement and analysis of parallel applications in cluster environments. The toolkit not only generates a timing graph representation of a parallel application, but also provides performance data charts of the application’s execution. The goal in developing this toolkit is to give application developers a better understanding of the application’s behavior across the computing nodes selected for a particular execution. Additionally, the results of multiple executions of an application under development can be combined and overlaid, permitting developers to perform “what-if” analysis, i.e., to understand more deeply how the allocated computational resources are utilized. Experiments with this toolkit have shown its effectiveness in the development and performance tuning of parallel applications, and its use extends to teaching message-passing and shared-memory parallel programming courses.

Author information

Authors and Affiliations

Dept. of Computer Science and Information Engineering, Providence University, Taichung, Taiwan
Kuan-Ching Li & Tien-Hsiung Weng

Corresponding author

Correspondence to Kuan-Ching Li.

About this article

Cite this article

Li, KC., Weng, TH. Performance-based parallel application toolkit for high-performance clusters. J Supercomput 48, 43–65 (2009). https://doi.org/10.1007/s11227-008-0204-2

Received: 23 March 2008
Accepted: 25 March 2008
Published: 09 April 2008
Issue Date: April 2009
DOI: https://doi.org/10.1007/s11227-008-0204-2
