Abstract
The Linux cluster considered in this paper is formed from Shuttle box XPC nodes with 2 GHz Athlon processors, connected by dual Gb Ethernet switches. Such a cluster is relatively easy to construct and is effective as a throughput engine, but it can give disappointing results when running explicitly parallel software if poorly performing communication mechanisms and process-spawning methods are selected. This paper compares the implementations of communication and spawning primitives in MPICH-2, openMosix, Linux Remote Procedure Call, forking, and various lower-level communication mechanisms. The test selection thus compares a message-passing library and a single system image software package with direct use of lower-level primitives. The information will be of interest to those considering one of the well-known packages, writing their own distributed applications directly, or constructing a distributed language layered on top of an existing set of parallel primitives. The results expose a ranking in terms of process-spawning performance and a similar ranking of communication software performance. They also reveal poor performance in certain circumstances, well below the hardware specification, of which the developer should be aware. In general, the paper emphasizes the importance of efficient transport software to cluster machines.
Author information
David J. Johnston has worked as a software engineer in research and development for 20 years, at ICL Ltd. and the Rutherford-Appleton Laboratory, UK. His strengths lie in generating and realizing algorithms for complex systems. His interests include languages and methodologies to shorten the software development process. He has recently completed a Ph.D. at the University of Essex, UK, in position identification for augmented reality. He has co-authored a book on computer graphics.
Martin Fleury is a Senior Lecturer at the University of Essex, UK, where he was also awarded a Ph.D. in Parallel Image Processing. His first degree was from Oxford University, and he holds an MSc in Astrophysics from the University of London. He is the principal author of a book on parallel computing for embedded systems. He has authored thirty-five journal papers in the last ten years on parallel image and vision processing, performance prediction, real-time systems, reconfigurable computing, software engineering, and video and document compression.
Michael Lincoln has completed an M.Sc. and Ph.D. at the University of Essex, UK in the field of face recognition and face tracking. His work as a Senior Research Officer is concerned with radar control of aircraft landings. The cluster mentioned in the paper was constructed, configured, and commissioned by Michael.
Andrew C. Downton was educated at Southampton University, UK, where he obtained a first class honours degree in Electronic Engineering in 1974, and a Ph.D. in 1982, and where he was also a lecturer. In 1995 he was promoted to a personal Chair at the University of Essex, UK, and in 1999 he became Head of the Department of Electronic Systems Engineering at Essex. His research interests include pattern recognition and image analysis; parallel computer architectures; hardware-software co-design; handwriting recognition; and document analysis. He is a Chartered Engineer and Fellow of the Institution of Electrical Engineers (IEE) and a Senior Member of the IEEE.
Cite this article
Johnston, D.J., Fleury, M., Lincoln, M. et al. Performance of parallel communication and spawning primitives on a Linux cluster. Cluster Comput 9, 375–384 (2006). https://doi.org/10.1007/s10586-006-0007-2
DOI: https://doi.org/10.1007/s10586-006-0007-2