Abstract
MPI has been extremely successful. In areas such as particle physics, most of the available parallel programs are based on MPI. Unfortunately, these programs must run on dedicated clusters or parallel machines, so long-running applications cannot exploit the growing pool of idle time on general-purpose desktop computers. Additionally, MPI offers a rather low-level interface, which is hard to use for most scientist programmers. In the research described in this paper, we tried to see how far we could go in solving these two problems while keeping the portability of MPI programs, at the cost of one restriction: only programs following the FARM paradigm are supported. The resulting library, MpiFL, gave us significant insight. It is now being used successfully at the physics department of the University of Coimbra, despite some shortcomings.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fonseca, N., Silva, J.G. (2003). MPI Farm Programs on Non-dedicated Clusters. In: Dongarra, J., Laforenza, D., Orlando, S. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2003. Lecture Notes in Computer Science, vol 2840. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39924-7_63
DOI: https://doi.org/10.1007/978-3-540-39924-7_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20149-6
Online ISBN: 978-3-540-39924-7
eBook Packages: Springer Book Archive