Abstract
AgentTeamwork is a grid-computing middleware system that dispatches a collection of mobile agents to coordinate a user job over remote computing nodes in a decentralized manner. Its primary focus is to keep the distributed computing resources allocated to a parallel-computing job highly available and dynamically balanced. To this end, a mobile agent is assigned to each process of the job. The agent monitors that process's execution on a remote machine and periodically takes an execution snapshot. It migrates the process to a more lightly loaded machine when needed, and resumes it from the latest snapshot after an accidental crash. The mobile agents also collaborate to restore broken inter-process communication within the job, using the system's error-recoverable socket and mpiJava libraries.
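The periodic-snapshot and resume-from-latest-snapshot cycle described above can be illustrated with a minimal Java sketch. This is not AgentTeamwork's implementation or API, and several parts of it are assumptions:

- The class `SnapshotSketch`, the state type `ProcessState`, and the methods `takeSnapshot`, `restore` and `run` are all hypothetical.
- Standard Java object serialization is assumed as the snapshot mechanism.
- The latest snapshot is held in memory, whereas the real system hands snapshots to bookkeeper agents on other machines.
- Process migration and socket recovery are not modeled.

```java
import java.io.*;

/**
 * Minimal checkpoint/restart sketch (hypothetical names, not AgentTeamwork's API).
 * A "sentinel" loop periodically serializes the user-process state; after a
 * simulated crash it resumes from the most recent snapshot.
 */
public class SnapshotSketch {

    /** Hypothetical user-process state: an iteration counter and a running sum. */
    static class ProcessState implements Serializable {
        private static final long serialVersionUID = 1L;
        int iteration;
        long sum;
    }

    /** Serializes the state into a byte array (stand-in for a bookkeeper-held snapshot). */
    static byte[] takeSnapshot(ProcessState s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Deserializes a snapshot back into a live state object. */
    static ProcessState restore(byte[] snapshot) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return (ProcessState) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Computes 0 + 1 + ... + (total - 1), snapshotting every `interval` iterations.
     * If crashAt >= 0, a crash is simulated once at that iteration and the
     * computation resumes from the latest snapshot.
     */
    static long run(int total, int interval, int crashAt) {
        ProcessState state = new ProcessState();
        byte[] latest = takeSnapshot(state);        // initial snapshot before any work
        boolean crashed = false;
        while (state.iteration < total) {
            if (!crashed && state.iteration == crashAt) {
                crashed = true;                     // simulated accidental crash
                state = restore(latest);            // resume from the latest snapshot
                continue;
            }
            state.sum += state.iteration;
            state.iteration++;
            if (state.iteration % interval == 0) {
                latest = takeSnapshot(state);       // periodic execution snapshot
            }
        }
        return state.sum;
    }

    public static void main(String[] args) {
        System.out.println(run(100, 10, -1));  // no crash
        System.out.println(run(100, 10, 57));  // crash at iteration 57
    }
}
```

Running `main` prints 4950 twice. The simulated crash at iteration 57 rolls the state back to the snapshot taken at iteration 50. Iterations 50 through 56 are therefore re-executed, but the final result is unchanged. A shorter snapshot interval reduces this re-executed work at the cost of more frequent serialization.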
We have implemented the first version of our middleware. It includes a mobile agent execution platform, error-recoverable socket and mpiJava API libraries, and a job wrapper program. It also includes several types of mobile agents: commander, resource, sentinel, and bookkeeper agents, which respectively orchestrate, allocate resources to, monitor, and maintain snapshots of a user process. This paper presents AgentTeamwork's execution model, its implementation techniques, and our performance evaluation using the Java Grande benchmark test programs.
Author information
Munehiro Fukuda received a B.S. from the College of Information Sciences and an M.S. from the Master's Program in Science and Engineering at the University of Tsukuba in 1986 and 1988, respectively. He received his M.S. and Ph.D. in Information and Computer Science from the University of California at Irvine in 1995 and 1997, respectively. He worked at IBM Tokyo Research Laboratory from 1988 to 1993 and taught at the University of Tsukuba from 1998 to 2001. Since 2001, he has been an assistant professor in Computing & Software Systems at the University of Washington, Bothell. His research interests include mobile agents, multi-threading, cluster computing, grid computing, and distributed simulations.
Koichi Kashiwagi received a Bachelor of Science degree from the Faculty of Science, Ehime University, in 2000 and a Master of Engineering degree from the Department of Computer Science, Ehime University, in 2002. In 2004 he became a research assistant in the Department of Computer Science, Ehime University. His research interests include distributed computing, job scheduling, and grid computing.
Shin-ya Kobayashi received the B.E., M.E., and Dr.E. degrees in Communication Engineering from Osaka University in 1985, 1988, and 1991, respectively. From 1991 to 1999, he was on the Faculty of Engineering at Kanazawa University, Japan. From 1999 to 2004, he was an Associate Professor in the Department of Computer Science, Ehime University. He is a Professor at the Graduate School of Science and Engineering, Ehime University. His research interests include distributed processing and parallel processing. He is a member of the Information Processing Society of Japan, the Institute of Electrical Engineers of Japan, IEEE, and ACM.
Cite this article
Fukuda, M., Kashiwagi, K. & Kobayashi, S. AgentTeamwork: Coordinating grid-computing jobs with mobile agents. Appl Intell 25, 181–198 (2006). https://doi.org/10.1007/s10489-006-9653-6