AgentTeamwork: Coordinating grid-computing jobs with mobile agents

Applied Intelligence

Abstract

AgentTeamwork is a grid-computing middleware system that dispatches a collection of mobile agents to coordinate a user job over remote computing nodes in a decentralized manner. Its primary focus is to maintain high availability and dynamic balancing of distributed computing resources for a parallel-computing job. For this purpose, a mobile agent is assigned to each process engaged in the same job; each agent monitors its process's execution on a different machine, takes periodic execution snapshots, moves the process to a more lightly loaded machine, and resumes it from the latest snapshot after an accidental crash. The system also restores broken inter-process communication within the same job, using its error-recoverable socket and mpiJava libraries in collaboration among the mobile agents.
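The monitor, snapshot, move, and resume cycle described above can be illustrated with a minimal sketch. It is not AgentTeamwork's implementation (the system itself is built around Java and mpiJava); it is a single-process Python simulation in which `UserProcess`, `NodeCrash`, `pick_lightest`, and `run_with_snapshots` are hypothetical names, the load values are made up, and a JSON string stands in for an execution snapshot.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class UserProcess:
    """Hypothetical stand-in for one process of a job: sums 0..limit-1."""
    limit: int
    next_i: int = 0
    total: int = 0

    def step(self) -> bool:
        """Do one unit of work; return False once the work is finished."""
        if self.next_i >= self.limit:
            return False
        self.total += self.next_i
        self.next_i += 1
        return True


class NodeCrash(Exception):
    """Simulates an accidental crash of the hosting machine."""


def pick_lightest(loads):
    """Return the node name with the lowest load (assumed load metric)."""
    return min(loads, key=loads.get)


def run_with_snapshots(proc, loads, snapshot_every=10, crash_at=None):
    """Run proc, snapshot it periodically, and resume elsewhere after a crash."""
    loads = dict(loads)                     # do not mutate the caller's table
    node = pick_lightest(loads)
    snapshot = json.dumps(asdict(proc))     # initial snapshot
    history = [node]                        # nodes that hosted the process
    steps = 0                               # executed steps, including redone work
    while True:
        try:
            if crash_at is not None and steps == crash_at:
                crash_at = None             # inject the crash only once
                raise NodeCrash(node)
            if not proc.step():
                return proc.total, history
            steps += 1
            if steps % snapshot_every == 0:
                snapshot = json.dumps(asdict(proc))
        except NodeCrash:
            loads.pop(node)                 # the crashed node is unavailable
            node = pick_lightest(loads)     # choose the lightest remaining node
            proc = UserProcess(**json.loads(snapshot))  # resume from snapshot
            history.append(node)
```

In this sketch, work done since the last snapshot is simply repeated after the crash, so a shorter `snapshot_every` trades more snapshot overhead for less redone work.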

We have implemented the first version of our middleware, including a mobile agent execution platform, error-recoverable socket and mpiJava API libraries, a job wrapper program, and several types of mobile agents: commander, resource, sentinel, and bookkeeper agents, which respectively orchestrate, allocate resources to, monitor, and maintain snapshots of a user process. This paper presents AgentTeamwork’s execution model, its implementation techniques, and our performance evaluation using the Java Grande benchmark test programs.
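The division of labor among the four agent types can be pictured roughly as follows. This is only an in-process Python illustration of the roles named above, not the paper's agent interfaces: every class, method, and parameter here is a hypothetical name, the agents do not actually migrate, and a "user process" is reduced to summing a list of numbers.

```python
class ResourceAgent:
    """Allocates computing nodes, preferring the least-loaded ones."""

    def __init__(self, loads):
        self.loads = dict(loads)            # node name -> assumed load value

    def allocate(self, count):
        if count > len(self.loads):
            raise ValueError("not enough nodes for the requested processes")
        return sorted(self.loads, key=self.loads.get)[:count]


class BookkeeperAgent:
    """Maintains the latest snapshot of each user process, keyed by rank."""

    def __init__(self):
        self.snapshots = {}

    def store(self, rank, state):
        self.snapshots[rank] = dict(state)  # keep a copy, not a live reference

    def latest(self, rank):
        return dict(self.snapshots[rank])


class SentinelAgent:
    """Monitors one user process and reports its snapshots."""

    def __init__(self, rank, node, bookkeeper):
        self.rank = rank
        self.node = node
        self.bookkeeper = bookkeeper

    def run(self, work_items):
        state = {"done": 0, "sum": 0}
        for item in work_items:
            state["sum"] += item
            state["done"] += 1
            self.bookkeeper.store(self.rank, state)  # snapshot after each item
        return state["sum"]


class CommanderAgent:
    """Orchestrates a job: obtains nodes, then starts one sentinel per process."""

    def __init__(self, resource, bookkeeper):
        self.resource = resource
        self.bookkeeper = bookkeeper

    def launch(self, chunks):
        nodes = self.resource.allocate(len(chunks))
        sentinels = [SentinelAgent(rank, node, self.bookkeeper)
                     for rank, node in enumerate(nodes)]
        return {s.node: s.run(chunk) for s, chunk in zip(sentinels, chunks)}
```

For example, `CommanderAgent(ResourceAgent({"n1": 0.7, "n2": 0.2, "n3": 0.4}), BookkeeperAgent()).launch([[1, 2, 3], [4, 5]])` places the two processes on the two least-loaded nodes, while the bookkeeper retains the most recent state of each rank.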



Author information

Corresponding author

Correspondence to Munehiro Fukuda.

Additional information

Munehiro Fukuda received a B.S. from the College of Information Sciences and an M.S. from the Master’s Program in Science and Engineering at the University of Tsukuba in 1986 and 1988, respectively. He received his M.S. and Ph.D. in Information and Computer Science from the University of California at Irvine in 1995 and 1997, respectively. He worked at IBM Tokyo Research Laboratory from 1988 to 1993 and taught at the University of Tsukuba from 1998 to 2001. Since 2001, he has been an assistant professor in Computing & Software Systems at the University of Washington, Bothell. His research interests include mobile agents, multi-threading, cluster computing, grid computing, and distributed simulations.

Koichi Kashiwagi received a Bachelor of Science degree from the Faculty of Science, Ehime University, in 2000 and a Master of Engineering degree from the Department of Computer Science, Ehime University, in 2002. In 2004 he became a research assistant in the Department of Computer Science, Ehime University. His research interests include distributed computing, job scheduling, and grid computing.

Shin-ya Kobayashi received the B.E., M.E., and Dr.E. degrees in Communication Engineering from Osaka University in 1985, 1988, and 1991, respectively. From 1991 to 1999, he was on the Faculty of Engineering at Kanazawa University, Japan. From 1999 to 2004, he was an Associate Professor in the Department of Computer Science, Ehime University. He is a Professor at the Graduate School of Science and Engineering, Ehime University. His research interests include distributed processing and parallel processing. He is a member of the Information Processing Society of Japan, the Institute of Electrical Engineers of Japan, IEEE, and ACM.


About this article

Cite this article

Fukuda, M., Kashiwagi, K. & Kobayashi, S. AgentTeamwork: Coordinating grid-computing jobs with mobile agents. Appl Intell 25, 181–198 (2006). https://doi.org/10.1007/s10489-006-9653-6


  • DOI: https://doi.org/10.1007/s10489-006-9653-6
