DOI: 10.1145/1551609.1551639
research-article

Maestro: a self-organizing peer-to-peer dataflow framework using reinforcement learning

Published: 11 June 2009

Abstract

In this paper we describe Maestro, a dataflow computation framework for Ibis, our Java-based grid middleware. The novelty of Maestro is that it is a self-organizing peer-to-peer system, meaning that it distributes the tasks in a flow over the available nodes based on local decisions on each node, without any central coordination. As a result, the computations are more scalable, more resilient against failing nodes, and less sensitive to communication latencies.
Maestro uses a task distribution approach based on reinforcement learning, a learning mechanism in which a positive outcome of a choice makes it more likely that the same choice is repeated in the future. Maestro selects the most efficient node for each stage in the computation based on the observed computation and communication times. To ensure agility, the selection decisions are made as late as possible without letting the nodes fall idle. Using this task distribution algorithm, the nodes can be used efficiently, even in a heterogeneous system with failure-prone nodes communicating through high-latency connections.
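
The selection idea described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not Maestro's Java implementation or API: the class `NodeSelector`, the `learning_rate` value, the worker names, and the cost figures are all assumptions made for illustration. Each node keeps a smoothed estimate of the observed computation plus communication time per stage and worker, and picks the worker with the lowest estimate.

```python
from dataclasses import dataclass, field


@dataclass
class NodeSelector:
    """Hypothetical per-node selector; an illustration, not Maestro's API."""
    learning_rate: float = 0.3  # assumed smoothing factor
    estimates: dict = field(default_factory=dict)  # (stage, worker) -> seconds

    def choose(self, stage, workers):
        # Unseen workers get an optimistic estimate of 0.0, so each one
        # is tried at least once before the estimates take over.
        return min(workers, key=lambda w: self.estimates.get((stage, w), 0.0))

    def observe(self, stage, worker, compute_s, transfer_s):
        # Reinforcement step: blend the observed total time into the
        # running estimate, so good outcomes make the choice more likely.
        observed = compute_s + transfer_s
        key = (stage, worker)
        old = self.estimates.get(key, observed)
        self.estimates[key] = old + self.learning_rate * (observed - old)


def simulate(rounds=20):
    # Assumed (computation, communication) times in seconds per worker.
    true_cost = {"fast-remote": (1.0, 0.5), "slow-local": (4.0, 0.0)}
    selector = NodeSelector()
    picks = []
    for _ in range(rounds):
        worker = selector.choose("scale", list(true_cost))
        selector.observe("scale", worker, *true_cost[worker])
        picks.append(worker)
    return picks, selector.estimates


if __name__ == "__main__":
    picks, estimates = simulate()
    print(picks.count("fast-remote"), "of", len(picks), "tasks went to fast-remote")
    print(estimates)
```

In this sketch the decision uses only locally observed times, which matches the abstract's description of local decisions without central coordination. A purely greedy rule like this one never revisits a worker once it looks slow, so a system facing failing or recovering nodes would presumably need some form of re-exploration; how Maestro handles that is not stated in the abstract.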

Cited By

  • (2011) Towards jungle computing with Ibis/Constellation. Proceedings of the 2011 workshop on Dynamic distributed data-intensive applications, programming abstractions, and systems, pages 7-18. DOI: 10.1145/1996010.1996013. Online publication date: 8-Jun-2011
  • (2010) Chapter 12: Panta Rhei: Flexible Execution Engine for Search Computing Queries. Search Computing, pages 225-243. DOI: 10.1007/978-3-642-12310-8_12 (also listed as 10.5555/2172319.2172334). Online publication date: 1-Jan-2010
  • (2010) Real-World Distributed Computing with Ibis. Computer, 43:8, pages 54-62. DOI: 10.1109/MC.2010.184. Online publication date: 1-Aug-2010

Published In

HPDC '09: Proceedings of the 18th ACM international symposium on High performance distributed computing
June 2009
237 pages
ISBN:9781605585871
DOI:10.1145/1551609
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. peer to peer
  2. reinforcement learning
  3. self organizing

Qualifiers

  • Research-article

Conference

HPDC '09

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%
