Abstract
Multiagent systems have had a powerful impact on the real world. Many of the systems they encompass (air traffic, satellite coordination, rover exploration) are inherently multi-objective, yet they are often treated as single-objective problems in the research literature. A key concept within multiagent systems is credit assignment: quantifying an individual agent's impact on overall system performance. In this work, we extend the concept of credit assignment to multi-objective problems. We apply credit assignment through difference evaluations to two different policy selection paradigms to demonstrate their broad applicability. We first examine reinforcement learning, in which using difference evaluations improves performance by (i) increasing learning speed by up to 10\(\times\), (ii) producing solutions that dominate all solutions discovered by a traditional team-based credit assignment scheme, and (iii) losing only 0.61% of dominated hypervolume in a scenario where 20% of agents act in their own interests instead of the system's interests (compared to a 43% loss when using a traditional global reward in the same scenario). We then derive multiple methods for incorporating difference evaluations into a state-of-the-art multi-objective evolutionary algorithm, NSGA-II. The median performance of NSGA-II with credit assignment dominates the best-case performance of NSGA-II without credit assignment in a multiagent multi-objective problem. Our results strongly suggest that in a multiagent multi-objective problem, proper credit assignment is at least as important to performance as the choice of multi-objective algorithm.
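The difference evaluation underlying this work rewards agent \(i\) with \(D_i = G(z) - G(z_{-i})\): the global evaluation minus the global evaluation with agent \(i\)'s contribution counterfactually removed. A minimal sketch of the idea, assuming a hypothetical global function in which duplicated observations add no value:

```python
# Sketch of a difference evaluation D_i = G(z) - G(z_{-i}).
# The global function G below is a hypothetical stand-in, not the
# domain used in the paper: team value counts each observed point
# of interest at most once, so redundant observations add nothing.

def global_eval(observations):
    """Hypothetical global evaluation G: sum of distinct observed values."""
    return sum(set(observations))

def difference_eval(observations, i):
    """Agent i's difference evaluation: G(z) minus G with agent i's
    observation counterfactually removed."""
    without_i = observations[:i] + observations[i + 1:]
    return global_eval(observations) - global_eval(without_i)

obs = [5, 5, 3]                      # agents 0 and 1 observe the same value
print(difference_eval(obs, 0))       # agent 0 is redundant -> 0
print(difference_eval(obs, 2))       # agent 2 contributes uniquely -> 3
```

Note how the difference evaluation gives a redundant agent zero credit even though the global score it would receive under a team-based scheme is high; this is the signal that isolates each agent's individual impact.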
Notes
These values were chosen through a parameter sweep to produce the best performance for each reward, though the results are not very sensitive to the \(\epsilon\) or \(\alpha\) values.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest associated with this manuscript.
Additional information
Communicated by B. Xue and A. G. Chen.
This work was partially supported by the National Energy Technology Laboratory under Grant DE-FE0012302.
About this article
Cite this article
Yliniemi, L., Tumer, K. Multi-objective multiagent credit assignment in reinforcement learning and NSGA-II. Soft Comput 20, 3869–3887 (2016). https://doi.org/10.1007/s00500-016-2124-z