
Multi-objective multiagent credit assignment in reinforcement learning and NSGA-II

Soft Computing

Abstract

Multiagent systems have had a powerful impact on the real world. Many of the systems studied in this field (air traffic, satellite coordination, rover exploration) are inherently multi-objective, yet they are often treated as single-objective problems in the research literature. A key concept in multiagent systems is credit assignment: quantifying an individual agent’s impact on overall system performance. In this work, we extend the concept of credit assignment to multi-objective problems. We apply credit assignment through difference evaluations to two different policy selection paradigms to demonstrate their broad applicability. We first examine reinforcement learning, in which using difference evaluations improves performance by (i) increasing learning speed by up to 10\(\times \), (ii) producing solutions that dominate all solutions discovered by a traditional team-based credit assignment scheme, and (iii) losing only 0.61 % of dominated hypervolume in a scenario where 20 % of agents act in their own interests instead of the system’s interests (compared to a 43 % loss when using a traditional global reward in the same scenario). We then derive multiple methods for incorporating difference evaluations into a state-of-the-art multi-objective evolutionary algorithm, NSGA-II. The median performance of NSGA-II with credit assignment dominates the best-case performance of NSGA-II without credit assignment in a multiagent multi-objective problem. Our results strongly suggest that in a multiagent multi-objective problem, proper credit assignment is at least as important to performance as the choice of multi-objective algorithm.
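The difference evaluation referenced in the abstract is commonly defined as \(D_i(z) = G(z) - G(z_{-i})\): the global evaluation with agent i present, minus the global evaluation with agent i’s contribution removed (or replaced by a counterfactual null action). A minimal sketch of this idea, using a toy coverage-style global objective that is purely illustrative (not the paper’s actual domain):

```python
# Toy sketch of a difference evaluation D_i = G(z) - G(z_{-i}).
# global_eval is a hypothetical stand-in for the system objective G;
# removing agent i means replacing its action with a null action.

def global_eval(actions):
    """Toy global objective G: number of distinct targets covered."""
    return len(set(a for a in actions if a is not None))

def difference_eval(actions, i, null_action=None):
    """Credit for agent i: G with agent i, minus G with i nulled out."""
    counterfactual = list(actions)
    counterfactual[i] = null_action
    return global_eval(actions) - global_eval(counterfactual)

actions = ["target_A", "target_A", "target_B"]
# Agent 0 duplicates agent 1's coverage, so its removal changes nothing:
print(difference_eval(actions, 0))  # 0
# Agent 2 uniquely covers target_B, so it earns credit:
print(difference_eval(actions, 2))  # 1
```

The intuition driving the abstract’s robustness result: an agent that contributes nothing unique receives zero credit under D, whereas a global reward G gives every agent the same signal regardless of its individual contribution.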


[Figures 1–18: captions not preserved in this extract; see the full text.]


Notes

  1. These values were chosen via a parameter sweep to yield the best performance for each reward, though the results are not very sensitive to the \(\epsilon \) or \(\alpha \) values.
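The \(\epsilon \) and \(\alpha \) in the note above are the standard exploration rate and learning rate of \(\epsilon \)-greedy Q-learning. A generic sketch of where these two parameters enter (the numeric values below are illustrative defaults, not the swept values from the paper):

```python
import random

# Generic epsilon-greedy Q-learning; epsilon (exploration rate) and
# alpha (learning rate) are the parameters the note refers to.
# These constants are illustrative, not the paper's tuned values.
EPSILON, ALPHA, GAMMA = 0.1, 0.1, 0.9

def select_action(q, state, actions, epsilon=EPSILON):
    """With probability epsilon explore; otherwise pick the greedy action."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

def q_update(q, state, action, reward, next_state, actions,
             alpha=ALPHA, gamma=GAMMA):
    """Standard one-step Q-learning update with learning rate alpha."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

q = {}
q_update(q, "s0", "a0", reward=1.0, next_state="s1", actions=["a0", "a1"])
print(q[("s0", "a0")])  # 0.1
```

The reward argument is where the credit assignment choice enters: passing an agent’s difference evaluation rather than the global reward changes only the signal, not the learning rule.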


Author information


Corresponding author

Correspondence to Logan Yliniemi.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest associated with this manuscript.

Additional information

Communicated by B. Xue and A. G. Chen.

This work was partially supported by the National Energy Technology Laboratory under Grant DE-FE0012302.


About this article


Cite this article

Yliniemi, L., Tumer, K. Multi-objective multiagent credit assignment in reinforcement learning and NSGA-II. Soft Comput 20, 3869–3887 (2016). https://doi.org/10.1007/s00500-016-2124-z
