Abstract
Multiagent systems have had a powerful impact on the real world. Many of the systems they encompass (air traffic, satellite coordination, rover exploration) are inherently multi-objective, yet they are often treated as single-objective problems in the research literature. A key concept within multiagent systems is credit assignment: quantifying an individual agent's impact on overall system performance. In this work, we extend the concept of credit assignment to multi-objective problems. We apply credit assignment through difference evaluations to two different policy selection paradigms to demonstrate their broad applicability. We first examine reinforcement learning, in which using difference evaluations improves performance by (i) increasing learning speed by up to 10\(\times\), (ii) producing solutions that dominate all solutions discovered by a traditional team-based credit assignment scheme, and (iii) losing only 0.61% of dominated hypervolume in a scenario where 20% of agents act in their own interests instead of the system's interests (compared to a 43% loss when using a traditional global reward in the same scenario). We then derive multiple methods for incorporating difference evaluations into a state-of-the-art multi-objective evolutionary algorithm, NSGA-II. The median performance of NSGA-II with credit assignment dominates the best-case performance of NSGA-II without credit assignment in a multiagent multi-objective problem. Our results strongly suggest that in a multiagent multi-objective problem, proper credit assignment is at least as important to performance as the choice of multi-objective algorithm.
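The difference evaluation underlying this work rewards agent \(i\) with \(D_i = G(z) - G(z_{-i})\): the global evaluation minus the global evaluation with agent \(i\)'s contribution counterfactually removed. A minimal sketch of the idea, assuming a hypothetical global function in which duplicated observations add no value:

```python
# Sketch of a difference evaluation D_i = G(z) - G(z_{-i}).
# The global function G below is a hypothetical stand-in, not the
# domain used in the paper: team value counts each observed point
# of interest at most once, so redundant observations add nothing.

def global_eval(observations):
    """Hypothetical global evaluation G: sum of distinct observed values."""
    return sum(set(observations))

def difference_eval(observations, i):
    """Agent i's difference evaluation: G(z) minus G with agent i's
    observation counterfactually removed."""
    without_i = observations[:i] + observations[i + 1:]
    return global_eval(observations) - global_eval(without_i)

obs = [5, 5, 3]                      # agents 0 and 1 observe the same value
print(difference_eval(obs, 0))       # agent 0 is redundant -> 0
print(difference_eval(obs, 2))       # agent 2 contributes uniquely -> 3
```

Note how the difference evaluation gives a redundant agent zero credit even though the global score it would receive under a team-based scheme is high; this is the signal that isolates each agent's individual impact.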
Notes
These values were chosen through a parameter sweep to produce the best performance for each reward, though the results are not very sensitive to the \(\epsilon\) or \(\alpha\) values.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest associated with this manuscript.
Additional information
Communicated by B. Xue and A. G. Chen.
This work was partially supported by the National Energy Technology Laboratory under Grant DE-FE0012302.
About this article
Cite this article
Yliniemi, L., Tumer, K. Multi-objective multiagent credit assignment in reinforcement learning and NSGA-II. Soft Comput 20, 3869–3887 (2016). https://doi.org/10.1007/s00500-016-2124-z