Abstract
Acquiring computational specifications of human values is key to building value-aligned and value-aware AI systems. We survey methods for learning ethical principles and value-aligned behaviour from human demonstrations or from explicit specifications of values. However, to our knowledge, no approach has been proposed for learning such value specifications from demonstrations while satisfying the possibly diverse value preferences of different agents. In this work we propose a novel value learning framework (agnostic of the specific learning techniques) for (i) learning value groundings (specifications of given values) and (ii) identifying value systems (the particular value preferences of agents) from behavior observed in an application domain. We illustrate our framework in a route choice modeling scenario, using tailored inverse reinforcement learning algorithms. The results show that we can successfully learn value systems that are coherent with observed route choices. Our findings open up intriguing questions and challenges for future research in this area.
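To make the two learning tasks in the abstract concrete, the sketch below shows one way an IRL-style procedure could jointly fit (i) a linear value grounding mapping route features to per-value scores and (ii) one agent's value-system weights, by matching the agent's observed route-choice frequencies under a softmax (maximum-entropy) choice model. This is a minimal sketch under our own simplifying assumptions, not the paper's tailored algorithms; all names (`learn_value_system`, `grounding`, `weights`) are hypothetical.

```python
import numpy as np

# Minimal sketch (not the paper's tailored algorithms): jointly fit a shared
# linear value grounding and one agent's value-system weights by matching the
# agent's observed route-choice frequencies under a softmax (max-ent) model.

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def learn_value_system(route_features, demo_freqs, n_values, lr=0.05, iters=2000):
    """route_features: (n_routes, n_feats) features of the candidate routes.
    demo_freqs: (n_routes,) empirical choice frequencies of one agent."""
    n_routes, n_feats = route_features.shape
    rng = np.random.default_rng(0)
    grounding = rng.normal(scale=0.1, size=(n_feats, n_values))  # value grounding
    weights = np.full(n_values, 1.0 / n_values)                  # value system

    for _ in range(iters):
        value_scores = route_features @ grounding   # per-route score for each value
        choice_probs = softmax(value_scores @ weights)
        diff = demo_freqs - choice_probs            # demonstrated minus expected stats
        # Log-likelihood gradients of the softmax choice model.
        weights = weights + lr * (diff @ value_scores)
        grounding = grounding + lr * np.outer(route_features.T @ diff, weights)
        weights = np.clip(weights, 1e-6, None)
        weights = weights / weights.sum()           # keep preferences on the simplex
    return grounding, weights

# Toy usage: two values (e.g., efficiency vs. safety) over three routes.
F = np.array([[0.2, 0.9],
              [0.7, 0.3],
              [0.5, 0.5]])
demo = np.array([0.6, 0.1, 0.3])
grounding, value_system = learn_value_system(F, demo, n_values=2)
print(value_system)   # learned value preferences, summing to 1
```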
Notes
- 2.
In many application domains it seems feasible to identify such behaviors or actions, e.g., by asking agents.
- 3.
In this use case, the codomain of alignment is set to \(COD = \mathbb{R}^-\), where negative real values define the regret (i.e., negative reward) with respect to each value or to the value systems of the agents. That is, lower (more negative) alignment values indicate a stronger demotion of values (or value preferences), and vice versa (this convention is illustrated in the sketch after these notes).
- 4.
Each of the three properties has been normalized by dividing its original value by the maximum value of that property over all edges (see the sketch after these notes).
- 5.
In future work we will analyze other, more challenging scenarios in which the groundings of values may be more complex.
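The following is a small numeric illustration of the conventions in notes 3 and 4, using made-up numbers and hypothetical names of our own (`edge_props`, `alignment`; the three properties stand in for the route criteria of the case study):

```python
import numpy as np

# Sketch of notes 3 and 4 with made-up numbers: each property is rescaled by
# its maximum over all edges, and alignment lives in COD = R^- as negative
# cost (regret), so 0 is best and more negative means stronger demotion.
edge_props = np.array([[120.0, 3.0, 0.4],   # three properties per edge
                       [ 80.0, 2.0, 0.9],
                       [200.0, 5.0, 0.1]])
normalized = edge_props / edge_props.max(axis=0)   # note 4: divide by max over edges

def alignment(route_edges, value_weights):
    """Alignment of a route (a list of edge indices) with an agent's value
    system: the value-weighted normalized cost of its edges, negated."""
    cost = normalized[route_edges].sum(axis=0) @ value_weights
    return -cost   # in R^-: closer to 0 = better aligned (note 3)

print(alignment([0, 2], np.array([0.5, 0.3, 0.2])))   # about -1.39
```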
Acknowledgments
This work has been supported by grant VAE: TED2021-131295B-C33 funded by MCIN/AEI/10.13039/501100011033 and by the “European Union NextGenerationEU/PRTR”, by grant COSASS: PID2021-123673OB-C32 funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”, and by the AGROBOTS Project of Universidad Rey Juan Carlos funded by the Community of Madrid, Spain.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Holgado-Sánchez, A., Bajo, J., Billhardt, H., Ossowski, S., Arias, J. (2025). Value Learning for Value-Aligned Route Choice Modeling via Inverse Reinforcement Learning. In: Osman, N., Steels, L. (eds) Value Engineering in Artificial Intelligence. VALE 2024. Lecture Notes in Computer Science(), vol 15356. Springer, Cham. https://doi.org/10.1007/978-3-031-85463-7_3
DOI: https://doi.org/10.1007/978-3-031-85463-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-85462-0
Online ISBN: 978-3-031-85463-7
eBook Packages: Computer Science, Computer Science (R0)