
Value Learning for Value-Aligned Route Choice Modeling via Inverse Reinforcement Learning

  • Conference paper
Value Engineering in Artificial Intelligence (VALE 2024)

Abstract

Acquiring computational specifications of human values is key for building value-aligned and value-aware AI systems. We survey existing methods for learning ethical principles and value-aligned behavior from human demonstrations or from specifications of values. However, to our knowledge, no approach has been proposed for learning these value specifications from demonstrations while also satisfying the possibly diverse value preferences of different agents. In this work, we propose a novel value learning framework (agnostic to specific learning techniques) for (i) learning value groundings (specifications of given values) and (ii) identifying value systems (the particular value preferences of agents) from observed behavior in an application domain. We illustrate our framework in a route choice modeling scenario, using tailored inverse reinforcement learning algorithms. The results show that we can successfully learn value systems that are coherent with observed route choices. Our findings open up intriguing challenges for future research in the area.
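To make the framework concrete, the sketch below (ours, not the authors' implementation) illustrates the second step, value system identification, in the simplest setting we can think of: a value system is a weight vector over fixed value groundings, demonstrated route choices follow a maximum-entropy (logit) model over a fixed set of candidate routes, and the weights are recovered by gradient ascent on the demonstration log-likelihood. All names, numbers, and modelling choices here are illustrative assumptions, not the paper's actual algorithm.

```python
# Hedged sketch: value-system identification via maximum-entropy IRL-style
# likelihood maximization over candidate routes. Illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

# One OD pair with five candidate routes; each row holds the route's normalized
# cost under each value grounding (three hypothetical values, e.g. efficiency,
# comfort, safety).
route_costs = np.array([
    [0.9, 0.2, 0.1],
    [0.4, 0.6, 0.3],
    [0.2, 0.3, 0.8],
    [0.5, 0.5, 0.2],
    [0.1, 0.9, 0.6],
])

def choice_probs(weights):
    """Softmax (maximum-entropy / logit) route-choice model, utility = -(w . cost)."""
    u = -route_costs @ weights
    e = np.exp(u - u.max())
    return e / e.sum()

# Synthetic demonstrations drawn from a hidden value system (illustration only).
true_weights = np.array([0.7, 0.2, 0.1])
demos = rng.choice(len(route_costs), size=1000, p=choice_probs(true_weights))
empirical = np.bincount(demos, minlength=len(route_costs)) / len(demos)

# Gradient ascent on the demonstration log-likelihood: the gradient is the gap
# between model-expected and empirical feature (cost) counts.
weights = np.zeros(3)
for _ in range(3000):
    gap = (choice_probs(weights) - empirical) @ route_costs
    weights += 0.1 * gap

print("recovered value system (normalized):", np.round(weights / weights.sum(), 2))
```

The feature-matching gradient used here is the standard maximum-entropy IRL gradient [54, 55]; restricting it to a fixed candidate-route set simply keeps the example self-contained.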


Notes

  1. If values are initially unknown, we can identify them by using Value Identification approaches [21, 34, 49].

  2. In many application domains it seems feasible to identify such behaviors or actions, e.g., by asking agents.

  3. In this use case, the codomain of alignment is set as \(COD = \mathbb{R}^-\), where negative real values define the regret (or negative reward) w.r.t. each value or the value systems of the agents. That is, lower values of alignment indicate a stronger demotion of values (or value preferences), and vice versa (see the sketch after these notes).

  4. The values of the three properties have been normalized by dividing each original value by the maximum value over all edges (see the sketch after these notes).

  5. In future work we will analyze other scenarios in which the groundings of values might be more complex.

  6. https://github.com/andresh26-uam/VAE-ValueLearning/tree/main/ValueLearningIRL.

  7. The 100 OD pairs come from real routes from [52], but the actual routes are not used here. We always use generated ones, as described in Sect. 5.1.
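The following is a minimal, illustrative sketch (ours, not the authors' code) of how notes 3 and 4 can be read together: edge properties are normalized by their maximum over all edges, and the alignment of a route with a value is a negative, regret-like accumulated cost in \(\mathbb{R}^-\), so lower alignment means a stronger demotion of that value. The property names, numbers, and helper functions are hypothetical.

```python
# Hedged sketch of notes 3 and 4: max-normalization of edge properties and
# route alignment as a negative accumulated (regret-like) cost. Illustrative only.
import numpy as np

# Hypothetical raw edge properties (rows: edges; columns: three properties,
# e.g. travel time, discomfort, risk).
raw = np.array([
    [120.0, 3.0, 0.02],
    [ 60.0, 8.0, 0.10],
    [ 90.0, 1.0, 0.05],
])

# Note 4: divide each property by its maximum over all edges.
normalized = raw / raw.max(axis=0)

def alignment(route_edges, value_idx):
    """Alignment of a route with one value: negative sum of normalized costs (in R^-)."""
    return -normalized[route_edges, value_idx].sum()

def value_system_alignment(route_edges, weights):
    """Alignment with a value system: weighted combination of per-value alignments."""
    return sum(w * alignment(route_edges, v) for v, w in enumerate(weights))

route = [0, 2]  # a route given as a list of edge indices
print([alignment(route, v) for v in range(3)])
print(value_system_alignment(route, [0.5, 0.3, 0.2]))
```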

References

  1. Abbeel, P., Ng, A.Y.: Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the Twenty-First International Conference on Machine Learning, p. 1. ICML 2004, Association for Computing Machinery, New York, NY, USA (2004). https://doi.org/10.1145/1015330.1015430

  2. Anderson, M., Anderson, S.L.: EthEl: toward a principled ethical eldercare system. In: AAAI Fall Symposium: AI in Eldercare: New Solutions to Old Problems, vol. 2 (2008)


  3. Anderson, M., Anderson, S.L.: GenEth: a general ethical dilemma analyzer. Paladyn 9, 337–357 (2018). https://doi.org/10.1515/pjbr-2018-0024

  4. Anderson, M., Anderson, S.L., Armen, C.: An approach to computing ethics. IEEE Intell. Syst. 21, 56–63 (2006). https://doi.org/10.1109/MIS.2006.64

  5. Bench-Capon, T., Atkinson, K., McBurney, P.: Using argumentation to model agent decision making in economic experiments. Auton. Agent. Multi-Agent Syst. 25, 183–208 (2012)


  6. Christiano, P., Leike, J., Brown, T.B., Martic, M., Legg, S., Amodei, D.: Deep reinforcement learning from human preferences (2023)


  7. Fu, J., Luo, K., Levine, S.: Learning robust rewards with adversarial inverse reinforcement learning. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=rkHywl-A-

  8. Fürnkranz, J., Hüllermeier, E., Cheng, W., Park, S.H.: Preference-based reinforcement learning: a formal framework and a policy iteration algorithm. Mach. Learn. 89, 123–156 (2012)


  9. Gabriel, I.: Artificial intelligence, values, and alignment. Minds Mach. 30, 411–437 (2020). https://doi.org/10.1007/S11023-020-09539-2

  10. Graham, J., et al.: Chapter two - moral foundations theory: the pragmatic validity of moral pluralism. In: Devine, P., Plant, A. (eds.) Advances in Experimental Social Psychology, vol. 47, pp. 55–130. Academic Press (2013). https://doi.org/10.1016/B978-0-12-407236-7.00002-4, https://www.sciencedirect.com/science/article/pii/B9780124072367000024

  11. Hadfield-Menell, D., Russell, S.J., Abbeel, P., Dragan, A.: Cooperative inverse reinforcement learning. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc. (2016)


  12. Holgado-Sánchez, A., Billhardt, H., Ossowski, S., Fernández, A.: An ontology for value awareness engineering. In: Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: AWAI, pp. 1421–1428. INSTICC, SciTePress (2024). https://doi.org/10.5220/0012595500003636

  13. Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106, 620–630 (1957). https://doi.org/10.1103/PhysRev.106.620, https://link.aps.org/doi/10.1103/PhysRev.106.620

  14. Jiang, J., Lu, Z.: Learning fairness in multi-agent systems. In: Advances in Neural Information Processing Systems, vol. 32 (2019)


  15. Kalweit, G., Huegle, M., Werling, M., Boedecker, J.: Deep inverse Q-learning with constraints. In: Advances in Neural Information Processing Systems (2020). https://arxiv.org/abs/2008.01712v1

  16. Karanik, M., Billhardt, H., Fernández, A., Ossowski, S.: On the relevance of value system structure for automated value-aligned decision-making. In: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, pp. 679–686. Association for Computing Machinery (2024). https://doi.org/10.1145/3605098.3636057

  17. Koch, T., Dugundji, E.: A review of methods to model route choice behavior of bicyclists: inverse reinforcement learning in spatial context and recursive logit. In: Proceedings of the 3rd ACM SIGSPATIAL International Workshop on GeoSpatial Simulation, pp. 30–37. GeoSim 2020, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3423335.3428165

  18. Lera-Leri, R., Bistaffa, F., Serramia, M., Lopez-Sanchez, M., Rodriguez-Aguilar, J.: Towards pluralistic value alignment: aggregating value systems through ℓp-regression. In: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, pp. 780–788. AAMAS 2022, IFAAMAS (2022)


  19. Lera-Leri, R.X., et al.: Aggregating value systems for decision support. Knowl.-Based Syst. 287, 111453 (2024). https://doi.org/10.1016/j.knosys.2024.111453, https://www.sciencedirect.com/science/article/pii/S0950705124000881

  20. Levine, S., Popovic, Z., Koltun, V.: Nonlinear inverse reinforcement learning with Gaussian processes. In: Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., Weinberger, K. (eds.) Advances in Neural Information Processing Systems, vol. 24. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper_files/paper/2011/file/c51ce410c124a10e0db5e4b97fc2af39-Paper.pdf

  21. Liscio, E., van der Meer, M., Cavalcante Siebert, L., Mouter, N., Jonker, C., Murukannaiah, P.: Axies: identifying and evaluating context-specific values. In: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, pp. 799–808. AAMAS 2021, International Foundation for Autonomous Agents and Multiagent Systems (2021)


  22. Liscio, E., Dondera, A., Geadau, A., Jonker, C., Murukannaiah, P.: Cross-domain classification of moral values. In: Carpuat, M., de Marneffe, M.C., Meza Ruiz, I.V. (eds.) Findings of the Association for Computational Linguistics: NAACL 2022, pp. 2727–2745. Association for Computational Linguistics, Seattle, United States (2022). https://doi.org/10.18653/v1/2022.findings-naacl.209, https://aclanthology.org/2022.findings-naacl.209

  23. Liscio, E., et al.: Inferring values via hybrid intelligence. In: HHAI 2023: Augmenting Human Intellect: Proceedings of the Second International Conference on Hybrid Human-Artificial Intelligence, Front. Artif. Intell. Appl., vol. 368, pp. 373–378. IOS Press BV (2023). https://doi.org/10.3233/FAIA230102

  24. Liscio, E., et al.: Value inference in sociotechnical systems. In: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, pp. 1774–1780. AAMAS 2023, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC (2023). https://doi.org/10.5555/3545946.3598838

  25. Liu, S., Jiang, H.: Personalized route recommendation for ride-hailing with deep inverse reinforcement learning and real-time traffic conditions. Transp. Res. Part E Logist. Transp. Rev. 164, 102780 (2022). https://doi.org/10.1016/j.tre.2022.102780, https://www.sciencedirect.com/science/article/pii/S1366554522001715

  26. Liu, S., Jiang, H., Chen, S., Ye, J., He, R., Sun, Z.: Integrating Dijkstra’s algorithm into deep inverse reinforcement learning for food delivery route planning. Transp. Res. Part E Logist. Transp. Rev. 142, 102070 (2020)


  27. Liu, S., Araujo, M., Brunskill, E., Rossetti, R., Barros, J., Krishnan, R.: Understanding sequential decisions via inverse reinforcement learning. In: 2013 IEEE 14th International Conference on Mobile Data Management, vol. 1, pp. 177–186 (2013). https://doi.org/10.1109/MDM.2013.28

  28. Lovreglio, R., Fonzone, A., dell’Olio, L.: A mixed logit model for predicting exit choice during building evacuations. Transp. Res. Part A Policy Pract. 92, 59–75 (2016). https://doi.org/10.1016/j.tra.2016.06.018


  29. Montes, N., Osman, N., Sierra, C., Slavkovik, M.: Value engineering for autonomous agents. CoRR abs/2302.08759 (2023). https://doi.org/10.48550/arXiv.2302.08759

  30. Montes, N., Sierra, C.: Synthesis and properties of optimally value-aligned normative systems. J. Artif. Intell. Res. 74, 1739–1774 (2022). https://doi.org/10.1613/jair.1.13487


  31. Ng, A.Y., Russell, S.J.: Algorithms for inverse reinforcement learning. In: Proceedings of the Seventeenth International Conference on Machine Learning, pp. 663–670. ICML 2000, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2000)


  32. Osman, N., d’Inverno, M.: A computational framework of human values. In: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1531–1539. AAMAS 2024, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC (2024)


  33. Prato, C.G.: Route choice modeling: past, present and future research directions. J. Choice Modell. 2(1), 65–100 (2009). https://doi.org/10.1016/S1755-5345(13)70005-8, https://www.sciencedirect.com/science/article/pii/S1755534513700058

  34. Qiu, L., et al.: ValueNet: a new dataset for human value driven dialogue system. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 10, pp. 11183–11191 (2022). https://doi.org/10.1609/aaai.v36i10.21368, https://ojs.aaai.org/index.php/AAAI/article/view/21368

  35. Ramos, G.D.M., Daamen, W., Hoogendoorn, S.: Modelling travellers’ heterogeneous route choice behaviour as prospect maximizers. J. Choice Modell. 6, 17–33 (2013). https://doi.org/10.1016/j.jocm.2013.04.002


  36. Rizzi, L., Ortúzar, J.: Stated preference in the valuation of interurban road safety. Accid. Anal. Prev. 35, 9–22 (2003). https://doi.org/10.1016/S0001-4575(01)00082-3


  37. Rodriguez-Soto, M., Serramia, M., Lopez-Sanchez, M., Rodriguez-Aguilar, J.A.: Instilling moral value alignment by means of multi-objective reinforcement learning. Ethics Inf. Technol. 24(1), 1–17 (2022). https://doi.org/10.1007/s10676-022-09635-0


  38. Russell, S.: Artificial Intelligence and the Problem of Control, pp. 19–24. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-030-86144-5_3

  39. Sadeghi-Niaraki, A., Kim, K.: Ontology based personalized route planning system using a multi-criteria decision making approach. Expert Syst. Appl. 36, 2250–2259 (2009). https://doi.org/10.1016/j.eswa.2007.12.053


  40. Scheiner, J., Holz-Rau, C.: Travel mode choice: affected by objective or subjective determinants? Transportation 34, 487–511 (2007). https://doi.org/10.1007/s11116-007-9112-1


  41. Schwartz, S.H.: Universals in the content and structure of values: Theoretical advances and empirical tests in 20 countries. In: Advances in Experimental Social Psychology, vol. 25, pp. 1–65. Elsevier (1992)


  42. Schwartz, S.H.: An overview of the Schwartz theory of basic values. Online Readings Psychol. Cult. 2(1), 11 (2012)


  43. Serramia, M., Lopez-Sanchez, M., Rodriguez-Aguilar, J.A.: A qualitative approach to composing value-aligned norm systems. In: Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, pp. 1233–1241. International Foundation for Autonomous Agents and Multiagent Systems (2020)


  44. Serramia, M., et al.: Moral values in norm decision making. In: Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018). IFAAMAS (2018). www.ifaamas.org

  45. Serramia, M., et al.: Encoding ethics to compute value-aligned norms. Minds Mach., 1–30 (2023). https://doi.org/10.1007/s11023-023-09649-7

  46. Soares, N.: The value learning problem. Artif. Intell. Saf. Secur. (2018). https://api.semanticscholar.org/CorpusID:13096553

  47. Veronese, C., Meli, D., Bistaffa, F., Rodríguez-Soto, M., Farinelli, A., Rodríguez-Aguilar, J.A.: Inductive logic programming for transparent alignment with multiple moral values. In: CEUR Workshop Proceedings, vol. 7, pp. 84–88 (2024). https://iris.univr.it/handle/11562/1120547


  48. Weidinger, L., et al.: Using the veil of ignorance to align AI systems with principles of justice. Proc. Nat. Acad. Sci. 120(18), e2213709120 (2023). https://doi.org/10.1073/pnas.2213709120, https://www.pnas.org/doi/abs/10.1073/pnas.2213709120

  49. Wilson, S.R., Shen, Y., Mihalcea, R.: Building and validating hierarchical lexicons with a case study on personal values. In: Staab, S., Koltsova, O., Ignatov, D.I. (eds.) Social Informatics, pp. 455–470. Springer International Publishing, Cham (2018). https://doi.org/10.1007/978-3-030-01129-1_28


  50. Wulfmeier, M., Ondrúška, P., Posner, I.: Maximum entropy deep inverse reinforcement learning. arXiv preprint arXiv:1507.04888 (2015). https://doi.org/10.48550/arXiv.1507.04888, https://arxiv.org/abs/1507.04888v3

  51. Yang, Y., Yao, E., Yang, Z., Zhang, R.: Modeling the charging and route choice behavior of BEV drivers. Transp. Res. Part C: Emerg. Technol. 65, 190–204 (2016). https://doi.org/10.1016/j.trc.2015.09.008


  52. Zhao, Z., Liang, Y.: A deep inverse reinforcement learning approach to route choice modeling with context-dependent rewards. Transp. Res. Part C: Emerg. Technol. 149, 104079 (2023). https://doi.org/10.1016/j.trc.2023.104079, https://www.sciencedirect.com/science/article/pii/S0968090X23000682

  53. Zhong, M., Kim, J., Zheng, Z.: Estimating link flows in road networks with synthetic trajectory data generation: inverse reinforcement learning approach. IEEE Open J. Intell. Transp. Syst. 4, 14–29 (2023). https://doi.org/10.1109/OJITS.2022.3233904


  54. Ziebart, B.D.: Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Ph.D. thesis, CMU School of Computer Science, USA (2010). https://doi.org/10.1184/R1/6720692.v1

  55. Ziebart, B.D., Maas, A., Bagnell, J.A., Dey, A.K.: Maximum entropy inverse reinforcement learning. In: Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 3, pp. 1433–1438. AAAI Press (2008)



Acknowledgments

This work has been supported by grant VAE: TED2021-131295B-C33 funded by MCIN/AEI/10.13039/501100011033 and by the “European Union NextGenerationEU/PRTR”, by grant COSASS: PID2021-123673OB-C32 funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”, and by the AGROBOTS Project of Universidad Rey Juan Carlos funded by the Community of Madrid, Spain.

Author information


Corresponding author

Correspondence to Andrés Holgado-Sánchez.


Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Holgado-Sánchez, A., Bajo, J., Billhardt, H., Ossowski, S., Arias, J. (2025). Value Learning for Value-Aligned Route Choice Modeling via Inverse Reinforcement Learning. In: Osman, N., Steels, L. (eds) Value Engineering in Artificial Intelligence. VALE 2024. Lecture Notes in Computer Science, vol 15356. Springer, Cham. https://doi.org/10.1007/978-3-031-85463-7_3


  • DOI: https://doi.org/10.1007/978-3-031-85463-7_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-85462-0

  • Online ISBN: 978-3-031-85463-7

  • eBook Packages: Computer Science, Computer Science (R0)
