
Value Learning for Value-Aligned Route Choice Modeling via Inverse Reinforcement Learning

  • Conference paper
Value Engineering in Artificial Intelligence (VALE 2024)

Abstract

Acquiring computational specifications of human values is key for building value-aligned and value-aware AI systems. We survey existing methods for learning ethical principles and value-aligned behavior from human demonstrations or from specifications of values. However, to our knowledge, no approach has been proposed for learning these value specifications from demonstrations while also satisfying the possibly diverse value preferences of different agents. In this work, we propose a novel value learning framework (agnostic to specific learning techniques) for (i) learning value groundings (specifications of given values) and (ii) identifying value systems (the particular value preferences of agents) from observed behavior in an application domain. We illustrate our framework in a route choice modeling scenario, using tailored inverse reinforcement learning algorithms. The results show that we can successfully learn value systems that are coherent with observed route choices. Our findings open up intriguing challenges for future research in the area.
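To make the framework concrete, the sketch below (ours, not the authors' implementation) illustrates the second step, value system identification, in the simplest setting we can think of: a value system is a weight vector over fixed value groundings, demonstrated route choices follow a maximum-entropy (logit) model over a fixed set of candidate routes, and the weights are recovered by gradient ascent on the demonstration log-likelihood. All names, numbers, and modelling choices here are illustrative assumptions, not the paper's actual algorithm.

```python
# Hedged sketch: value-system identification via maximum-entropy IRL-style
# likelihood maximization over candidate routes. Illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

# One OD pair with five candidate routes; each row holds the route's normalized
# cost under each value grounding (three hypothetical values, e.g. efficiency,
# comfort, safety).
route_costs = np.array([
    [0.9, 0.2, 0.1],
    [0.4, 0.6, 0.3],
    [0.2, 0.3, 0.8],
    [0.5, 0.5, 0.2],
    [0.1, 0.9, 0.6],
])

def choice_probs(weights):
    """Softmax (maximum-entropy / logit) route-choice model, utility = -(w . cost)."""
    u = -route_costs @ weights
    e = np.exp(u - u.max())
    return e / e.sum()

# Synthetic demonstrations drawn from a hidden value system (illustration only).
true_weights = np.array([0.7, 0.2, 0.1])
demos = rng.choice(len(route_costs), size=1000, p=choice_probs(true_weights))
empirical = np.bincount(demos, minlength=len(route_costs)) / len(demos)

# Gradient ascent on the demonstration log-likelihood: the gradient is the gap
# between model-expected and empirical feature (cost) counts.
weights = np.zeros(3)
for _ in range(3000):
    gap = (choice_probs(weights) - empirical) @ route_costs
    weights += 0.1 * gap

print("recovered value system (normalized):", np.round(weights / weights.sum(), 2))
```

The feature-matching gradient used here is the standard maximum-entropy IRL gradient [54, 55]; restricting it to a fixed candidate-route set simply keeps the example self-contained.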


Notes

  1. If values are initially unknown, we can identify them by using Value Identification approaches [21, 34, 49].

  2. In many application domains it seems feasible to identify such behaviors or actions, e.g., by asking agents.

  3. In this use case, the codomain of alignment is set as \(COD = \mathbb{R}^-\), where negative real values define the regret (or negative reward) w.r.t. each value or the value systems of the agents. That is, lower values of alignment indicate a stronger demotion of values (or value preferences), and vice versa (see the sketch after these notes).

  4. The values of the three properties have been normalized by dividing each original value by the maximum value over all edges (see the sketch after these notes).

  5. In future work we will analyze other scenarios in which the groundings of values might be more complex.

  6. https://github.com/andresh26-uam/VAE-ValueLearning/tree/main/ValueLearningIRL.

  7. The 100 OD pairs come from real routes from [52], but the actual routes are not used here. We always use generated ones, as described in Sect. 5.1.
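The following is a minimal, illustrative sketch (ours, not the authors' code) of how notes 3 and 4 can be read together: edge properties are normalized by their maximum over all edges, and the alignment of a route with a value is a negative, regret-like accumulated cost in \(\mathbb{R}^-\), so lower alignment means a stronger demotion of that value. The property names, numbers, and helper functions are hypothetical.

```python
# Hedged sketch of notes 3 and 4: max-normalization of edge properties and
# route alignment as a negative accumulated (regret-like) cost. Illustrative only.
import numpy as np

# Hypothetical raw edge properties (rows: edges; columns: three properties,
# e.g. travel time, discomfort, risk).
raw = np.array([
    [120.0, 3.0, 0.02],
    [ 60.0, 8.0, 0.10],
    [ 90.0, 1.0, 0.05],
])

# Note 4: divide each property by its maximum over all edges.
normalized = raw / raw.max(axis=0)

def alignment(route_edges, value_idx):
    """Alignment of a route with one value: negative sum of normalized costs (in R^-)."""
    return -normalized[route_edges, value_idx].sum()

def value_system_alignment(route_edges, weights):
    """Alignment with a value system: weighted combination of per-value alignments."""
    return sum(w * alignment(route_edges, v) for v, w in enumerate(weights))

route = [0, 2]  # a route given as a list of edge indices
print([alignment(route, v) for v in range(3)])
print(value_system_alignment(route, [0.5, 0.3, 0.2]))
```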

References

  1. Abbeel, P., Ng, A.Y.: Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the Twenty-First International Conference on Machine Learning, p. 1. ICML 2004, Association for Computing Machinery, New York, NY, USA (2004). https://doi.org/10.1145/1015330.1015430

  2. Anderson, M., Anderson, S.L.: EthEl: toward a principled ethical eldercare system. In: AAAI Fall Symposium: AI in Eldercare: New Solutions to Old Problems, vol. 2 (2008)


  3. Anderson, M., Anderson, S.L.: GenEth: a general ethical dilemma analyzer. Paladyn 9, 337–357 (2018). https://doi.org/10.1515/pjbr-2018-0024

  4. Anderson, M., Anderson, S.L., Armen, C.: An approach to computing ethics. IEEE Intell. Syst. 21, 56–63 (2006). https://doi.org/10.1109/MIS.2006.64

  5. Bench-Capon, T., Atkinson, K., McBurney, P.: Using argumentation to model agent decision making in economic experiments. Auton. Agent. Multi-Agent Syst. 25, 183–208 (2012)


  6. Christiano, P., Leike, J., Brown, T.B., Martic, M., Legg, S., Amodei, D.: Deep reinforcement learning from human preferences (2023)


  7. Fu, J., Luo, K., Levine, S.: Learning robust rewards with adversarial inverse reinforcement learning. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=rkHywl-A-

  8. Fürnkranz, J., Hüllermeier, E., Cheng, W., Park, S.H.: Preference-based reinforcement learning: a formal framework and a policy iteration algorithm. Mach. Learn. 89, 123–156 (2012)


  9. Gabriel, I.: Artificial intelligence, values, and alignment. Minds Mach. 30, 411–437 (2020). https://doi.org/10.1007/S11023-020-09539-2

  10. Graham, J., et al.: Chapter two - moral foundations theory: the pragmatic validity of moral pluralism. In: Devine, P., Plant, A. (eds.) Advances in Experimental Social Psychology, vol. 47, pp. 55–130. Academic Press (2013). https://doi.org/10.1016/B978-0-12-407236-7.00002-4, https://www.sciencedirect.com/science/article/pii/B9780124072367000024

  11. Hadfield-Menell, D., Russell, S.J., Abbeel, P., Dragan, A.: Cooperative inverse reinforcement learning. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc. (2016)


  12. Holgado-Sánchez, A., Billhardt, H., Ossowski, S., Fernández, A.: An ontology for value awareness engineering. In: Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: AWAI, pp. 1421–1428. INSTICC, SciTePress (2024). https://doi.org/10.5220/0012595500003636

  13. Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106, 620–630 (1957). https://doi.org/10.1103/PhysRev.106.620, https://link.aps.org/doi/10.1103/PhysRev.106.620

  14. Jiang, J., Lu, Z.: Learning fairness in multi-agent systems. In: Advances in Neural Information Processing Systems, vol. 32 (2019)


  15. Kalweit, G., Huegle, M., Werling, M., Boedecker, J.: Deep inverse Q-learning with constraints. In: Advances in Neural Information Processing Systems (2020). https://arxiv.org/abs/2008.01712v1

  16. Karanik, M., Billhardt, H., Fernández, A., Ossowski, S.: On the relevance of value system structure for automated value-aligned decision-making. In: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, pp. 679–686. Association for Computing Machinery (2024). https://doi.org/10.1145/3605098.3636057

  17. Koch, T., Dugundji, E.: A review of methods to model route choice behavior of bicyclists: inverse reinforcement learning in spatial context and recursive logit. In: Proceedings of the 3rd ACM SIGSPATIAL International Workshop on GeoSpatial Simulation, pp. 30–37. GeoSim 2020, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3423335.3428165

  18. Lera-Leri, R., Bistaffa, F., Serramia, M., Lopez-Sanchez, M., Rodriguez-Aguilar, J.: Towards pluralistic value alignment: aggregating value systems through ℓp-regression. In: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, pp. 780–788. AAMAS 2022, IFAAMAS (2022)


  19. Lera-Leri, R.X., et al.: Aggregating value systems for decision support. Knowl.-Based Syst. 287, 111453 (2024). https://doi.org/10.1016/j.knosys.2024.111453, https://www.sciencedirect.com/science/article/pii/S0950705124000881

  20. Levine, S., Popovic, Z., Koltun, V.: Nonlinear inverse reinforcement learning with Gaussian processes. In: Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., Weinberger, K. (eds.) Advances in Neural Information Processing Systems, vol. 24. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper_files/paper/2011/file/c51ce410c124a10e0db5e4b97fc2af39-Paper.pdf

  21. Liscio, E., van der Meer, M., Cavalcante Siebert, L., Mouter, N., Jonker, C., Murukannaiah, P.: Axies: identifying and evaluating context-specific values. In: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, pp. 799–808. AAMAS 2021, International Foundation for Autonomous Agents and Multiagent Systems (2021)


  22. Liscio, E., Dondera, A., Geadau, A., Jonker, C., Murukannaiah, P.: Cross-domain classification of moral values. In: Carpuat, M., de Marneffe, M.C., Meza Ruiz, I.V. (eds.) Findings of the Association for Computational Linguistics: NAACL 2022, pp. 2727–2745. Association for Computational Linguistics, Seattle, United States (2022). https://doi.org/10.18653/v1/2022.findings-naacl.209, https://aclanthology.org/2022.findings-naacl.209

  23. Liscio, E., et al.: Inferring values via hybrid intelligence. In: HHAI 2023: Augmenting Human Intellect: Proceedings of the Second International Conference on Hybrid Human-Artificial Intelligence, Front. Artif. Intell. Appl., vol. 368, pp. 373–378. IOS Press BV (2023). https://doi.org/10.3233/FAIA230102

  24. Liscio, E., et al.: Value inference in sociotechnical systems. In: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, pp. 1774–1780. AAMAS 2023, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC (2023). https://doi.org/10.5555/3545946.3598838

  25. Liu, S., Jiang, H.: Personalized route recommendation for ride-hailing with deep inverse reinforcement learning and real-time traffic conditions. Transp. Res. Part E Logist. Transp. Rev. 164, 102780 (2022). https://doi.org/10.1016/j.tre.2022.102780, https://www.sciencedirect.com/science/article/pii/S1366554522001715

  26. Liu, S., Jiang, H., Chen, S., Ye, J., He, R., Sun, Z.: Integrating Dijkstra’s algorithm into deep inverse reinforcement learning for food delivery route planning. Transp. Res. Part E Logist. Transp. Rev. 142, 102070 (2020)


  27. Liu, S., Araujo, M., Brunskill, E., Rossetti, R., Barros, J., Krishnan, R.: Understanding sequential decisions via inverse reinforcement learning. In: 2013 IEEE 14th International Conference on Mobile Data Management, vol. 1, pp. 177–186 (2013). https://doi.org/10.1109/MDM.2013.28

  28. Lovreglio, R., Fonzone, A., dell’Olio, L.: A mixed logit model for predicting exit choice during building evacuations. Transp. Res. Part A Policy Pract. 92, 59–75 (2016). https://doi.org/10.1016/j.tra.2016.06.018


  29. Montes, N., Osman, N., Sierra, C., Slavkovik, M.: Value engineering for autonomous agents. CoRR abs/2302.08759 (2023). https://doi.org/10.48550/arXiv.2302.08759

  30. Montes, N., Sierra, C.: Synthesis and properties of optimally value-aligned normative systems. J. Artif. Intell. Res. 74, 1739–1774 (2022). https://doi.org/10.1613/jair.1.13487


  31. Ng, A.Y., Russell, S.J.: Algorithms for inverse reinforcement learning. In: Proceedings of the Seventeenth International Conference on Machine Learning, pp. 663–670. ICML 2000, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2000)


  32. Osman, N., d’Inverno, M.: A computational framework of human values. In: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1531–1539. AAMAS 2024, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC (2024)


  33. Prato, C.G.: Route choice modeling: past, present and future research directions. J. Choice Modell. 2(1), 65–100 (2009). https://doi.org/10.1016/S1755-5345(13)70005-8, https://www.sciencedirect.com/science/article/pii/S1755534513700058

  34. Qiu, L., et al.: ValueNet: a new dataset for human value driven dialogue system. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 10, pp. 11183–11191 (2022). https://doi.org/10.1609/aaai.v36i10.21368, https://ojs.aaai.org/index.php/AAAI/article/view/21368

  35. Ramos, G.D.M., Daamen, W., Hoogendoorn, S.: Modelling travellers’ heterogeneous route choice behaviour as prospect maximizers. J. Choice Modell. 6, 17–33 (2013). https://doi.org/10.1016/j.jocm.2013.04.002


  36. Rizzi, L., Ortúzar, J.: Stated preference in the valuation of interurban road safety. Accid. Anal. Prev. 35, 9–22 (2003). https://doi.org/10.1016/S0001-4575(01)00082-3


  37. Rodriguez-Soto, M., Serramia, M., Lopez-Sanchez, M., Rodriguez-Aguilar, J.A.: Instilling moral value alignment by means of multi-objective reinforcement learning. Ethics Inf. Technol. 24(1), 1–17 (2022). https://doi.org/10.1007/s10676-022-09635-0


  38. Russell, S.: Artificial Intelligence and the Problem of Control, pp. 19–24. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-030-86144-5_3

  39. Sadeghi-Niaraki, A., Kim, K.: Ontology based personalized route planning system using a multi-criteria decision making approach. Expert Syst. Appl. 36, 2250–2259 (2009). https://doi.org/10.1016/j.eswa.2007.12.053


  40. Scheiner, J., Holz-Rau, C.: Travel mode choice: affected by objective or subjective determinants? Transportation 34, 487–511 (2007). https://doi.org/10.1007/s11116-007-9112-1


  41. Schwartz, S.H.: Universals in the content and structure of values: Theoretical advances and empirical tests in 20 countries. In: Advances in Experimental Social Psychology, vol. 25, pp. 1–65. Elsevier (1992)


  42. Schwartz, S.H.: An overview of the Schwartz theory of basic values. Online Readings Psychol. Cult. 2(1), 11 (2012)


  43. Serramia, M., Lopez-Sanchez, M., Rodriguez-Aguilar, J.A.: A qualitative approach to composing value-aligned norm systems. In: Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, pp. 1233–1241. International Foundation for Autonomous Agents and Multiagent Systems (2020)


  44. Serramia, M., et al.: Moral values in norm decision making. In: Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018). IFAAMAS (2018). www.ifaamas.org

  45. Serramia, M., et al.: Encoding ethics to compute value-aligned norms. Minds Mach., 1–30 (2023). https://doi.org/10.1007/s11023-023-09649-7

  46. Soares, N.: The value learning problem. Artif. Intell. Saf. Secur. (2018). https://api.semanticscholar.org/CorpusID:13096553

  47. Veronese, C., Meli, D., Bistaffa, F., Rodríguez-Soto, M., Farinelli, A., Rodríguez-Aguilar, J.A.: Inductive logic programming for transparent alignment with multiple moral values. In: CEUR Workshop Proceedings, vol. 7, pp. 84–88 (2024). https://iris.univr.it/handle/11562/1120547


  48. Weidinger, L., et al.: Using the veil of ignorance to align AI systems with principles of justice. Proc. Nat. Acad. Sci. 120(18), e2213709120 (2023). https://doi.org/10.1073/pnas.2213709120, https://www.pnas.org/doi/abs/10.1073/pnas.2213709120

  49. Wilson, S.R., Shen, Y., Mihalcea, R.: Building and validating hierarchical lexicons with a case study on personal values. In: Staab, S., Koltsova, O., Ignatov, D.I. (eds.) Social Informatics, pp. 455–470. Springer International Publishing, Cham (2018). https://doi.org/10.1007/978-3-030-01129-1_28


  50. Wulfmeier, M., Ondrúška, P., Posner, I.: Maximum entropy deep inverse reinforcement learning. arXiv preprint arXiv:1507.04888 (2015). https://doi.org/10.48550/arXiv.1507.04888, https://arxiv.org/abs/1507.04888v3

  51. Yang, Y., Yao, E., Yang, Z., Zhang, R.: Modeling the charging and route choice behavior of BEV drivers. Transp. Res. Part C: Emerg. Technol. 65, 190–204 (2016). https://doi.org/10.1016/j.trc.2015.09.008


  52. Zhao, Z., Liang, Y.: A deep inverse reinforcement learning approach to route choice modeling with context-dependent rewards. Transp. Res. Part C: Emerg. Technol. 149, 104079 (2023). https://doi.org/10.1016/j.trc.2023.104079, https://www.sciencedirect.com/science/article/pii/S0968090X23000682

  53. Zhong, M., Kim, J., Zheng, Z.: Estimating link flows in road networks with synthetic trajectory data generation: inverse reinforcement learning approach. IEEE Open J. Intell. Transp. Syst. 4, 14–29 (2023). https://doi.org/10.1109/OJITS.2022.3233904


  54. Ziebart, B.D.: Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Ph.D. thesis, CMU School of Computer Science, USA (2010). https://doi.org/10.1184/R1/6720692.v1

  55. Ziebart, B.D., Maas, A., Bagnell, J.A., Dey, A.K.: Maximum entropy inverse reinforcement learning. In: Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 3, pp. 1433–1438. AAAI Press (2008)



Acknowledgments

This work has been supported by grant VAE: TED2021-131295B-C33 funded by MCIN/AEI/10.13039/501100011033 and by the “European Union NextGenerationEU/PRTR”, by grant COSASS: PID2021-123673OB-C32 funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”, and by the AGROBOTS Project of Universidad Rey Juan Carlos funded by the Community of Madrid, Spain.

Author information


Corresponding author

Correspondence to Andrés Holgado-Sánchez.


Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Holgado-Sánchez, A., Bajo, J., Billhardt, H., Ossowski, S., Arias, J. (2025). Value Learning for Value-Aligned Route Choice Modeling via Inverse Reinforcement Learning. In: Osman, N., Steels, L. (eds) Value Engineering in Artificial Intelligence. VALE 2024. Lecture Notes in Computer Science, vol 15356. Springer, Cham. https://doi.org/10.1007/978-3-031-85463-7_3


  • DOI: https://doi.org/10.1007/978-3-031-85463-7_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-85462-0

  • Online ISBN: 978-3-031-85463-7

  • eBook Packages: Computer Science, Computer Science (R0)
