Privacy preserving planning in multi-agent stochastic environments

Abstract

Collaborative privacy-preserving planning (CPPP) has gained much attention in the past decade. CPPP aims to solve multi-agent planning problems in which cooperation is required to reach an efficient solution, without exposing information that an agent considers private. To date, CPPP has focused on domains with deterministic action effects. In real-world problems, however, action effects are often non-deterministic: an action can have multiple possible effects with varying probabilities. In this paper, we introduce stochastic CPPP (SCPPP), an extension of CPPP to domains with stochastic action effects. We show how SCPPP can be modeled as a Markov decision process (MDP) and how the value iteration algorithm can be adapted to solve it; this adaptation requires extending value iteration to support multiple agents and privacy. We then present two adaptations of real-time dynamic programming (RTDP), a popular algorithm for solving MDPs, designed to solve SCPPP problems. The first adaptation, distributed RTDP (DRTDP), yields behavior identical to running RTDP in a centralized manner on the joint problem. To preserve privacy, DRTDP uses a message-passing mechanism adopted from the MAFS algorithm. The second adaptation, public synchronization RTDP (PS-RTDP), is an approximation of DRTDP that differs mainly in its message-passing mechanism and sends significantly fewer messages. We experimented on domains adapted from the deterministic CPPP literature by adding different stochastic effects to different actions. The results show that PS-RTDP can reduce the number of messages compared to DRTDP by orders of magnitude, thus improving run-time, while producing policies with similar expected costs.
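
As background for readers unfamiliar with RTDP [5], the sketch below illustrates, in Python, the trial-based greedy Bellman backups that DRTDP and PS-RTDP distribute across agents. This is a minimal single-agent sketch with hypothetical actions, transition, cost, and goal callbacks; it shows standard RTDP only, not the distributed, privacy-preserving variants introduced in the paper.

```python
import random

def bellman_backup(V, s, actions, transition, cost):
    """Greedy Bellman backup for a stochastic shortest-path MDP.
    transition(s, a) -> list of (probability, successor); cost(s, a) -> action cost.
    Updates V[s] in place and returns the greedy action."""
    best_q, best_a = float("inf"), None
    for a in actions(s):
        q = cost(s, a) + sum(p * V.get(s2, 0.0) for p, s2 in transition(s, a))
        if q < best_q:
            best_q, best_a = q, a
    V[s] = best_q
    return best_a

def rtdp(s0, goal, actions, transition, cost, trials=1000, max_depth=200):
    """Plain single-agent RTDP: repeated trials from the initial state,
    backing up values only along the states actually visited."""
    V = {}  # value function, implicitly initialized to the admissible bound 0
    for _ in range(trials):
        s, depth = s0, 0
        while not goal(s) and depth < max_depth:
            a = bellman_backup(V, s, actions, transition, cost)
            # Sample one successor according to the action's stochastic effects.
            r, acc = random.random(), 0.0
            for p, s2 in transition(s, a):
                acc += p
                if r <= acc:
                    s = s2
                    break
            depth += 1
    return V
```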

Notes

  1. https://github.com/TmanHef/SCPPP-Algs

  2. http://agents.fel.cvut.cz/codmap/

  3. https://github.com/TmanHef/SCPPP-PDDLs

References

  1. Hefner, T., Stern, R., & Shani, G. (2020). Privacy preserving planning in stochastic environments. In J. C. Beck, O. Buffet, J. Hoffmann, E. Karpas, & S. Sohrabi, (Eds.). Proceedings of the Thirtieth International Conference on Automated Planning and Scheduling, Nancy, France, October 26–30 (pp. 258–262). AAAI Press. https://aaai.org/ojs/index.php/ICAPS/article/view/6669

  2. Brafman, R. I., & Domshlak, C. (2008). From one to many: Planning for loosely coupled multi-agent systems. In ICAPS (pp. 28–35).

  3. Bellman, R. (1966). Dynamic programming. Science, 153(3731), 34–37.

  4. Mausam, & Kolobov, A. (2012). Planning with Markov decision processes: An AI perspective. Synthesis Lectures on Artificial Intelligence and Machine Learning, 6(1), 1–210.

  5. Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1–2), 81–138.

  6. Nissim, R., & Brafman, R. I. (2014). Distributed heuristic forward search for multi-agent planning. JAIR, 51, 293–332.

  7. Štolba, M., Komenda, A., & Kovacs, D. L. (2015). Competition of distributed and multiagent planners (CoDMAP). In The International Planning Competition (WIPC-15) (p. 24).

  8. Brafman, R. I., & Domshlak, C. (2013). On the complexity of planning for agent teams and its implications for single agent planning. Artificial Intelligence, 198, 52–71.

  9. Brafman, R. I. (2015). A privacy preserving algorithm for multi-agent planning and search. In IJCAI (pp. 1530–1536).

  10. Tožička, J., Štolba, M., & Komenda, A. (2017). The limits of strong privacy preserving multi-agent planning. In Proceedings of the International Conference on Automated Planning and Scheduling (Vol. 27).

  11. Maliah, S., Shani, G., & Stern, R. (2018). Action dependencies in privacy-preserving multi-agent planning. Autonomous Agents and Multi-Agent Systems, 32(6), 779–821.

  12. Maliah, S., Shani, G., & Brafman, R. I. (2016a). Online macro generation for privacy preserving planning. In Twenty-Sixth International Conference on Automated Planning and Scheduling.

  13. Wu, F., Zilberstein, S., & Chen, X. (2018). Privacy-preserving policy iteration for decentralized POMDPs. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), New Orleans, USA (pp. 4759–4766).

  14. Štolba, M., Tožička, J., & Komenda, A. (2018). Quantifying privacy leakage in multi-agent planning. TOIT, 18(3), 28.

  15. Štolba, M., Fišer, D., & Komenda, A. (2019). Privacy leakage of search-based multi-agent planning algorithms. In Proceedings of the International Conference on Automated Planning and Scheduling (Vol. 29, pp. 482–490).

  16. Guestrin, C., Koller, D., & Parr, R. (2001). Max-norm projections for factored MDPs. In IJCAI (Vol. 17, pp. 673–680).

  17. Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6, 679–684.

  18. Puterman, M. L. (2014). Markov decision processes: Discrete stochastic dynamic programming. Wiley.

  19. Trevizan, F. W., Teichteil-Königsbuch, F., & Thiébaux, S. (2017). Efficient solutions for stochastic shortest path problems with dead ends. In UAI.

  20. Du, W., & Zhan, Z. (2002). A practical approach to solve secure multi-party computation problems. In Proceedings of the 2002 Workshop on New Security Paradigms (pp. 127–135).

  21. Bertsekas, D. (1982). Distributed dynamic programming. IEEE Transactions on Automatic Control, 27(3), 610–616.

  22. Witzig, J., Beckenbach, I., Eifler, L., Fackeldey, K., Gleixner, A., Grever, A., & Weber, M. (2018). Mixed-integer programming for cycle detection in nonreversible Markov processes. Multiscale Modeling & Simulation, 16(1), 248–265.

  23. Ameloot, T. J., & Van den Bussche, J. (2015). On the convergence of cycle detection for navigational reinforcement learning. http://arxiv.org/abs/1511.08724

  24. Papadimitriou, C. H., & Tsitsiklis, J. N. (1987). The complexity of Markov decision processes. Mathematics of Operations Research, 12(3), 441–450.

  25. Bertsekas, D. P., & Tsitsiklis, J. N. (1991). An analysis of stochastic shortest path problems. Mathematics of Operations Research, 16(3), 580–595.

  26. Littman, M. L., Dean, T. L., & Kaelbling, L. P. (2013). On the complexity of solving Markov decision problems. http://arxiv.org/abs/1302.4971

  27. Štolba, M., & Komenda, A. (2017). The Madla planner: Multi-agent planning by combination of distributed and local heuristic search. Artificial Intelligence, 252, 175–210.

  28. Maliah, S., Shani, G., & Stern, R. (2016b). Stronger privacy preserving projections for multi-agent planning. In the International Conference on Automated Planning and Scheduling (ICAPS) (pp. 221–229).

  29. Maliah, S., Shani, G., & Stern, R. (2016c). Stronger privacy preserving projections for multi-agent planning. In Proceedings of the International Conference on Automated Planning and Scheduling (Vol. 26).

  30. Maliah, S., Shani, G., & Stern, R. (2014). Privacy preserving landmark detection. In the European Conference on Artificial Intelligence (ECAI) (pp. 597–602).

  31. Hauskrecht, M., Meuleau, N., Kaelbling, L. P., Dean, T. L., & Boutilier, C. (2013). Hierarchical solution of Markov decision processes using macro-actions. http://arxiv.org/abs/1301.7381

  32. Gerevini, A. E., Lipovetzky, N., Percassi, F., Saetti, A., & Serina, I. (2019). Best-first width search for multi agent privacy-preserving planning. In Proceedings of the International Conference on Automated Planning and Scheduling (Vol. 29, pp. 163–171).

  33. Gohari, P., Hale, M., & Topcu, U. (2020). Privacy-preserving policy synthesis in Markov decision processes. In 2020 59th IEEE Conference on Decision and Control (CDC) (pp. 6266–6271). IEEE.

  34. Venkitasubramaniam, P. (2013). Privacy in stochastic control: A Markov decision process perspective. In 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton) (pp. 381–388). IEEE.

  35. Boutilier, C. (1999). Sequential optimality and coordination in multiagent systems. In IJCAI (Vol. 99, pp. 478–485).

  36. Amato, C., Chowdhary, G., Geramifard, A., Kemal Üre, N., & Kochenderfer, M. J. (2013). Decentralized control of partially observable Markov decision processes. In 52nd IEEE Conference on Decision and Control (pp. 2398–2405). IEEE.

  37. Kocsis, L., & Szepesvári, C. (2006). Bandit based Monte-Carlo planning. In European Conference on Machine Learning (pp. 282–293). Springer.

  38. McMahan, H. B., Likhachev, M., & Gordon, G. J. (2005). Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees. In Proceedings of the 22nd International Conference on Machine Learning (pp. 569–576). ACM.

  39. Smith, T., & Simmons, R. (2006). Focused real-time dynamic programming for MDPs: Squeezing more out of a heuristic. In AAAI (pp. 1227–1232).

  40. Bonet, B., & Geffner, H. (2001). Planning as heuristic search. Artificial Intelligence, 129(1), 5–33.

  41. Keller, T., & Eyerich, P. (2012). PROST: Probabilistic planning based on UCT. In Proceedings of the International Conference on Automated Planning and Scheduling (Vol. 22).

  42. Bonet, B., & Geffner, H. (2006). Learning depth-first search: A unified approach to heuristic search in deterministic and non-deterministic settings, and its application to MDPs. In ICAPS (Vol. 6, pp. 142–151).

  43. Stolba, M., Tozicka, J., & Komenda, A. (2018). Quantifying privacy leakage in multi-agent planning. ACM Transactions on Internet Technology, 18(3), 28:1-28:21.

  44. Stolba, M., Fiser, D., & Komenda, A. (2019). Privacy leakage of search-based multi-agent planning algorithms. In J. Benton, N. Lipovetzky, E. Onaindia, D. E. Smith, & S. Srivastava (Eds.). Proceedings of the Twenty-Ninth International Conference on Automated Planning and Scheduling, ICAPS 2019, Berkeley, CA, USA, July 11–15, 2019 (pp. 482–490). AAAI Press.

Acknowledgements

This research was supported by Israel Science Foundation (ISF) Grant #210/17 to Roni Stern, and by ISF Grant #1210/18.

Author information

Corresponding author

Correspondence to Tommy Hefner.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Parts of this paper appeared in [1].

Cite this article

Hefner, T., Shani, G. & Stern, R. Privacy preserving planning in multi-agent stochastic environments. Auton Agent Multi-Agent Syst 36, 22 (2022). https://doi.org/10.1007/s10458-022-09554-w
