Abstract
We introduce a formal model of transportation in an open-pit mine for the purpose of optimising the mine’s operations. The model is a network of Markov automata (MA); the optimisation goal corresponds to maximising a time-bounded expected reward property. Today’s model checking algorithms exacerbate the state space explosion problem by applying a discretisation approach to such properties on MA. We show that model checking is infeasible even for small mine instances. Instead, we propose statistical model checking with lightweight strategy sampling or table-based Q-learning over untimed strategies as an alternative approach to the optimisation task, using the Modest Toolset’s modes tool. We add support for partial observability to modes so that strategies can be based on carefully selected model features, and we implement a connection from modes to the dtControl tool to convert sampled or learned strategies into decision trees. We experimentally evaluate the adequacy of our new tooling on the open-pit mine case study. Our experiments demonstrate the limitations of Q-learning, the impact of feature selection, and the usefulness of decision trees as an explainable representation.
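To give an intuition for the lightweight strategy sampling mentioned in the abstract, the sketch below resolves each nondeterministic choice by hashing a fixed strategy seed together with selected observable features, simulates each candidate seed, and keeps the best one. It is a minimal illustration under assumed interfaces only: the `Mdp` object (`reset`, `enabled_actions`, `step`) and the feature tuple are hypothetical placeholders, not the actual API of the modes tool.

```python
# Minimal sketch of lightweight strategy sampling over untimed strategies.
# Assumes a hypothetical Mdp object with reset(), enabled_actions() and
# step(action); this is an illustration, not the modes tool's interface.
import random
import zlib


def pick_action(seed: int, features: tuple, actions: list):
    """Resolve nondeterminism deterministically: hash the strategy seed
    together with the observed features and index into the enabled actions."""
    h = zlib.crc32(repr((seed, features)).encode())
    return actions[h % len(actions)]


def estimate(mdp, seed: int, time_bound: float, runs: int) -> float:
    """Estimate the time-bounded expected reward of the strategy identified
    by `seed` via statistical simulation."""
    total = 0.0
    for _ in range(runs):
        features, time, reward = mdp.reset()
        while time < time_bound:
            actions = mdp.enabled_actions()
            if not actions:
                break
            features, time, reward = mdp.step(pick_action(seed, features, actions))
        total += reward
    return total / runs


def sample_strategies(mdp, num_seeds: int, time_bound: float, runs: int) -> int:
    """Sample strategy seeds uniformly at random and return the best one found."""
    seeds = [random.getrandbits(32) for _ in range(num_seeds)]
    return max(seeds, key=lambda s: estimate(mdp, s, time_bound, runs))
```

Because every candidate strategy is represented by a single integer seed rather than an explicit state-to-action table, the memory needed per strategy is constant in the size of the state space, which is the main appeal of this sampling scheme.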
Data availability
The models and tools/scripts to reproduce our experimental evaluation are archived and available at DOI 10.5281/zenodo.13327230 [10].
Notes
1. To ease the presentation, we assume actions to uniquely identify transitions per state.
2. We report the average runtime of both properties per model and run; its coefficient of variation was below 10% in almost all cases and 25.3% and 25.1% in two cases.
References
Agha, G., Palmskog, K.: A survey of statistical model checking. ACM Trans. Model. Comput. Simul. 28(1), 6:1–6:39 (2018). https://doi.org/10.1145/3158668
Alarie, S., Gamache, M.: Overview of solution strategies used in truck dispatching systems for open pit mines. Int. J. Surf. Min. Reclam. Environ. 16(1), 59–76 (2002). https://doi.org/10.1076/ijsm.16.1.59.3408
Alur, R., Dill, D.L.: A theory of timed automata. Theor. Comput. Sci. 126(2), 183–235 (1994). https://doi.org/10.1016/0304-3975(94)90010-8
Ashok, P., Butkova, Y., Hermanns, H., Křetínský, J.: Continuous-time Markov decisions based on partial exploration. In: Lahiri, S.K., Wang, C. (eds.) ATVA 2018. LNCS, vol. 11138, pp. 317–334. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01090-4_19
Ashok, P., Jackermeier, M., Křetínský, J., Weinhuber, C., Weininger, M., Yadav, M.: dtControl 2.0: explainable strategy representation via decision tree learning steered by experts. In: TACAS 2021. LNCS, vol. 12652, pp. 326–345. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72013-1_17
Baier, C., de Alfaro, L., Forejt, V., Kwiatkowska, M.: Model checking probabilistic systems. In: Handbook of Model Checking, pp. 963–999. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-10575-8_28
Behrmann, G., David, A., Larsen, K.G.: A tutorial on Uppaal. In: Bernardo, M., Corradini, F. (eds.) SFM-RT 2004. LNCS, vol. 3185, pp. 200–236. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30080-9_7
Bellman, R.: A Markovian decision process. J. Mathem. Mech. 6(5), 679–684 (1957)
Bohnenkamp, H.C., D’Argenio, P.R., Hermanns, H., Katoen, J.P.: MoDeST: a compositional modeling formalism for hard and softly timed systems. IEEE Trans. Software Eng. 32(10), 812–830 (2006). https://doi.org/10.1109/TSE.2006.104
Budde, C.E., D’Argenio, P.R., Hartmanns, A.: Artifact for digging for decision trees: a case study in strategy sampling and learning. Zenodo (2024). https://doi.org/10.5281/zenodo.13327230
Budde, C.E., D’Argenio, P.R., Hartmanns, A., Sedwards, S.: An efficient statistical model checker for nondeterminism and rare events. Int. J. Softw. Tools Technol. Transf. 22(6), 759–780 (2020). https://doi.org/10.1007/S10009-020-00563-2
Bulychev, P.E., et al.: Uppaal-SMC: statistical model checking for priced timed automata. In: Wiklicky, H., Massink, M. (eds.) 10th Workshop on Quantitative Aspects of Programming Languages and Systems (QAPL). EPTCS, vol. 85, pp. 1–16 (2012). https://doi.org/10.4204/EPTCS.85.1
Butkova, Y., Fox, G.: Optimal time-bounded reachability analysis for concurrent systems. In: Vojnar, T., Zhang, L. (eds.) TACAS 2019. LNCS, vol. 11428, pp. 191–208. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17465-1_11
Butkova, Y., Hartmanns, A., Hermanns, H.: A Modest approach to Markov automata. ACM Trans. Model. Comput. Simul. 31(3), 14:1–14:34 (2021). https://doi.org/10.1145/3449355
Butkova, Y., Hatefi, H., Hermanns, H., Krčál, J.: Optimal continuous time Markov decisions. In: Finkbeiner, B., Pu, G., Zhang, L. (eds.) ATVA 2015. LNCS, vol. 9364, pp. 166–182. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24953-7_12
Bäuerle, N., Rieder, U.: Markov Decision Processes with Applications to Finance. Springer (2011). https://doi.org/10.1007/978-3-642-18324-9
D’Argenio, P.R., Hartmanns, A., Sedwards, S.: Lightweight statistical model checking in nondeterministic continuous time. In: Margaria, T., Steffen, B. (eds.) ISoLA 2018. LNCS, vol. 11245, pp. 336–353. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03421-4_22
D’Argenio, P.R., Legay, A., Sedwards, S., Traonouez, L.M.: Smart sampling for lightweight verification of Markov decision processes. Int. J. Softw. Tools Technol. Transf. 17(4), 469–484 (2015). https://doi.org/10.1007/S10009-015-0383-0
David, A., Jensen, P.G., Larsen, K.G., Mikučionis, M., Taankvist, J.H.: Uppaal Stratego. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 206–211. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_16
Eisentraut, C., Hermanns, H., Zhang, L.: On probabilistic automata in continuous time. In: 25th Annual IEEE Symposium on Logic in Computer Science (LICS), pp. 342–351. IEEE Computer Society (2010). https://doi.org/10.1109/LICS.2010.41
Gros, T.P., et al.: DSMC evaluation stages: fostering robust and safe behavior in deep reinforcement learning – extended version. ACM Trans. Model. Comput. Simul. 33(4), 17:1–17:28 (2023). https://doi.org/10.1145/3607198
Hahn, E.M., Hartmanns, A., Hermanns, H., Katoen, J.P.: A compositional modelling and analysis framework for stochastic hybrid systems. Formal Methods Syst. Des. 43(2), 191–232 (2013). https://doi.org/10.1007/S10703-012-0167-Z
Hartmanns, A., Hermanns, H.: The Modest Toolset: an integrated environment for quantitative modelling and verification. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 593–598. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_51
Hartmanns, A., Hermanns, H.: A Modest Markov automata tutorial. In: Krötzsch, M., Stepanova, D. (eds.) Reasoning Web. Explainable Artificial Intelligence. LNCS, vol. 11810, pp. 250–276. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-31423-1_8
Hartmanns, A., Junges, S., Quatmann, T., Weininger, M.: A practitioner’s guide to MDP model checking algorithms. In: Sankaranarayanan, S., Sharygina, N. (eds.) 29th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS). LNCS, vol. 13993, pp. 469–488. Springer (2023). https://doi.org/10.1007/978-3-031-30823-9_24
Hartmanns, A., Klauck, M.: The Modest state of learning, sampling, and verifying strategies. In: Margaria, T., Steffen, B. (eds.) 11th International Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA). LNCS, vol. 13703, pp. 406–432. Springer (2022). https://doi.org/10.1007/978-3-031-19759-8_25
Hatefi-Ardakani, H.: Finite horizon analysis of Markov automata. Ph.D. thesis, Saarland University, Germany (2017). http://scidok.sulb.uni-saarland.de/volltexte/2017/6743/
Hensel, C., Junges, S., Katoen, J.P., Quatmann, T., Volk, M.: The probabilistic model checker Storm. Int. J. Softw. Tools Technol. Transf. 24(4), 589–610 (2022). https://doi.org/10.1007/S10009-021-00633-Z
Jensen, P.G., Larsen, K.G., Mikucionis, M.: Playing wordle with Uppaal Stratego. In: Jansen, N., Stoelinga, M., van den Bos, P. (eds.) A Journey from Process Algebra via Timed Automata to Model Learning - Essays Dedicated to Frits Vaandrager on the Occasion of His 60th Birthday. LNCS, vol. 13560, pp. 283–305. Springer (2022). https://doi.org/10.1007/978-3-031-15629-8_15
Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1–2), 99–134 (1998). https://doi.org/10.1016/S0004-3702(98)00023-X
Kretínský, J., Meggendorfer, T.: Of cores: a partial-exploration framework for Markov decision processes. Log. Methods Comput. Sci. 16(4) (2020). https://lmcs.episciences.org/6833
Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_47
Law, A.M.: Simulation Modeling and Analysis, 5th edn. McGraw-Hill series in industrial engineering and management science, McGraw-Hill Education (2015)
Legay, A., Sedwards, S., Traonouez, L.-M.: Scalable verification of Markov decision processes. In: Canal, C., Idani, A. (eds.) SEFM 2014. LNCS, vol. 8938, pp. 350–362. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-15201-1_23
Moradi Afrapoli, A., Askari-Nasab, H.: Mining fleet management systems: a review of models and algorithms. Int. J. Min. Reclam. Environ. 33(1), 42–60 (2019). https://doi.org/10.1080/17480930.2017.1336607
Norman, G., Parker, D., Zou, X.: Verification and control of partially observable probabilistic systems. Real Time Syst. 53(3), 354–402 (2017). https://doi.org/10.1007/S11241-017-9269-4
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics, Wiley (1994). https://doi.org/10.1002/9780470316887
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning, MIT Press (1998)
Watkins, C.J.C.H., Dayan, P.: Q-learning. Mach. Learn. 8, 279–292 (1992). https://doi.org/10.1007/BF00992698
Acknowledgments
We are grateful to Matías D. Lee and Joaquín Feltes for discussions and insights on early versions of the open-pit mine model.
Funding
This work was supported by Agencia I+D+i grant PICT 2022-09-00580 (CoSMoSS), the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreements 101008233 (MISSION) and 101067199 (ProSVED), the Interreg North Sea project STORM_SAFE, the NextGenerationEU projects D53D23008400006 (SMARTITUDE) under the MUR PRIN 2022 and PE00000014 (SERICS) under the MUR PNRR, NWO VIDI grant VI.Vidi.223.110 (TruSTy), and SeCyT-UNC grant 33620230100384CB (MECANO).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Budde, C.E., D’Argenio, P.R., Hartmanns, A. (2025). Digging for Decision Trees: A Case Study in Strategy Sampling and Learning. In: Steffen, B. (eds) Bridging the Gap Between AI and Reality. AISoLA 2024. Lecture Notes in Computer Science, vol 15217. Springer, Cham. https://doi.org/10.1007/978-3-031-75434-0_24
DOI: https://doi.org/10.1007/978-3-031-75434-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-75433-3
Online ISBN: 978-3-031-75434-0