Skip to main content

Probabilistic Program Analysis

  • Conference paper
  • First Online:
Grand Timely Topics in Software Engineering (GTTSE 2015)

Abstract

This paper provides a survey of recent work on adapting techniques for program analysis to compute probabilistic characterizations of program behavior. We survey how the frameworks of data flow analysis and symbolic execution have incorporated information about input probability distributions to quantify the likelihood of properties of program states. We identify themes that relate and distinguish a variety of techniques that have been developed over the past 15 years in this area. In doing so, we point out opportunities for future research that builds on the strengths of different techniques.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    More precisely, Eq. (1) represents the probability of satisfying the constraint \(\phi \) conditioned on the fact that the input is within the prescribed domain D.

References

  1. Aydin, A., Bang, L., Bultan, T.: Automata-based model counting for string constraints. In: Proceedings of the 27th International Conference on Computer Aided Verification, CAV 2015, Part I, San Francisco, CA, USA, 18–24 July 2015, pp. 255–272 (2015)

    Google Scholar 

  2. Bagnara, R., Hill, P.M., Zaffanella, E.: The parma polyhedra library: toward a complete set of numerical abstractions for the analysis and verification of hardware and software systems. Sci. Comput. Program. 72(1), 3–21 (2008)

    Article  MathSciNet  Google Scholar 

  3. Bang, L., Aydin, A., Phan, Q., Pasareanu, C.S., Bultan, T.: String analysis for side channels with segmented oracles. In: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016, Seattle, WA, USA, 13–18 November 2016, pp. 193–204 (2016)

    Google Scholar 

  4. Barvinok, A.I.: A polynomial time algorithm for counting integral points in polyhedra when the dimension is fixed. Math. Oper. Res. 19(4), 769–779 (1994)

    Article  MathSciNet  MATH  Google Scholar 

  5. de Berg, M.: Computational Geometry: Algorithms and Applications. Springer, Heidelberg (2008)

    Book  MATH  Google Scholar 

  6. Biere, A., van Maaren, H.: Handbook of Satisfiability. Frontiers in Artificial Intelligence and Applications. IOS Press, Amsterdam (2009)

    MATH  Google Scholar 

  7. Bishop, C.: Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York (2006)

    MATH  Google Scholar 

  8. Borges, M., Filieri, A., d’Amorim, M., Păsăreanu, C.S.: Iterative distribution-aware sampling for probabilistic symbolic execution. In: Proceedings of the 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2015. ACM (2015)

    Google Scholar 

  9. Borges, M., Filieri, A., d’Amorim, M., Păsăreanu, C.S., Visser, W.: Compositional solution space quantification for probabilistic software analysis. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 123–132. ACM (2014)

    Google Scholar 

  10. Bryant, R.E.: Symbolic boolean manipulation with ordered binary-decision diagrams. ACM Comput. Surv. (CSUR) 24(3), 293–318 (1992)

    Article  Google Scholar 

  11. Cadar, C., Dunbar, D., Engler, D.R.: Klee: Unassisted and automatic generation of high-coverage tests for complex systems programs. In: OSDI, vol. 8, pp. 209–224 (2008)

    Google Scholar 

  12. Chakraborty, S., Fremont, D.J., Meel, K.S., Seshia, S.A., Vardi, M.Y.: Distribution-aware sampling and weighted model counting for SAT. In: Twenty-Eighth AAAI Conference on Artificial Intelligence (2014)

    Google Scholar 

  13. Chakraborty, S., Meel, K.S., Vardi, M.Y.: A scalable approximate model counter. In: Schulte, C. (ed.) CP 2013. LNCS, vol. 8124, pp. 200–216. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40627-0_18

    Chapter  Google Scholar 

  14. Chistikov, D., Dimitrova, R., Majumdar, R.: Approximate counting in SMT and value estimation for probabilistic programs. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 320–334. Springer, Heidelberg (2015). doi:10.1007/978-3-662-46681-0_26

    Google Scholar 

  15. Claret, G., Rajamani, S.K., Nori, A.V., Gordon, A.D., Borgström, J.: Bayesian inference using data flow analysis. In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, pp. 92–102. ACM (2013)

    Google Scholar 

  16. Clarke, E.M., Grumberg, O., Peled, D.: Model Checking. MIT Press, Cambridge (1999)

    Google Scholar 

  17. Clarke, L., et al.: A system to generate test data and symbolically execute programs. IEEE Trans. Software Eng. 3, 215–222 (1976)

    Article  MathSciNet  Google Scholar 

  18. Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In: Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, pp. 238–252. ACM (1977)

    Google Scholar 

  19. Cousot, P., Monerau, M.: Probabilistic abstract interpretation. In: Seidl, H. (ed.) ESOP 2012. LNCS, vol. 7211, pp. 169–193. Springer, Heidelberg (2012). doi:10.1007/978-3-642-28869-2_9

    Chapter  Google Scholar 

  20. Di Pierro, A., Wiklicky, H.: Probabilistic data flow analysis: a linear equational approach. arXiv preprint arXiv:1307.4474 (2013)

  21. Dwyer, M.B.: Unifying testing and analysis through behavioral coverage. In: 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE), p. 2. IEEE (2011)

    Google Scholar 

  22. Esparza, J., Gaiser, A.: Probabilistic abstractions with arbitrary domains. In: Yahav, E. (ed.) SAS 2011. LNCS, vol. 6887, pp. 334–350. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23702-7_25

    Chapter  Google Scholar 

  23. Filieri, A., Frias, M., Păsăreanu, C., Visser, W.: Model counting for complex data structures. In: Proceedings of the 2015 International SPIN Symposium on Model Checking of Software. ACM (2015)

    Google Scholar 

  24. Filieri, A., Păsăreanu, C.S., Visser, W., Geldenhuys, J.: Statistical symbolic execution with informed sampling. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 437–448. ACM (2014)

    Google Scholar 

  25. Filieri, A., Păsăreanu, C.S., Yang, G.: Quantification of software changes through probabilistic symbolic execution. In: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE) - Short Paper, November 2015

    Google Scholar 

  26. Filieri, A., Păsăreanu, C.S., Visser, W.: Reliability analysis in symbolic pathfinder. In: Proceedings of the 2013 International Conference on Software Engineering, pp. 622–631. IEEE Press (2013)

    Google Scholar 

  27. Fink, S., Dolby, J.: WALA-The TJ watson libraries for analysis (2012)

    Google Scholar 

  28. Floyd, R.W.: Assigning meanings to programs. In: Mathematical Aspects of Computer Science, pp. 19–32 (1967)

    Google Scholar 

  29. Fosdick, L.D., Osterweil, L.J.: Data flow analysis in software reliability. ACM Comput. Surv. (CSUR) 8(3), 305–330 (1976)

    Article  MathSciNet  MATH  Google Scholar 

  30. Fu, K., Huang, T.: Stochastic grammars and languages. Int. J. Comput. Inform. Sci. 1(2), 135–170 (1972)

    Article  MathSciNet  MATH  Google Scholar 

  31. Geldenhuys, J., Dwyer, M.B., Visser, W.: Probabilistic symbolic execution. In: Proceedings of the 2012 International Symposium on Software Testing and Analysis, pp. 166–176. ACM (2012)

    Google Scholar 

  32. Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., Rubin, D.: Bayesian Data Analysis, 3rd edn. Chapman & Hall/CRC Texts in Statistical Science, Taylor & Francis (2013)

    Google Scholar 

  33. Gentle, J.: Random Number Generation and Monte Carlo Methods. Statistics and Computing. Springer, New York (2013)

    MATH  Google Scholar 

  34. Godefroid, P., Klarlund, N., Sen, K.: DART: directed automated random testing. In: ACM Sigplan Notices, vol. 40, pp. 213–223. ACM (2005)

    Google Scholar 

  35. Gordon, A.D., Henzinger, T.A., Nori, A.V., Rajamani, S.K.: Probabilistic programming. In: Proceedings of the on Future of Software Engineering, pp. 167–181. ACM (2014)

    Google Scholar 

  36. Graf, S., Saidi, H.: Construction of abstract state graphs with PVS. In: Grumberg, O. (ed.) CAV 1997. LNCS, vol. 1254, pp. 72–83. Springer, Heidelberg (1997). doi:10.1007/3-540-63166-6_10

    Chapter  Google Scholar 

  37. Hahn, E.M., Hermanns, H., Wachter, B., Zhang, L.: PASS: abstraction refinement for infinite probabilistic models. In: Esparza, J., Majumdar, R. (eds.) TACAS 2010. LNCS, vol. 6015, pp. 353–357. Springer, Heidelberg (2010). doi:10.1007/978-3-642-12002-2_30

    Chapter  Google Scholar 

  38. Hasuo, I., Jacobs, B., Sokolova, A.: Generic trace theory. Electron. Notes Theor. Comput. Sci. 164(1), 47–65 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  39. Hérault, T., Lassaigne, R., Magniette, F., Peyronnet, S.: Approximate probabilistic model checking. In: Steffen, B., Levi, G. (eds.) VMCAI 2004. LNCS, vol. 2937, pp. 73–84. Springer, Heidelberg (2004). doi:10.1007/978-3-540-24622-0_8

    Chapter  Google Scholar 

  40. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)

    Article  MathSciNet  MATH  Google Scholar 

  41. Jamrozik, K., Fraser, G., Tillman, N., Halleux, J.: Generating test suites with augmented dynamic symbolic execution. In: Veanes, M., Viganò, L. (eds.) TAP 2013. LNCS, vol. 7942, pp. 152–167. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38916-0_9

    Chapter  Google Scholar 

  42. Jegourel, C., Legay, A., Sedwards, S.: Cross-entropy optimisation of importance sampling parameters for statistical model checking. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 327–342. Springer, Heidelberg (2012). doi:10.1007/978-3-642-31424-7_26

    Chapter  Google Scholar 

  43. Jegourel, C., Legay, A., Sedwards, S.: Importance splitting for statistical model checking rare properties. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 576–591. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39799-8_38

    Chapter  Google Scholar 

  44. Jones, C.: Probabilistic non-determinism (1990)

    Google Scholar 

  45. Kildall, G.A.: A unified approach to global program optimization. In: Proceedings of the 1st Annual ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, pp. 194–206. ACM (1973)

    Google Scholar 

  46. King, J.C.: Symbolic execution and program testing. Commun. ACM 19(7), 385–394 (1976)

    Article  MathSciNet  MATH  Google Scholar 

  47. Kozen, D.: Semantics of probabilistic programs. J. Comput. Syst. Sci. 22(3), 328–350 (1981)

    Article  MathSciNet  MATH  Google Scholar 

  48. Kozen, D.: A probabilistic PDL. In: Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, pp. 291–297. ACM (1983)

    Google Scholar 

  49. Kwiatkowska, M., Norman, G., Parker, D.: Advances and challenges of probabilistic model checking. In: 2010 Proceedings of the 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton) (2010)

    Google Scholar 

  50. Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). doi:10.1007/978-3-642-22110-1_47

    Chapter  Google Scholar 

  51. Lam, P., Bodden, E., Lhoták, O., Hendren, L.: The Soot framework for Java program analysis: a retrospective. In: Cetus Users and Compiler Infrastructure Workshop, Galveston Island, TX, October 2011

    Google Scholar 

  52. Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization (2004)

    Google Scholar 

  53. Legay, A., Delahaye, B., Bensalem, S.: Statistical model checking: an overview. In: Barringer, H., Falcone, Y., Finkbeiner, B., Havelund, K., Lee, I., Pace, G., Roşu, G., Sokolsky, O., Tillmann, N. (eds.) RV 2010. LNCS, vol. 6418, pp. 122–135. Springer, Heidelberg (2010). doi:10.1007/978-3-642-16612-9_11

    Chapter  Google Scholar 

  54. Luckow, K., Păsăreanu, C.S., Dwyer, M.B., Filieri, A., Visser, W.: Exact and approximate probabilistic symbolic execution for nondeterministic programs. In: Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, pp. 575–586. ACM (2014)

    Google Scholar 

  55. Luu, L., Shinde, S., Saxena, P., Demsky, B.: A model counter for constraints over unbounded strings. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 565–576. ACM (2014)

    Google Scholar 

  56. Mardziel, P., Magill, S., Hicks, M., Srivatsa, M.: Dynamic enforcement of knowledge-based security policies using probabilistic abstract interpretation. J. Comput. Secur. 21(4), 463–532 (2013)

    Article  Google Scholar 

  57. McDonald, J.B.: Some generalized functions for the size distribution of income. Econometrica J. Econometric Soc. 52, 647–663 (1984)

    Article  MATH  Google Scholar 

  58. Meel, K.S.: Sampling techniques for boolean satisfiability. CoRR abs/1404.6682 (2014). http://arxiv.org/abs/1404.6682

  59. Monniaux, D.: Abstract interpretation of probabilistic semantics. In: Palsberg, J. (ed.) SAS 2000. LNCS, vol. 1824, pp. 322–339. Springer, Heidelberg (2000). doi:10.1007/978-3-540-45099-3_17

    Chapter  Google Scholar 

  60. Monniaux, D.: Backwards Abstract Interpretation of Probabilistic Programs. In: Sands, D. (ed.) ESOP 2001. LNCS, vol. 2028, pp. 367–382. Springer, Heidelberg (2001). doi:10.1007/3-540-45309-1_24

    Chapter  Google Scholar 

  61. Monniaux, D.: Abstract interpretation of programs as markov decision processes. Sci. Comput. Program. 58(1), 179–205 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  62. Morgan, C., McIver, A., Seidel, K.: Probabilistic predicate transformers. ACM Trans. Program. Lang. Syst. (TOPLAS) 18(3), 325–353 (1996)

    Article  Google Scholar 

  63. Murta, D., Oliveira, J.N.: A study of risk-aware program transformation. Sci. Comput. Program. 110(C), 51–77 (2015)

    Article  Google Scholar 

  64. Oliveira, J.N., Miraldo, V.C.: “Keep definition, change category” — a practical approach to state-based system calculi. J. Logical Algebraic Methods Program. 85(4), 449–474 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  65. Pasareanu, C.S., Phan, Q., Malacaria, P.: Multi-run side-channel analysis using symbolic execution and Max-SMT. In: IEEE 29th Computer Security Foundations Symposium, CSF 2016, Lisbon, Portugal, 27 June–1 July 2016, pp. 387–400 (2016)

    Google Scholar 

  66. Păsăreanu, C.S., Rungta, N.: Symbolic pathfinder: symbolic execution of Java bytecode. In: Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, pp. 179–180. ACM (2010)

    Google Scholar 

  67. Pestman, W.R.: Mathematical Statistics: An Introduction, vol. 1. Walter de Gruyter, Berlin (1998)

    Book  MATH  Google Scholar 

  68. Puggelli, A., Li, W., Sangiovanni-Vincentelli, A.L., Seshia, S.A.: Polynomial-time verification of PCTL properties of MDPs with convex uncertainties. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 527–542. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39799-8_35

    Chapter  Google Scholar 

  69. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (1994)

    Book  MATH  Google Scholar 

  70. Ramalingam, G.: Data flow frequency analysis. In: ACM SIGPLAN Notices, vol. 31, pp. 267–277. ACM (1996)

    Google Scholar 

  71. Robert, C.: The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. Springer Texts in Statistics. Springer, New York (2007)

    MATH  Google Scholar 

  72. Robert, C., Casella, G.: Monte Carlo Statistical Methods. Springer, New York (2013)

    MATH  Google Scholar 

  73. Robert, C.P., Casella, G.: Monte Carlo Statistical Methods. Springer, New York (2005)

    MATH  Google Scholar 

  74. Sang, T., Beame, P., Kautz, H.: Heuristics for fast exact model counting. In: Bacchus, F., Walsh, T. (eds.) SAT 2005. LNCS, vol. 3569, pp. 226–240. Springer, Heidelberg (2005). doi:10.1007/11499107_17

    Chapter  Google Scholar 

  75. Schmidt, D.A.: Data flow analysis is model checking of abstract interpretations. In: Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 38–48. ACM (1998)

    Google Scholar 

  76. Sen, K., Marinov, D., Agha, G.: CUTE: A concolic unit testing engine for C (2005)

    Google Scholar 

  77. Smith, M.J.: Probabilistic abstract interpretation of imperative programs using truncated normal distributions. Electron. Notes Theor. Comput. Sci. 220(3), 43–59 (2008)

    Article  MATH  Google Scholar 

  78. Song, D., et al.: BitBlaze: a new approach to computer security via binary analysis. In: Sekar, R., Pujari, A.K. (eds.) ICISS 2008. LNCS, vol. 5352, pp. 1–25. Springer, Heidelberg (2008). doi:10.1007/978-3-540-89862-7_1

    Chapter  Google Scholar 

  79. Thakur, A., Elder, M., Reps, T.: Bilateral algorithms for symbolic abstraction. In: Miné, A., Schmidt, D. (eds.) SAS 2012. LNCS, vol. 7460, pp. 111–128. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33125-1_10

    Chapter  Google Scholar 

  80. The Apache Software Foundation: Commons math. http://commons.apache.org/proper/commons-math/. Accessed 16 Dec 2014

  81. Thurley, M.: sharpSAT – counting models with advanced component caching and implicit BCP. In: Biere, A., Gomes, C.P. (eds.) SAT 2006. LNCS, vol. 4121, pp. 424–429. Springer, Heidelberg (2006). doi:10.1007/11814948_38

    Chapter  Google Scholar 

  82. Thurow, L.C.: Analyzing the American income distribution. Am. Econ. Rev. 60, 261–269 (1970)

    Google Scholar 

  83. UC Davis, Mathematics: LattE. http://www.math.ucdavis.edu/latte

  84. Vallée-Rai, R., Co, P., Gagnon, E., Hendren, L., Lam, P., Sundaresan, V.: Soot-a Java bytecode optimization framework. In: Proceedings of the 1999 Conference of the Centre for Advanced Studies on Collaborative Research, p. 13. IBM Press (1999)

    Google Scholar 

  85. Verdoolaege, S.: Software package barvinok (2004). http://freshmeat.net/projects/barvinok

  86. Wachter, B., Zhang, L.: Best probabilistic transformers. In: Barthe, G., Hermenegildo, M. (eds.) VMCAI 2010. LNCS, vol. 5944, pp. 362–379. Springer, Heidelberg (2010). doi:10.1007/978-3-642-11319-2_26

    Chapter  Google Scholar 

  87. Zuliani, P., Platzer, A., Clarke, E.M.: Bayesian statistical model checking with application to simulink/stateflow verification. In: Proceedings of the 13th ACM International Conference on Hybrid Systems: Computation and Control, pp. 243–252. ACM (2010)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Matthew B. Dwyer .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Dwyer, M.B., Filieri, A., Geldenhuys, J., Gerrard, M., Păsăreanu, C.S., Visser, W. (2017). Probabilistic Program Analysis. In: Cunha, J., Fernandes, J., Lämmel, R., Saraiva, J., Zaytsev, V. (eds) Grand Timely Topics in Software Engineering. GTTSE 2015. Lecture Notes in Computer Science(), vol 10223. Springer, Cham. https://doi.org/10.1007/978-3-319-60074-1_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-60074-1_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60073-4

  • Online ISBN: 978-3-319-60074-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics