Skip to main content

Accelerating Program Analyses in Datalog by Merging Library Facts

  • Conference paper
  • First Online:
Static Analysis (SAS 2021)

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 12913))

Included in the following conference series:

Abstract

Static program analysis uses sensitivity to balance between precision and scalability. However, finer sensitivity does not necessarily lead to more precise results but may reduce scalability. Recently, a number of approaches have been proposed to finely tune the sensitivity of different program parts. However, these approaches are usually designed for specific program analyses, and their abstraction adjustments are coarse-grained as they directly drop sensitivity elements.

In this paper, we propose a new technique, 4DM, to tune abstractions for program analyses in Datalog. 4DM merges values in a domain, allowing fine-grained sensitivity tuning. 4DM uses a data-driven algorithm for automatically learning a merging strategy for a library from a training set of programs. Unlike existing approaches that rely on the properties of a certain analysis, our learning algorithm works for a wide range of Datalog analyses. We have evaluated our approach on a points-to analysis and a liveness analysis, on the DaCapo benchmark suite. Our evaluation results suggest that our technique achieves a significant speedup and negligible precision loss, reaching a good balance.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 69.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 89.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Bell number can be recursively calculated as \(B(n+1)=\varSigma _{k=0}^nC_n^kB(n)\).

References

  1. Barbuti, R., Giacobazzi, R., Levi, G.: A general framework for semantics-based bottom-up abstract interpretation of logic programs. ACM Trans. Program. Lang. Syst. (TOPLAS) 15(1), 133–181 (1993)

    Article  Google Scholar 

  2. Blackburn, S.M., et al.: The DaCapo benchmarks: Java benchmarking development and analysis. In: Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, OOPSLA 2006, pp. 169–190. ACM, New York (2006)

    Google Scholar 

  3. Bravenboer, M., Smaragdakis, Y.: Strictly declarative specification of sophisticated points-to analyses. In: Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, pp. 243–262 (2009)

    Google Scholar 

  4. Cortesi, A., Filé, G.: Abstract interpretation of logic programs: an abstract domain for groundness, sharing, freeness and compoundness analysis. ACM SIGPLAN Not. 26(9), 52–61 (1991). PEPM 1991

    Article  Google Scholar 

  5. Cousot, P., Cousot, R.: Abstract interpretation and application to logic programs. J. Log. Program. 13(2), 103–179 (1992)

    Article  MathSciNet  Google Scholar 

  6. Debray, S.K.: Global optimization of logic programs (analysis, transformation, compilation) (1987)

    Google Scholar 

  7. Delzanno, G., Giacobazzi, R., Ranzato, F.: Static analysis, abstract interpretation and verification in (constraint logic) programming. In: Dovier, A., Pontelli, E. (eds.) A 25-Year Perspective on Logic Programming. LNCS, vol. 6125, pp. 136–158. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14309-0_7

    Chapter  MATH  Google Scholar 

  8. Fähndrich, M., Foster, J.S., Su, Z., Aiken, A.: Partial online cycle elimination in inclusion constraint graphs. In: Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation (PLDI), Montreal, Canada, 17–19 June 1998, pp. 85–96 (1998)

    Google Scholar 

  9. Hardekopf, B., Lin, C.: The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code. In: Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, San Diego, California, USA, 10–13 June 2007, pp. 290–299 (2007)

    Google Scholar 

  10. Jeon, M., Jeong, S., Oh, H.: Precise and scalable points-to analysis via data-driven context tunneling. PACMPL 2(OOPSLA), 140:1–140:29 (2018)

    Google Scholar 

  11. Jeong, S., Jeon, M., Cha, S.D., Oh, H.: Data-driven context-sensitivity for points-to analysis. PACMPL 1(OOPSLA), 100:1–100:28 (2017)

    Google Scholar 

  12. Kulkarni, S., Mangal, R., Zhang, X., Naik, M.: Accelerating program analyses by cross-program training. In: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2016, Part of SPLASH 2016, Amsterdam, The Netherlands, 30 October–4 November 2016, pp. 359–377 (2016)

    Google Scholar 

  13. Lhoták, O., Hendren, L.J.: Context-sensitive points-to analysis: is it worth it? In: Compiler Construction, Proceedings of the 15th International Conference, CC 2006, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2006, Vienna, Austria, 30–31 March 2006, pp. 47–64 (2006)

    Google Scholar 

  14. Li, Y., Tan, T., Møller, A., Smaragdakis, Y.: Precision-guided context sensitivity for pointer analysis. PACMPL 2(OOPSLA), 141:1–141:29 (2018)

    Google Scholar 

  15. Li, Y., Tan, T., Møller, A., Smaragdakis, Y.: Scalability-first pointer analysis with self-tuning context-sensitivity. In: Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, 04–09 November 2018, pp. 129–140 (2018)

    Google Scholar 

  16. Liang, P., Tripp, O., Naik, M.: Learning minimal abstractions. In: Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, Austin, TX, USA, 26–28 January 2011, pp. 31–42 (2011)

    Google Scholar 

  17. Nielson, F., Nielson, H.R., Hankin, C.: Principles of Program Analysis. Springer, Heidelberg (1999). https://doi.org/10.1007/978-3-662-03811-6

    Book  MATH  Google Scholar 

  18. pjBench: parallel Java benchmarks (2014)

    Google Scholar 

  19. Shivers, O.: Control-flow analysis of higher-order languages (1991)

    Google Scholar 

  20. Smaragdakis, Y., Kastrinis, G., Balatsouras, G.: Introspective analysis: context-sensitivity, across the board. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2014, pp. 485–495. ACM, New York (2014)

    Google Scholar 

  21. Tan, T., Li, Y., Xue, J.: Efficient and precise points-to analysis: modeling the heap by merging equivalent automata. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, pp. 278–291. ACM, New York (2017)

    Google Scholar 

  22. Tang, H., Wang, D., Xiong, Y., Zhang, L., Wang, X., Zhang, L.: Conditional Dyck-CFL reachability analysis for complete and efficient library summarization. In: Yang, H. (ed.) ESOP 2017. LNCS, vol. 10201, pp. 880–908. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54434-1_33

    Chapter  MATH  Google Scholar 

  23. Tang, H., Wang, X., Zhang, L., Xie, B., Zhang, L., Mei, H.: Summary-based context-sensitive data-dependence analysis in presence of callbacks. In: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, 15–17 January 2015, pp. 83–95 (2015)

    Google Scholar 

  24. Xiao, X., Zhang, C.: Geometric encoding: forging the high performance context sensitive points-to analysis for Java. In: Proceedings of the 20th International Symposium on Software Testing and Analysis, ISSTA 2011, Toronto, ON, Canada, 17–21 July 2011, pp. 188–198 (2011)

    Google Scholar 

  25. Xu, G., Rountev, A.: Merging equivalent contexts for scalable heap-cloning-based context-sensitive points-to analysis. In: Proceedings of the 2008 International Symposium on Software Testing and Analysis, ISSTA 2008, pp. 225–236. ACM, New York (2008)

    Google Scholar 

  26. Yan, H., Sui, Y., Chen, S., Xue, J.: Spatio-temporal context reduction: a pointer-analysis-based static approach for detecting use-after-free vulnerabilities. In: Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, 27 May–03 June 2018, pp. 327–337. ACM (2018)

    Google Scholar 

  27. Zhang, X., Mangal, R., Grigore, R., Naik, M., Yang, H.: On abstraction refinement for program analyses in datalog. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2014, Edinburgh, United Kingdom, 09–11 June 2014, pp. 239–248. ACM (2014)

    Google Scholar 

Download references

Acknowledgements

This work is supported in part by the National Key Research and Development Program of China No. 2019YFE0198100, National Natural Science Foundation of China under Grant Nos. 61922003, and a grant from ZTE-PKU Joint Laboratory for Foundation Software.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xin Zhang .

Editor information

Editors and Affiliations

Appendices

A Incremental Merging in N-Domain-Wise Cases

Here we describe how to define an incremental merging function for general cases of N-domain-wise merging.

Suppose we want to define a merging function on N domains \(D_1, D_2, \ldots , D_N\). Without loss of generality, we suppose \(D_i\) can only be a super-domain of \(D_j\) when \(i > j\). We define \(\pi _i\) (\(1 \le i \le N\)) by induction:

  • When \(i = 1\), we can define an arbitrary 1-domain-wise merging function on \(D_1\).

  • Suppose we have defined a function \(\pi _i : D_1\times \dots \times D_i \rightarrow \widehat{D_{1\ldots i}}\), we can define \(\pi _{i+1} : D_1\times \dots \times D_i\times D_{i+1} \rightarrow \widehat{D_{1\ldots i+1}}\) with a helper function \(\pi _{i+1}'\):

    $$\begin{aligned} \pi _{i+1}' : \widehat{D_{1\ldots i}} \times \widehat{D_{i+1}} \rightarrow \widehat{D_{1\ldots i+1}} \end{aligned}$$

    Here \(\widehat{D_{i+1}}\) represents the changed domain of original \(D_{i+1}\) after applying the merging \(\pi _i\), and \(\pi _{i+1}(d_1,\ldots ,d_i,d_{i+1}) = \pi _{i+1}'(\pi _i(d_1, \ldots , d_i), \pi _i(d_{i+1}))\). Note that \(\pi _i(d_{i+1})\), which \(d_{i+1}\) would become under \(\pi _i\), is determined after applying \(\pi _i\) to the Datalog program.

With \(\pi = \pi _N\), we get an N-domain-wise merging function on \(D_1 \times \dots \times D_N\).

B Proof: The Monotonicity of Mergings

Theorem 3 (Monotonicity)

Given a Datalog program C. If the merging \(\pi _b\) of the domains \(D_1, \dots , D_N\) is a finer merging than \(\pi _a\), then applying \(\pi _b\) to C will deduce no fewer results than \(\pi _a\). It means

Proof

Given a Datalog program C, any derivation on the proof tree in has the form

$$\begin{aligned} R^0(\varPi _a(d^0_1), \varPi _a(d^0_2), \ldots ) :\,\!:= \ldots , R^j(\varPi _a(d^j_1), \varPi _a(d^j_2), \ldots ), \ldots . \end{aligned}$$

(note that the concrete values \(\{d^j_i\}\) may not originally match with each other, but get matched after applying \(\varPi _a\)).

As is defined, for any domain D, \(\forall x,y\in D, \varPi _a(x) = \varPi _a(y) \rightarrow \varPi _b(x) = \varPi _b(y)\), so in ’s proof tree,

$$\begin{aligned} R^0(\varPi _b(d^0_1), \varPi _b(d^0_2), \ldots ) :\,\!:= \ldots , R^j(\varPi _b(d^j_1), \varPi _b(d^j_2), \ldots ), \ldots . \end{aligned}$$

also holds as a renaming of the previous instantiation.

Therefore, .

C Detailed Performance of 4DM in Points-to Analysis on Large Projects

Table 2 presents the performance of 4DM by applying learnt heuristics from small projects to large projects. Each column of 4DM shows the result of applying a heuristic learnt from a training set of 12 projects. The partition of training set is the same as in Sect. 6.1.

Table 2. Performance of applying heuristics learnt by 4DM from small projects to large projects.

D Detailed Performance of 4DM in Liveness Analysis

Table 3 shows the detailed results of applying 4DM to liveness analysis.

One practical concern in evaluation is that liveness analysis is usually very fast, hence random events in the processor could have a big influence on the measured run time of compiled executable. We tackle this problem by using Soufflé interpreter to run the analysis and obtain a more reliable record of analysis time.

We use a T-test to check whether the distribution of analysis time changes significantly after merging. The smaller the p-value, the stronger the evidence that two distributions are different. We perform 50 independent runs for each benchmark and analysis. The analysis time is the average of the 50 runs, and the p-value is calculated base on these data.

We measure the precision by the size of calculated live-variable set at all call sites. Note that since our approach is sound, this property is non-decreasing; and the less the size increases, the more precise the analysis is.

Table 3. Performance for liveness analysis. lib1 = {luindex, sunflow, hsqldb}, lib2 = {avrora, batik, bloat}, lib3 = {chart, lusearch, pmd}.

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Chen, Y. et al. (2021). Accelerating Program Analyses in Datalog by Merging Library Facts. In: Drăgoi, C., Mukherjee, S., Namjoshi, K. (eds) Static Analysis. SAS 2021. Lecture Notes in Computer Science(), vol 12913. Springer, Cham. https://doi.org/10.1007/978-3-030-88806-0_4

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-88806-0_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88805-3

  • Online ISBN: 978-3-030-88806-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics