ABSTRACT
When dealing with millions of lines of code, we still cannot have our cake and eat it too: sparse value-flow analysis is powerful for checking source-sink problems, but existing work cannot escape the “pointer trap”, in which a precise points-to analysis limits scalability and an imprecise one seriously undermines precision. We present Pinpoint, a holistic approach that decomposes the cost of high-precision points-to analysis by precisely discovering local data dependence and delaying the expensive inter-procedural analysis through memoization. Such memoization enables on-demand slicing of only the necessary inter-procedural data dependence and path-feasibility queries, which are then solved by a costly SMT solver. Experiments show that Pinpoint can check programs such as MySQL (around 2 million lines of code) within 1.5 hours. The overall false positive rate is also very low (14.3% to 23.6%). Pinpoint has discovered over forty real bugs in mature and extensively checked open source systems. The implementation of Pinpoint and all experimental results are freely available.
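To make the idea in the abstract concrete, the following is a minimal toy sketch, not Pinpoint's actual algorithm or implementation. It assumes a hypothetical two-function program encoded as guarded local value-flow edges, caches each function's local reachability the first time it is needed, pulls in a callee's cached result only when a call node lies on the queried path, and replaces the SMT solver with a trivial check for contradictory branch literals. All names (`FUNCS`, `local_paths`, `feasible`, `check`) and the program itself are illustrative assumptions.

```python
from functools import lru_cache

# Hypothetical toy program. Per function: local value-flow edges
# (from_node, to_node, guard). A guard is a branch literal such as
# "debug" or "!debug", or None if the edge is unconditional.
# Assumption: the graphs are acyclic (no loops or recursion).
FUNCS = {
    "main": [
        ("null", "p", None),            # p = NULL
        ("p", "call:log", "debug"),     # if (debug) log(p)
    ],
    "log": [
        ("arg", "deref_a", "!debug"),   # if (!debug) *arg  (contradicts caller)
        ("arg", "deref_b", "verbose"),  # if (verbose) *arg
    ],
}


@lru_cache(maxsize=None)
def local_paths(func, start):
    """Nodes locally reachable from start, with the guards collected on the way.

    Cached, so each (func, start) pair is analyzed at most once.
    """
    results = []
    stack = [(start, frozenset())]
    while stack:
        node, guards = stack.pop()
        results.append((node, guards))
        for src, dst, guard in FUNCS[func]:
            if src == node:
                stack.append((dst, guards | ({guard} if guard else set())))
    return tuple(results)


def feasible(guards):
    """Stand-in for an SMT query: infeasible if both g and !g are required."""
    return not any(("!" + g) in guards for g in guards)


def find_sinks(func, start, sinks, guards=frozenset()):
    """Demand-driven search: callees are visited only at call nodes on the path."""
    hits = []
    for node, local_guards in local_paths(func, start):
        all_guards = guards | local_guards
        if node in sinks:
            hits.append((func, node, all_guards))
        elif node.startswith("call:"):
            hits += find_sinks(node[len("call:"):], "arg", sinks, all_guards)
    return hits


def check(func, source, sinks):
    """Report only the source-to-sink flows whose path condition is feasible."""
    return [hit for hit in find_sinks(func, source, sinks) if feasible(hit[2])]


reports = check("main", "null", {"deref_a", "deref_b"})
```

In this toy program, `reports` contains only the flow to `deref_b`: the flow to `deref_a` requires both `debug` and `!debug` and is discarded by the feasibility check. A real checker would hand the collected path condition to an SMT solver rather than compare literals.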
Pinpoint: fast and precise sparse value flow analysis for million lines of code
PLDI '18