DOI: 10.1145/3377811.3380425
research-article

Pipelining bottom-up data flow analysis

Published: 01 October 2020

ABSTRACT

Bottom-up program analysis has traditionally been easy to parallelize because functions without caller-callee relations can be analyzed independently. However, such function-level parallelism is significantly limited by the calling dependence: functions with caller-callee relations have to be analyzed sequentially because the analysis of a function depends on the analysis results, i.e., the function summaries, of its callees. We observe that the calling dependence can be relaxed in many cases and that, as a result, parallelism can be improved. In this paper, we present Coyote, a framework for bottom-up data flow analysis in which the analysis task of each function is carefully partitioned into multiple sub-tasks that generate pipelineable function summaries. These sub-tasks are pipelined and run in parallel even when a calling dependence exists. We formalize our idea under the IFDS/IDE framework and have implemented an application that checks for null-dereference bugs and taint issues in C/C++ programs. We evaluate Coyote on a series of standard benchmark programs and open-source software systems; the evaluation demonstrates a significant speedup over a conventional parallel design.
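
The contrast between the two scheduling schemes can be illustrated with a toy timing model. This is a sketch under stated assumptions, not Coyote's algorithm: the abstract does not say how sub-tasks are formed or which summaries they depend on. The model assumes a single call chain, that sub-task j of a caller needs only sub-task j of its callee, that each function runs its own sub-tasks in order, and that enough threads are available. The names `conventional_makespan` and `pipelined_makespan` are hypothetical.

```python
def conventional_makespan(chain_len, subtask_costs):
    """Total time when a caller starts only after its callee has fully finished.

    chain_len: number of functions in a single call chain (leaf first).
    subtask_costs: cost of each sub-task; assumed identical for every function.
    """
    # No overlap is possible along the chain, so the costs simply add up.
    return chain_len * sum(subtask_costs)


def pipelined_makespan(chain_len, subtask_costs):
    """Total time when sub-tasks are pipelined along the call chain.

    ASSUMPTION (not from the paper): sub-task j of a caller depends only on
    sub-task j of its callee and on sub-task j-1 of the caller itself.
    """
    k = len(subtask_costs)
    # finish[i][j] is the finish time of sub-task j of the i-th function,
    # counting functions from the leaf of the chain upward.
    finish = [[0] * k for _ in range(chain_len)]
    for i in range(chain_len):
        for j in range(k):
            after_callee = finish[i - 1][j] if i > 0 else 0
            after_previous = finish[i][j - 1] if j > 0 else 0
            finish[i][j] = max(after_callee, after_previous) + subtask_costs[j]
    return finish[-1][-1]
```

For a chain of four functions with three unit-cost sub-tasks each, `conventional_makespan(4, [1, 1, 1])` gives 12 and `pipelined_makespan(4, [1, 1, 1])` gives 6. The gap in this model comes entirely from the assumed relaxation of the calling dependence; real speedups depend on the call graph and on how the summaries are actually partitioned.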

    • Published in

      ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering
      June 2020
      1640 pages
      ISBN: 9781450371216
      DOI: 10.1145/3377811

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 276 of 1,856 submissions, 15%
