research-article

Pipelining bottom-up data flow analysis

Authors:
Qingkai Shi, The Hong Kong University of Science and Technology, Hong Kong, China
Charles Zhang, The Hong Kong University of Science and Technology, Hong Kong, China

ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, June 2020, Pages 835–847
https://doi.org/10.1145/3377811.3380425

Published: 01 October 2020

ABSTRACT

Bottom-up program analysis has traditionally been easy to parallelize because functions without caller-callee relations can be analyzed independently. However, such function-level parallelism is significantly limited by the calling dependence: functions with caller-callee relations have to be analyzed sequentially, because the analysis of a function depends on the analysis results (a.k.a. function summaries) of its callees. We observe that the calling dependence can be relaxed in many cases and that, as a result, the parallelism can be improved. In this paper, we present Coyote, a framework for bottom-up data flow analysis in which the analysis task of each function is carefully partitioned into multiple sub-tasks that generate pipelineable function summaries. These sub-tasks are pipelined and run in parallel even though the calling dependence exists. We formalize our idea under the IFDS/IDE framework and implement an application that checks for null-dereference bugs and taint issues in C/C++ programs. We evaluate Coyote on a series of standard benchmark programs and open-source software systems; the results demonstrate a significant speedup over a conventional parallel design.
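The scheduling intuition behind this observation can be illustrated with a toy model. The sketch below is not Coyote's algorithm; it only compares finish times on a hypothetical three-function call chain under two assumed dependency rules. The call graph, the number of sub-tasks per function (`STAGES`), the unit cost per sub-task, the unlimited worker pool, and the rule that sub-task s of a caller needs only sub-task s of its callees are all assumptions made for illustration.

```python
from functools import lru_cache

# Hypothetical acyclic call graph: caller -> list of callees.
CALLS = {"main": ["parse"], "parse": ["lex"], "lex": []}
STAGES = 3      # assumed number of sub-tasks per function
STAGE_COST = 1  # assumed cost of one sub-task, in arbitrary time units


def conventional_finish_times(calls):
    """Finish time per function when a caller starts only after every
    callee has produced its complete summary (unlimited workers assumed)."""
    @lru_cache(maxsize=None)
    def finish(fn):
        start = max((finish(callee) for callee in calls[fn]), default=0)
        return start + STAGES * STAGE_COST

    return {fn: finish(fn) for fn in calls}


def pipelined_finish_times(calls):
    """Finish time per function under a toy pipelined rule: sub-task s of a
    function waits for sub-task s-1 of the same function and for sub-task s
    of each callee, not for the callee's whole summary."""
    @lru_cache(maxsize=None)
    def finish(fn, stage):
        ready = finish(fn, stage - 1) if stage > 0 else 0
        ready = max([ready] + [finish(callee, stage) for callee in calls[fn]])
        return ready + STAGE_COST

    return {fn: finish(fn, STAGES - 1) for fn in calls}


if __name__ == "__main__":
    print("conventional:", conventional_finish_times(CALLS))
    print("pipelined:   ", pipelined_finish_times(CALLS))
```

In this toy chain the conventional rule serializes the three functions completely, while the pipelined rule lets a caller begin its first sub-task as soon as the callee's first sub-task is done. How real sub-tasks are defined, and which partial summaries they may consume, is exactly what the paper formalizes and is not captured here.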


Index Terms

Pipelining bottom-up data flow analysis
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation


Published in
ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering
June 2020
1640 pages
ISBN:9781450371216
DOI:10.1145/3377811
General Chairs:
Gregg Rothermel, North Carolina State University
Doo-Hwan Bae, KAIST, South Korea
Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
IFDS/IDE
bottom-up analysis
compositional program analysis
data flow analysis
modular program analysis

Acceptance Rates
Overall Acceptance Rate: 276 of 1,856 submissions, 15%
