A New Algorithm for Identifying Loops in Decompilation

Wei, Tao; Mao, Jian; Zou, Wei; Chen, Yu

doi:10.1007/978-3-540-74061-2_11

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 4634)

Included in the following conference series:

International Static Analysis Symposium

Abstract

Loop identification is an essential step of control flow analysis in decompilation. The classical algorithm for identifying loops is Tarjan’s interval-finding algorithm, which is restricted to reducible graphs. Havlak presents an extension of Tarjan’s algorithm that handles irreducible graphs and constructs a loop-nesting forest for an arbitrary flow graph. There is evidence that the running time of this algorithm is quadratic in the worst case, not almost linear as claimed. Ramalingam presents an improved algorithm with low time complexity on arbitrary graphs, but it does not perform well on “real” control flow graphs (CFGs). We present a novel algorithm for identifying loops in arbitrary CFGs. Based on a more detailed exploration of the properties of loops and depth-first search (DFS), this algorithm traverses a CFG only once using DFS and collects all the information it needs on the fly. It runs in approximately linear time and does not use complicated data structures such as Interval/Derived Sequence of Graphs (DSG) or UNION-FIND sets. To analyze the complexity of the algorithm, we introduce a new concept, the unstructuredness coefficient, to describe the unstructuredness of CFGs, and we find that the unstructuredness coefficients of real-world executables are usually small (<1.5). This “low-unstructuredness” property distinguishes these CFGs from general single-root connected directed graphs, and it offers an explanation of why existing algorithms do not perform well on real-world cases. The new algorithm has been applied to 11526 CFGs in 6 typical binary executables on both Linux and Windows platforms. The experimental results validate our theoretical analysis and show that our algorithm runs 2-5 times faster than the Havlak-Tarjan algorithm and 2-8 times faster than the Ramalingam-Havlak-Tarjan algorithm.
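
The idea of collecting loop information during a single DFS can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not spell out; it only shows the textbook observation that an edge to a node still on the current DFS path closes a cycle, so its target can be marked as a loop header on the fly. The function name `find_loop_headers`, the adjacency-dict representation, and the example CFG are all assumptions made for illustration. The sketch does not build a loop-nesting forest or classify irreducible loops.

```python
from typing import Dict, List, Set, Tuple


def find_loop_headers(
    cfg: Dict[str, List[str]], entry: str
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Return (loop_headers, back_edges) found in one DFS from `entry`.

    Illustrative only: a plain back-edge detector, not the algorithm
    from the paper.
    """
    visited: Set[str] = set()
    on_path: Set[str] = set()  # nodes on the current DFS path
    headers: Set[str] = set()
    back_edges: Set[Tuple[str, str]] = set()

    def dfs(node: str) -> None:
        visited.add(node)
        on_path.add(node)
        for succ in cfg.get(node, []):
            if succ not in visited:
                dfs(succ)
            elif succ in on_path:
                # An edge to a node still on the DFS path closes a cycle.
                headers.add(succ)
                back_edges.add((node, succ))
        on_path.discard(node)

    dfs(entry)
    return headers, back_edges


# Hypothetical CFG with an inner loop nested in an outer loop.
cfg = {
    "entry": ["outer"],
    "outer": ["inner", "exit"],
    "inner": ["inner_body"],
    "inner_body": ["inner", "latch"],
    "latch": ["outer"],
    "exit": [],
}
headers, back_edges = find_loop_headers(cfg, "entry")
```

Each node and edge is examined once, so this pass is linear in the size of the graph. The recursive form is kept for brevity; a very deep CFG would need an explicit stack to stay within Python's recursion limit.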

Supported by The National High Technology Research and Development Program of China (No. 2006AA01Z402).

References

http://www.program-transformation.org/Transform/HistoryOfDecompilation1
Muchnick, S.S.: Advanced Compiler Design and Implementation. Elsevier Science, Amsterdam (1997)
Ramalingam, G.: Identifying loops in almost linear time. ACM Transactions on Programming Languages and Systems 21(2) (1999)
Ramalingam, G.: On loops, dominators, and dominance frontiers. ACM Transactions on Programming Languages and Systems 24(5) (2002)
Allen, F.E.: Control flow analysis. SIGPLAN Notices 5(7), 1–19 (1970)
Cocke, J.: Global common subexpression elimination. SIGPLAN Notices 5(7), 20–25 (1970)
Havlak, P.: Nesting of reducible and irreducible loops. ACM Transactions on Programming Languages and Systems 19(4) (1997)
Tarjan, R.E.: Testing flow graph reducibility. J. Comput. Syst. Sci. 9 (1974)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. The MIT Press, Cambridge (2001)
Cooper, K.D., Harvey, T.J., Kennedy, K.: A Simple Fast Dominance Algorithm. Software Practice and Experience (2001)
Hecht, M.S., Ullman, J.D.: Flow graph reducibility. SIAM Journal of Computing 1(2), 188–202 (1972)
Steensgaard, B.: Sequentializing program dependence graphs for irreducible programs. Tech. Rep. MSR-TR-93-14, Microsoft Research, Redmond, Wash (1993)
Sreedhar, V.C., Gao, G.R., Lee, Y.F.: Identifying loops using DJ graphs. ACM Transactions on Programming Languages and Systems 18(6) (1996)
Cifuentes, C.: Reverse compilation techniques. PhD Thesis, Queensland University of Technology (1994)

Author information

Authors and Affiliations

Institute of Computer Science and Technology, Peking University,
Tao Wei, Jian Mao, Wei Zou & Yu Chen

Editor information

Hanne Riis Nielson, Gilberto Filé

Cite this paper

Wei, T., Mao, J., Zou, W., Chen, Y. (2007). A New Algorithm for Identifying Loops in Decompilation. In: Nielson, H.R., Filé, G. (eds) Static Analysis. SAS 2007. Lecture Notes in Computer Science, vol 4634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74061-2_11

DOI: https://doi.org/10.1007/978-3-540-74061-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74060-5
Online ISBN: 978-3-540-74061-2
eBook Packages: Computer Science, Computer Science (R0)
