DOI: 10.1145/3339252.3339260
research-article

Obfuscation-Resilient Code Recognition in Android Apps

Published: 26 August 2019

ABSTRACT

Many Android developers take advantage of third-party libraries and code snippets from public sources to add functionality to apps. Although it makes development more productive, external code can also be harmful, introduce vulnerabilities, or raise critical privacy issues that threaten the security of sensitive user data and enlarge an app's attack surface. Reliably recognizing such code fragments in Android applications is challenging because obfuscation techniques are widely used and because developers can express semantically similar program statements in many different ways.

We propose a code recognition technique that is resilient against common code transformations and that excels in identifying code fragments and libraries in Android applications. Our method extracts obfuscation-resilient features from the Abstract Syntax Tree of methods and combines them with invariant attributes from method signatures to derive characteristic fingerprints. To identify similar code, we develop a scoring metric that compares fingerprints at the method, class, and package level. We investigate how well our solution handles obfuscated, shrunken, and optimized code by applying our technique to real-world applications. We thoroughly evaluate our solution and demonstrate its practical ability to fingerprint and recognize code with high precision and recall.
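
To make the idea of name-independent fingerprints concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical AST encoding as nested `(node_type, children)` tuples, a hypothetical `Method` record, and signature types that are primitive or platform types (so renaming does not affect them). The Jaccard score over method fingerprints is a simple stand-in for a class-level comparison, not the paper's scoring metric.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """Hypothetical method record; not taken from the paper."""
    name: str           # may be renamed by an obfuscator, so it is never hashed
    param_types: tuple  # assumed to be primitive or platform types
    return_type: str
    body: tuple         # root AST node as (node_type, [children])


def node_types(node):
    """Yield node types in pre-order; identifier names are not part of the tree."""
    kind, children = node
    yield kind
    for child in children:
        yield from node_types(child)


def fingerprint(method):
    """Hash the AST structure together with the signature shape."""
    parts = [method.return_type, *method.param_types, "|", *node_types(method.body)]
    return hashlib.sha256(" ".join(parts).encode()).hexdigest()


def class_similarity(methods_a, methods_b):
    """Jaccard similarity over two sets of method fingerprints (a stand-in metric)."""
    a = {fingerprint(m) for m in methods_a}
    b = {fingerprint(m) for m in methods_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Illustrative data: the same body under an original and an obfuscated name.
body = ("Block", [("If", [("BinaryOp", []), ("Return", [])]), ("Return", [])])
original = Method("isValid", ("int",), "boolean", body)
renamed = Method("a", ("int",), "boolean", body)
other = Method("b", ("int",), "int", ("Block", [("Return", [])]))
```

In this sketch, `original` and `renamed` yield the same fingerprint because only structure and signature shape are hashed. An exact hash like this would not survive shrinking or optimization, which alter the tree itself; handling those cases requires a tolerant comparison such as the scoring metric the abstract describes.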

Published in

ARES '19: Proceedings of the 14th International Conference on Availability, Reliability and Security
August 2019
979 pages
ISBN: 9781450371643
DOI: 10.1145/3339252

Copyright © 2019 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall acceptance rate: 228 of 451 submissions, 51%
