ABSTRACT
Many Android developers take advantage of third-party libraries and code snippets from public sources to add functionality to apps. Although external code makes development more productive, it can also be harmful, introduce vulnerabilities, or raise critical privacy issues that threaten the security of sensitive user data and enlarge an app's attack surface. Reliably recognizing such code fragments in Android applications is challenging because obfuscation is widespread and developers can express semantically similar program statements in many different ways.
We propose a code recognition technique that is resilient against common code transformations and that excels in identifying code fragments and libraries in Android applications. Our method relies on obfuscation-resilient features from the Abstract Syntax Tree of methods and combines them with invariant attributes from method signatures to derive characteristic fingerprints. To identify similar code, we develop a scoring metric that reliably compares fingerprints at the method, class, and package level. We investigate how well our solution handles obfuscated, shrunken, and optimized code by applying our technique to real-world applications. We thoroughly evaluate our solution and demonstrate its practical ability to fingerprint and recognize code with high precision and recall.
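The general idea of combining signature invariants with Abstract Syntax Tree features can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the abstract does not specify which features or which scoring formula are used, so the placeholder type "X", the multiset of AST node kinds, the Jaccard-style score, and all function names below are assumptions made purely for illustration.

```python
from collections import Counter

# Assumption: types under these prefixes belong to the platform and survive renaming.
FRAMEWORK_PREFIXES = ("java.", "javax.", "android.")
PRIMITIVES = {"void", "int", "long", "boolean", "byte", "char", "short", "float", "double"}


def normalize_type(type_name):
    """Keep primitives and framework types; replace renamable types with a placeholder."""
    base = type_name.rstrip("[]")          # strip array brackets, e.g. "int[][]" -> "int"
    suffix = type_name[len(base):]         # remember the array dimensions
    if base in PRIMITIVES or base.startswith(FRAMEWORK_PREFIXES):
        return type_name
    return "X" + suffix                    # hypothetical placeholder for app/library types


def method_fingerprint(return_type, param_types, ast_node_kinds):
    """Pair a name-independent signature with a multiset of AST node kinds."""
    signature = (normalize_type(return_type),
                 tuple(normalize_type(p) for p in param_types))
    return signature, Counter(ast_node_kinds)


def method_similarity(fp_a, fp_b):
    """Return 0.0 for differing signatures, else multiset Jaccard overlap of AST node kinds."""
    sig_a, nodes_a = fp_a
    sig_b, nodes_b = fp_b
    if sig_a != sig_b:
        return 0.0
    union = sum((nodes_a | nodes_b).values())
    if union == 0:
        return 1.0                         # two empty bodies with equal signatures
    return sum((nodes_a & nodes_b).values()) / union


def class_similarity(methods_a, methods_b):
    """Average, over the methods of class A, of the best match found in class B."""
    if not methods_a or not methods_b:
        return 0.0
    best = [max(method_similarity(a, b) for b in methods_b) for a in methods_a]
    return sum(best) / len(best)


# Illustrative data: the same method before renaming, after renaming, and after
# an optimizer removed one call. The node kind labels are made up for the example.
original = method_fingerprint("com.example.Result", ["java.lang.String", "int"],
                              ["If", "Call", "Return", "Call"])
renamed = method_fingerprint("a.b", ["java.lang.String", "int"],
                             ["If", "Call", "Return", "Call"])
optimized = method_fingerprint("a.b", ["java.lang.String", "int"],
                               ["If", "Call", "Return"])

print(method_similarity(original, renamed))    # identifier renaming leaves the score at 1.0
print(method_similarity(original, optimized))  # one removed node lowers the score to 0.75
```

The same averaging step could be repeated over classes to obtain a package-level score. A greedy best match is used here only for brevity; an optimal one-to-one assignment between methods would be a stricter alternative.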