DOI:10.1145/3106237.3106305

CodeMatch: obfuscation won't conceal your repackaged app

Published: 21 August 2017

Abstract

Creating repackaged apps is an established way to steal the income of app developers or to trick users into installing malware. Repackaged apps are clones of, typically, successful apps, and their creators often obfuscate them to conceal their nature. However, because obfuscating apps is a common best practice, repackaged apps cannot be trivially identified. The prevalent use of libraries intensifies the problem: in many apps, the used libraries essentially determine the size of the overall code base, so two apps whose obfuscated code bases are very similar are not necessarily repackages of each other. To reliably detect repackaged apps, we propose a two-step approach. The first step, LibDetect, identifies and removes the library code in obfuscated apps. It relies on code representations that abstract over several parts of the underlying bytecode, which makes it resilient against certain obfuscation techniques. With LibDetect, we identify on average 70% more used libraries per app than previous approaches. In the second step, after an app's library code has been removed, we fuzzy hash the most abstract representation of the remaining app code, so that repackaged apps can be identified even if very advanced obfuscation techniques are used. Using our approach, we found that ≈ 15% of all apps in Android app stores are repackages.
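The two steps described in the abstract can be illustrated with a small sketch. This is not the LibDetect or CodeMatch implementation: the abstract does not specify the code representations or the fuzzy hash, so everything here is an assumption made for illustration. The instruction format is a toy one, the abstraction simply keeps the opcode and drops all operands, library packages are recognised by SHA-1 fingerprints of abstracted methods, and a Jaccard index over opcode shingles stands in for fuzzy hashing. All function names, package names, and the threshold are hypothetical.

```python
import hashlib


def abstract_method(instructions):
    """Keep only the opcode of each instruction, dropping operands and names."""
    return tuple(ins.split()[0] for ins in instructions)


def fingerprint(method):
    """Hash the abstracted method so renamed copies get the same fingerprint."""
    text = " ".join(abstract_method(method))
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def strip_libraries(app, library_fingerprints, threshold=0.8):
    """Drop packages whose methods mostly match known library fingerprints.

    `app` maps a package name to a list of methods; the threshold is an
    arbitrary value chosen for this sketch.
    """
    kept = {}
    for package, methods in app.items():
        hits = sum(fingerprint(m) in library_fingerprints for m in methods)
        if not methods or hits / len(methods) < threshold:
            kept[package] = methods
    return kept


def shingles(app, k=3):
    """Collect all windows of k consecutive opcodes across an app's methods."""
    result = set()
    for methods in app.values():
        for method in methods:
            ops = abstract_method(method)
            for i in range(max(1, len(ops) - k + 1)):
                result.add(ops[i:i + k])
    return result


def similarity(app_a, app_b, k=3):
    """Jaccard index of opcode shingles (a stand-in for a fuzzy hash score)."""
    a, b = shingles(app_a, k), shingles(app_b, k)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy data: one known library and three apps that all bundle it.
LIBRARY = [
    ["aload 0", "invokevirtual okhttp3/Call.execute", "areturn"],
    ["iconst 0", "ireturn"],
]
library_fingerprints = {fingerprint(m) for m in LIBRARY}

original = {
    "com.example.game": [
        ["aload 0", "getfield com/example/game/Score.value",
         "iconst 1", "iadd", "ireturn"],
    ],
    "okhttp3": LIBRARY,
}
# Same code as `original`, but with every identifier renamed.
repackaged = {
    "a.a": [["aload 0", "getfield a/a/b.c", "iconst 1", "iadd", "ireturn"]],
    "a.b": [["aload 0", "invokevirtual a/b/c.d", "areturn"],
            ["iconst 0", "ireturn"]],
}
# Different app code that happens to bundle the same (renamed) library.
unrelated = {
    "org.other": [["iload 1", "iload 2", "imul", "ireturn"]],
    "x.y": [["aload 0", "invokevirtual x/y/z.a", "areturn"],
            ["iconst 0", "ireturn"]],
}

if __name__ == "__main__":
    clean = {name: strip_libraries(app, library_fingerprints)
             for name, app in [("original", original),
                               ("repackaged", repackaged),
                               ("unrelated", unrelated)]}
    print(similarity(original, unrelated))                    # with library code
    print(similarity(clean["original"], clean["unrelated"]))  # library removed
    print(similarity(clean["original"], clean["repackaged"]))
```

In this toy setting, the shared library alone gives the unrelated app a non-zero similarity to the original, which disappears once library packages are stripped. The renamed clone still matches because the comparison only sees opcodes.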


Published In

ESEC/FSE 2017: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering
August 2017
1073 pages
ISBN:9781450351058
DOI:10.1145/3106237
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. code analysis
  2. library detection
  3. obfuscation
  4. repackage detection

Qualifiers

  • Research-article

Conference

ESEC/FSE'17
Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Cited By

  • (2024) Evaluation Methodologies in Software Protection Research. ACM Computing Surveys. DOI:10.1145/3702314. Online publication date: 2-Nov-2024
  • (2024) How Does Code Optimization Impact Third-party Library Detection for Android Applications? Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 1919-1931. DOI:10.1145/3691620.3695554. Online publication date: 27-Oct-2024
  • (2024) Maven Unzipped: Exploring the Impact of Library Packaging on the Ecosystem. 2024 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 50-62. DOI:10.1109/ICSME58944.2024.00016. Online publication date: 6-Oct-2024
  • (2023) LibScan. Proceedings of the 32nd USENIX Conference on Security Symposium, pp. 3385-3402. DOI:10.5555/3620237.3620427. Online publication date: 9-Aug-2023
  • (2023) Smartphone Security and Privacy: A Survey on APTs, Sensor-Based Attacks, Side-Channel Attacks, Google Play Attacks, and Defenses. Technologies 11:3 (76). DOI:10.3390/technologies11030076. Online publication date: 12-Jun-2023
  • (2023) A Survey and Evaluation of Android-Based Malware Evasion Techniques and Detection Frameworks. Information 14:7 (374). DOI:10.3390/info14070374. Online publication date: 30-Jun-2023
  • (2023) ANDetect: A Third-party Ad Network Libraries Detection Framework for Android Applications. Proceedings of the 39th Annual Computer Security Applications Conference, pp. 98-112. DOI:10.1145/3627106.3627182. Online publication date: 4-Dec-2023
  • (2023) Scalably Detecting Third-Party Android Libraries With Two-Stage Bloom Filtering. IEEE Transactions on Software Engineering 49:4, pp. 2272-2284. DOI:10.1109/TSE.2022.3215628. Online publication date: 1-Apr-2023
  • (2023) Libra: Library Identification in Obfuscated Android Apps. Information Security, pp. 205-225. DOI:10.1007/978-3-031-49187-0_11. Online publication date: 1-Dec-2023
  • (2022) Active Warden Attack: On the (In)Effectiveness of Android App Repackage-Proofing. IEEE Transactions on Dependable and Secure Computing 19:5, pp. 3508-3520. DOI:10.1109/TDSC.2021.3100877. Online publication date: 1-Sep-2022
