Abstract
This paper addresses the problem of detecting plagiarized mobile apps. Plagiarism is the practice of building mobile apps by reusing code from other apps without the consent of the corresponding app developers. Recent studies on third-party app markets have suggested that plagiarized apps are an important vehicle for malware delivery on mobile phones: malware authors repackage official versions of apps with malicious functionality and distribute them for free via these markets. An effective technique to detect app plagiarism can therefore help identify malicious apps. Code plagiarism has long been a problem, and a number of code similarity detectors have been developed over the years to detect it. In this paper, we show that obfuscation techniques can easily defeat similarity detectors that rely solely on statically scanning an app's code. We propose a dynamic technique that detects plagiarized apps by observing how an app interacts with the underlying mobile platform through its API invocations. We introduce API birthmarks to characterize unique app behaviors and use them to build a robust plagiarism detection tool.
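
The abstract does not spell out how API birthmarks are built or compared. Purely as an illustration, the sketch below assumes that a birthmark is the set of k-grams over a recorded sequence of platform API calls and that two apps are compared with the Jaccard index. The function names, the choice k = 2, the 0.7 threshold, and the sample traces are all hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Sequence, Tuple

KGram = Tuple[str, ...]


def api_birthmark(trace: Sequence[str], k: int = 2) -> FrozenSet[KGram]:
    """Hypothetical birthmark: the set of length-k windows over an API-call trace."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A trace shorter than k yields an empty birthmark.
    return frozenset(tuple(trace[i:i + k]) for i in range(len(trace) - k + 1))


def jaccard(a: FrozenSet[KGram], b: FrozenSet[KGram]) -> float:
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        return 1.0  # treat two empty birthmarks as identical
    return len(a & b) / len(union)


def looks_plagiarized(trace_a: Sequence[str], trace_b: Sequence[str],
                      k: int = 2, threshold: float = 0.7) -> bool:
    """Flag a pair of apps whose birthmark similarity meets an assumed threshold."""
    return jaccard(api_birthmark(trace_a, k), api_birthmark(trace_b, k)) >= threshold


# Made-up traces for illustration only.
orig = ["Activity.onCreate", "LocationManager.getLastKnownLocation",
        "URL.openConnection", "HttpURLConnection.connect"]
repackaged = orig + ["SmsManager.sendTextMessage"]  # extra injected call
unrelated = ["Activity.onCreate", "Camera.open",
             "MediaRecorder.start", "MediaRecorder.stop"]

print(jaccard(api_birthmark(orig), api_birthmark(repackaged)))  # 0.75
print(looks_plagiarized(orig, repackaged))  # True
print(looks_plagiarized(orig, unrelated))   # False
```

Renaming the app's own classes and methods typically leaves calls into the platform API unchanged, so a birthmark drawn from those calls tends to be less sensitive to identifier obfuscation than one drawn from the app's own code. The 3-of-4 shared 2-grams in the repackaged trace give the 0.75 score shown.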
Acknowledgments
This work is supported in part by NSF Grants 0952128, 1117711, 1420815, 1441724 and 1408803.
Appendix
See Table 9.
Cite this article
Kim, D., Gokhale, A., Ganapathy, V. et al. Detecting plagiarized mobile apps using API birthmarks. Autom Softw Eng 23, 591–618 (2016). https://doi.org/10.1007/s10515-015-0182-6