Libra: Library Identification in Obfuscated Android Apps

  • Conference paper
Information Security (ISC 2023)

Abstract

In the Android app ecosystem, third-party libraries play a crucial role in providing common services and features. However, these libraries introduce complex dependencies that can impact stability, performance, and security. Detecting the libraries used in Android apps is therefore critical for understanding functionality, compliance, and security risks. Existing library identification approaches degrade in performance when obfuscation is applied to apps. In this study, we propose Libra, a novel solution for library identification in obfuscated Android apps. Libra leverages method headers and bodies, encodes instructions compactly, and employs piecewise fuzzy hashing for effective detection of libraries in obfuscated apps. Our two-phase approach achieves high F1 scores of 88% for non-obfuscated apps and 50–87% for obfuscated apps, surpassing previous works by significant margins. Extensive evaluations demonstrate Libra’s effectiveness and robustness against various obfuscation techniques.

K. Nwodo—This work was done as part of an internship at Quokka.


Notes

  1. The process of identifying the components used in a piece of software is generally known as creating a Software Bill of Materials (SBOM). See https://www.cisa.gov/sbom for more information about the SBOM concept and standards.

  2. At the time of this writing, the Maven Central repository [5] had over 11 million indexed library packages.

  3. We exclude the instance initializer method (\(\texttt{<init>}\)), the class initializer method (\(\texttt{<clinit>}\)), and the resources class (R), since these tend to be highly similar across apps and libraries, which may lead to spurious matches.

  4. Context-triggered piecewise hashing (CTPH) offers advantages over other hashing methods in this setup because it employs a recursive rolling hash in which each piece of the hash is computed from only part of the data and is not influenced by previously processed data. Consequently, if the sequences being hashed change, only a small portion of the hash is affected. This is a desirable property for library identification in obfuscated apps, since changes to the library bytecode packed in the app are expected.

  5. Note that the Android SDK Support Library [7] was excluded from the counts for consistency with all evaluated tools.
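
The locality property described in note 4 can be illustrated with a toy context-triggered piecewise hash. The sketch below is not ssdeep and not Libra's implementation: the window size, block size, rolling-hash formula, output alphabet, and the function names `ctph` and `similarity` are all assumptions chosen for brevity. A rolling hash over the last few bytes decides where a piece ends, each piece is hashed on its own, and the signature is the concatenation of one character per piece.

```python
import difflib

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7  # bytes of context seen by the rolling hash (assumed value)


def ctph(data: bytes, block_size: int = 8) -> str:
    """Toy context-triggered piecewise hash (illustrative assumption only)."""
    signature = []
    piece_hash = 0x811C9DC5  # 32-bit FNV-1a offset basis for the current piece
    piece_open = False       # True while the current piece has absorbed bytes
    for i, byte in enumerate(data):
        # Per-piece hash: 32-bit FNV-1a over the bytes of the current piece.
        piece_hash = ((piece_hash ^ byte) * 0x01000193) & 0xFFFFFFFF
        piece_open = True
        # Rolling hash: depends only on the last WINDOW bytes (the "context").
        window = data[max(0, i - WINDOW + 1): i + 1]
        rolling = sum((j + 1) * b for j, b in enumerate(window))
        # Trigger: close the piece and emit one character for it.
        if rolling % block_size == block_size - 1:
            signature.append(ALPHABET[piece_hash % 64])
            piece_hash = 0x811C9DC5
            piece_open = False
    if piece_open:
        # Emit a character for the trailing, unterminated piece.
        signature.append(ALPHABET[piece_hash % 64])
    return "".join(signature)


def similarity(sig_a: str, sig_b: str) -> float:
    """Similarity of two signatures in [0, 1], using difflib's matcher."""
    return difflib.SequenceMatcher(None, sig_a, sig_b).ratio()


if __name__ == "__main__":
    original = bytes((i * 37 + i // 5) % 256 for i in range(400))
    changed = bytearray(original)
    changed[200] ^= 0xFF  # flip one byte in the middle
    print(ctph(original))
    print(ctph(bytes(changed)))
    print(similarity(ctph(original), ctph(bytes(changed))))
```

Because the piece hash is reset at every trigger point and the trigger depends only on the local window, pieces that end before an edit keep their characters, which is why appending or changing bytes leaves the earlier part of the signature intact.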

References

  1. Allatori. https://allatori.com/

  2. Dasho. https://www.preemptive.com/products/dasho/

  3. Get started with the NDK. https://developer.android.com/ndk/guides

  4. Libdetect dataset. https://sites.google.com/view/libdetect/home/dataset

  5. Maven repository: Central. https://mvnrepository.com/repos/central

  6. Proguard. https://www.guardsquare.com/proguard

  7. Support Library \(|\) Android Developers. https://developer.android.com/topic/libraries/support-library

  8. SolarWinds attack explained: And why it was so hard to detect (2020). https://www.csoonline.com/article/3601508/solarwinds-supply-chain-attack-explained-why-organizations-were-not-prepared.html

  9. Synopsys research reveals significant security concerns in popular mobile apps amid pandemic (2021). https://news.synopsys.com/2021-03-25-Synopsys-Research-Reveals-Significant-Security-Concerns-in-Popular-Mobile-Apps-Amid-Pandemic

  10. Number of apps available in leading app stores as of 3rd quarter 2022 (2021). https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/

  11. Numbers from Google I/O: 3 billion active Android devices (2022). https://9to5google.com/2022/05/11/google-io-2022-numbers/

  12. Shrink, obfuscate, and optimize your app (2023). https://developer.android.com/studio/build/shrink-code.html

  13. Ali, M.: Sensors Sandbox. https://github.com/mustafa01ali/SensorsSandbox

  14. Almanee, S., Ünal, A., Payer, M., Garcia, J.: Too quiet in the library: an empirical study of security updates in Android apps’ native code. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE (2021)

  15. Backes, M., Bugiel, S., Derr, E.: Reliable third-party library detection in Android and its security applications. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (2016)

  16. Derr, E., Bugiel, S., Fahl, S., Acar, Y., Backes, M.: Keep me updated: an empirical study of third-party library updatability on Android. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (2017)

  17. Duan, R., Bijlani, A., Xu, M., Kim, T., Lee, W.: Identifying open-source license violation and 1-day security risk at large scale. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (2017)

  18. Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7(2), 179–188 (1936)

  19. Glanz, L., et al.: CodeMatch: obfuscation won’t conceal your repackaged app. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (2017)

  20. Han, H., Li, R., Tang, J.: Identify and inspect libraries in Android applications. Wirel. Pers. Commun. 103(1), 491–503 (2018)

  21. Huang, J., et al.: Scalably detecting third-party Android libraries with two-stage bloom filtering. IEEE Trans. Softw. Eng. (2022)

  22. Kornblum, J.: Identifying almost identical files using context triggered piecewise hashing. Digit. Investig. 3, 91–97 (2006)

  23. Levenshtein, V.I., et al.: Binary codes capable of correcting deletions, insertions, and reversals. In: Soviet Physics Doklady (1966)

  24. Li, M., et al.: LIBD: scalable and precise third-party library detection in Android markets. In: 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE) (2017)

  25. Liu, B., Liu, B., Jin, H., Govindan, R.: Efficient privilege de-escalation for ad libraries in mobile apps. In: Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, pp. 89–103 (2015)

  26. Ma, Z., Wang, H., Guo, Y., Chen, X.: LibRadar: fast and accurate detection of third-party libraries in Android apps. In: Proceedings of the 38th International Conference on Software Engineering Companion (2016)

  27. Narayanan, A., Chen, L., Chan, C.K.: AdDetect: automated detection of Android ad libraries using semantic analysis. In: 2014 IEEE Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP) (2014)

  28. Sihag, V., Vardhan, M., Singh, P.: A survey of Android application and malware hardening. Comput. Sci. Rev. 39, 100365 (2021)

  29. Soh, C., Tan, H.B.K., Arnatovich, Y.L., Narayanan, A., Wang, L.: LibSift: automated detection of third-party libraries in Android applications. In: 2016 23rd Asia-Pacific Software Engineering Conference (APSEC) (2016)

  30. Tang, W., Luo, P., Fu, J., Zhang, D.: LibDX: a cross-platform and accurate system to detect third-party libraries in binary code. In: 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER) (2020)

  31. Tang, Z., et al.: Securing Android applications via edge assistant third-party library detection. Comput. Secur. 80 (2019)

  32. Wang, H., Guo, Y., Ma, Z., Chen, X.: Wukong: a scalable and accurate two-phase approach to Android app clone detection. In: Proceedings of the 2015 International Symposium on Software Testing and Analysis (2015)

  33. Wang, Y., Rountev, A.: Who changed you? Obfuscator identification for Android. In: 2017 IEEE/ACM 4th International Conference on Mobile Software Engineering and Systems (MOBILESoft), pp. 154–164. IEEE (2017)

  34. Wang, Y., Wu, H., Zhang, H., Rountev, A.: ORLIS: obfuscation-resilient library detection for Android. In: 2018 IEEE/ACM 5th International Conference on Mobile Software Engineering and Systems (MOBILESoft) (2018)

  35. Wang, Y., et al.: An empirical study of usages, updates and risks of third-party libraries in Java projects. In: 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 35–45. IEEE (2020)

  36. Xu, J., Yuan, Q.: LibRoad: rapid, online, and accurate detection of TPLs on Android. IEEE Trans. Mob. Comput. 21(1) (2020)

  37. Zhan, X., et al.: ATVHunter: reliable version detection of third-party libraries for vulnerability identification in Android applications. In: 43rd International Conference on Software Engineering (2021)

  38. Zhan, X., et al.: Automated third-party library detection for Android applications: are we there yet? In: 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 919–930. IEEE (2020)

  39. Zhan, X., et al.: Research on third-party libraries in Android apps: a taxonomy and systematic literature review. IEEE Trans. Softw. Eng. 48(10) (2022)

  40. Zhang, F., Huang, H., Zhu, S., Wu, D., Liu, P.: ViewDroid: towards obfuscation-resilient mobile application repackaging detection. In: Proceedings of the 2014 ACM Conference on Security and Privacy in Wireless & Mobile Networks (2014)

  41. Zhang, J., Beresford, A.R., Kollmann, S.A.: LibID: reliable identification of obfuscated third-party Android libraries. In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 55–65 (2019)

  42. Zhang, Y., Wang, J., Huang, H., Zhang, Y., Liu, P.: Understanding and conquering the difficulties in identifying third-party libraries from millions of Android apps. IEEE Trans. Big Data (2021)

  43. Zhang, Y., et al.: Detecting third-party libraries in Android applications with high precision and recall. In: IEEE 25th Conference on Software Analysis, Evolution and Reengineering (2018)

  44. Zhang, Z., Diao, W., Hu, C., Guo, S., Zuo, C., Li, L.: An empirical study of potentially malicious third-party libraries in Android apps. In: 13th ACM Conference on Security and Privacy in Wireless and Mobile Networks (2020)

Acknowledgment

We thank the anonymous reviewers for their insightful feedback. Opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of their respective institutions.

Author information

Corresponding author

Correspondence to David A. Tomassi.

Appendices

A Method Encoding Codebook

Table 10 shows the codebook used by Libra to encode method instructions. We conducted feature selection to determine the best mapping using Fisher’s score [18] to gain insights into the most discriminatory instructions. Our analysis revealed that field getters, setters, and arithmetic operators exhibited low variance, making them less useful for discrimination. Consequently, we decided to combine these arithmetic instructions into a single move instruction.

Table 10. Bytecode encoding codebook used by Libra.
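
As a rough illustration of the feature-selection step, the sketch below computes a Fisher score per feature as the ratio of between-class to within-class variance. It assumes each row holds per-method instruction counts and each label names the library the method came from; the data, the two feature columns, and the function name `fisher_scores` are hypothetical and are not taken from Libra or Table 10.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_scores(rows, labels):
    """Return one Fisher score per feature column.

    rows:   list of equal-length numeric feature vectors
    labels: class label for each row
    Score_j = sum_k n_k * (mean_kj - mean_j)^2 / sum_k n_k * var_kj
    """
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    scores = []
    for j in range(len(rows[0])):
        overall_mean = fmean([row[j] for row in rows])
        between = 0.0  # spread of the class means around the overall mean
        within = 0.0   # spread of the samples inside each class
        for members in by_class.values():
            column = [row[j] for row in members]
            between += len(column) * (fmean(column) - overall_mean) ** 2
            within += len(column) * pvariance(column)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


if __name__ == "__main__":
    # Hypothetical counts: column 0 separates the classes, column 1 does not.
    rows = [[10, 3], [12, 5], [2, 5], [4, 3]]
    labels = ["lib_a", "lib_a", "lib_b", "lib_b"]
    print(fisher_scores(rows, labels))
```

Features with a high score vary more between classes than within them, so they discriminate better; features whose score is near zero are candidates for merging into a coarser symbol.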

B Search Space Reduction from Library Pairing

Pairs that satisfy condition one have a pairing size complexity of \(O(k)\), where \(n\) is the number of libraries in the database and \(k \ll n\) is the group size. Pairs that satisfy condition two have a pairing size complexity of \(O(|P_{C2}|)\), where \(P_{C2}\) is defined as:

$$ P_{C2} = \left\{ \langle C, L \rangle \mid C \in A,\ L \in D,\ \frac{\operatorname{abs}(|A| - |D|)}{\max(|A|, |D|)} < \tau \right\} $$

where \(C\) is the library candidate, \(L\) is the library, \(A\) is the app, and \(D\) is the database. If neither condition is met, the library candidate is paired with the entire database, resulting in a pairing size complexity of \(O(n)\). This is unlikely, however: library sizes span a wide range, on the order of \(10^0\) to \(10^3\), so condition two is likely to be met.
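
The pairing logic can be sketched as a three-step fallback. This is only an illustration under several assumptions: condition one is represented by a caller-supplied predicate `in_group`, because its definition is not given in this appendix; the relative size gap is evaluated between the candidate's size and each database library's size, which is one reading of the surrounding text about library sizes; the conditions are tried in order rather than combined; and the threshold value, size measure, and library names are hypothetical.

```python
from typing import Callable, Dict, List


def relative_size_gap(size_a: int, size_b: int) -> float:
    """abs(a - b) / max(a, b) for positive sizes."""
    return abs(size_a - size_b) / max(size_a, size_b)


def pair_candidate(
    candidate_size: int,
    database: Dict[str, int],
    in_group: Callable[[str], bool],
    tau: float = 0.25,
) -> List[str]:
    """Return the database libraries a candidate should be compared against."""
    # Condition one (assumed predicate): a small group of k << n libraries.
    group = [name for name in database if in_group(name)]
    if group:
        return group
    # Condition two: keep libraries whose size is within the threshold tau.
    similar = [
        name
        for name, size in database.items()
        if relative_size_gap(candidate_size, size) < tau
    ]
    if similar:
        return similar
    # Neither condition met: fall back to the whole database, O(n) pairs.
    return list(database)


if __name__ == "__main__":
    # Hypothetical library sizes.
    db = {"lib_mid_a": 200, "lib_mid_b": 150, "lib_tiny": 3, "lib_huge": 1000}
    print(pair_candidate(160, db, lambda name: False))
    print(pair_candidate(160, db, lambda name: name == "lib_mid_b"))
    print(pair_candidate(30, db, lambda name: False))
```

With a database whose sizes span several orders of magnitude, the size filter alone discards most libraries for a typical candidate, which is the search-space reduction this appendix describes.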

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tomassi, D.A., Nwodo, K., Elsabagh, M. (2023). Libra: Library Identification in Obfuscated Android Apps. In: Athanasopoulos, E., Mennink, B. (eds) Information Security. ISC 2023. Lecture Notes in Computer Science, vol 14411. Springer, Cham. https://doi.org/10.1007/978-3-031-49187-0_11

  • DOI: https://doi.org/10.1007/978-3-031-49187-0_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-49186-3

  • Online ISBN: 978-3-031-49187-0

  • eBook Packages: Computer Science, Computer Science (R0)
