Lib2Desc: automatic generation of security-centric Android app descriptions using third-party libraries

Cevik, Beyza; Altiparmak, Nur; Aksu, Murat; Sen, Sevil

doi:10.1007/s10207-022-00601-x

Regular contribution
Published: 04 August 2022

Volume 21, pages 1107–1125, (2022)

International Journal of Information Security

Beyza Cevik¹ (ORCID: orcid.org/0000-0002-3266-2389), Nur Altiparmak¹, Murat Aksu¹,² & Sevil Sen¹

Abstract

Android app developers are expected to disclose the use of dangerous permissions in their app descriptions, and the absence of such information is taken as a sign of suspicious behavior. However, this omission is not always caused by malicious intent on the part of developers; it may stem from the lack of documentation for the third-party libraries they use. To fill this gap in the literature, this study aims to enrich app descriptions with security-centric information about third-party libraries. To generate these descriptions automatically, the study explores classifying libraries and extracting code summaries of library methods that use dangerous permissions and/or leak data. Both the textual information of third-party libraries and their source code are used to create the descriptions. To the best of our knowledge, this is the first approach in the literature that creates app descriptions based on third-party libraries.
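
As a rough illustration of the idea in the abstract, the following Python sketch appends security-centric sentences to an app description. It is not the authors' implementation: the `Library` and `LibraryMethod` types, the function names, and the sentence template are all hypothetical, and the library category and method summaries are assumed to have been produced by earlier classification and code-summarization steps.

```python
from dataclasses import dataclass, field

# Illustrative subset of Android permissions with the "dangerous" protection level.
DANGEROUS_PERMISSIONS = {
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.READ_CONTACTS",
    "android.permission.CAMERA",
}


@dataclass
class LibraryMethod:
    """Hypothetical record for one library method and its generated summary."""
    name: str
    summary: str
    permissions: set = field(default_factory=set)


@dataclass
class Library:
    """Hypothetical record for a detected third-party library."""
    name: str
    category: str  # assumed output of a library classifier
    methods: list = field(default_factory=list)


def security_sentences(libraries):
    """Build one sentence per library method that uses a dangerous permission."""
    sentences = []
    for lib in libraries:
        for method in lib.methods:
            used = sorted(method.permissions & DANGEROUS_PERMISSIONS)
            if not used:
                continue  # methods without dangerous permissions are skipped
            short = ", ".join(p.rsplit(".", 1)[-1] for p in used)
            sentences.append(
                f"The {lib.category} library {lib.name} uses {short}: {method.summary}."
            )
    return sentences


def enrich_description(description, libraries):
    """Append the security-centric sentences to the original description."""
    extra = security_sentences(libraries)
    if not extra:
        return description
    return description + "\n\n" + " ".join(extra)
```

Calling `enrich_description` with a description and a list of `Library` records returns the description unchanged when no method touches a dangerous permission, and otherwise returns it with one added sentence per flagged method.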

Notes

The datasets generated and/or analyzed during the current study are available in the Lib2Desc repository: https://wise.cs.hacettepe.edu.tr/projects/desre/Lib2Desc/

Acknowledgements

We would like to thank TUBITAK for its support. This study is supported by the Scientific and Technological Research Council of Turkey (TUBITAK-118E141).

Author information

Authors and Affiliations

WISE Lab., Department of Computer Engineering, Hacettepe University, Ankara, Turkey
Beyza Cevik, Nur Altiparmak, Murat Aksu & Sevil Sen
Department of Computer Engineering, Izmir Bakircay University, Izmir, Turkey
Murat Aksu

Corresponding author

Correspondence to Beyza Cevik.

Ethics declarations

Conflict of interest

Authors Beyza Cevik, Nur Altiparmak, Murat Aksu, and Sevil Sen each declare that they have no conflict of interest.

Ethical approval

This article does not contain any study with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cevik, B., Altiparmak, N., Aksu, M. et al. Lib2Desc: automatic generation of security-centric Android app descriptions using third-party libraries. Int. J. Inf. Secur. 21, 1107–1125 (2022). https://doi.org/10.1007/s10207-022-00601-x

Published: 04 August 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s10207-022-00601-x
