Demystifying API misuses in deep learning applications

Yang, Deheng; Liu, Kui; Lei, Yan; Li, Li; Xie, Huan; Liu, Chunyan; Wang, Zhenyu; Mao, Xiaoguang; Bissyandé, Tegawendé F.

doi:10.1007/s10664-023-10413-9

Published: 16 February 2024

Volume 29, article number 45, (2024)

Empirical Software Engineering

Deheng Yang¹,
Kui Liu²,
Yan Lei^3,4 (ORCID: orcid.org/0000-0003-4504-6806),
Li Li⁵,
Huan Xie^3,4,
Chunyan Liu^3,4,
Zhenyu Wang³,
Xiaoguang Mao¹ &
Tegawendé F. Bissyandé^6,7

Abstract

Deep Learning (DL) is achieving staggering performance on an increasing number of applications in various areas. Meanwhile, its associated data-driven programming paradigm poses a set of challenges for the software engineering community, including the debugging of DL applications. Recent empirical studies on bugs in DL applications have flagged API (Application Programming Interface) misuse as an important category of DL programming bugs. By exploring this literature on API misuse bugs in DL applications, we identified three barriers that are hindering progress in this research direction: misclassification of API misuse bugs, lack of a relevant dataset, and limited depth of analysis. Our work removes these barriers by providing an in-depth analysis of a frequent bug type that has so far remained a mystery. Concretely, we first offer a new perspective on a significant misclassification issue in the literature that hinders the understanding of API misuses in DL applications. Subsequently, we curate the first dataset, MisuAPI, of 143 API misuses sampled from real-world DL applications. Finally, we perform systematic analyses to dissect API misuses, enumerate their symptoms in DL applications, and investigate the possibility of detecting them with state-of-the-art static analyzers. Overall, the insights summarized in this work are important for the community: 1) 18-35% of real API misuses are mislabelled in existing DL bug studies; 2) the widely adopted API misuse taxonomy, namely MUC, does not cover 1 out of 3 encountered API misuses; 3) DL library API misuses differ significantly from general third-party library API misuses in terms of API-usage element issues and symptoms; 4) most (92.3%) API misuses lead to program crashes; 5) 95.8% of API misuses remain undetectable by state-of-the-art static analyzers.
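To make findings 4 and 5 concrete, the following is a minimal sketch of one kind of misuse, an incorrect parameter value, that passes type checking yet crashes at run time. It is not taken from the MisuAPI dataset: `reshape`, `misuse`, and `fixed` are hypothetical names, and `reshape` is a toy stand-in for a tensor-reshaping library API, written with the standard library only.

```python
from typing import List, Sequence


def reshape(values: Sequence[float], shape: Sequence[int]) -> List[List[float]]:
    """Hypothetical stand-in for a tensor-reshaping library API (2-D only)."""
    rows, cols = shape
    if rows * cols != len(values):
        # The constraint is on parameter *values*, so it only surfaces at run time.
        raise ValueError(
            f"cannot reshape {len(values)} elements into shape ({rows}, {cols})"
        )
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]


def misuse() -> List[List[float]]:
    # Well-typed call, but 6 elements cannot fill a 4 x 2 layout: raises ValueError.
    return reshape([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], (4, 2))


def fixed() -> List[List[float]]:
    # Same call with a shape whose product matches the element count.
    return reshape([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], (3, 2))
```

Both calls satisfy the type annotations, so a purely type-based check has nothing to flag; the two differ only in a value-level constraint between the arguments, which in this sketch is enforced only when the call executes.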

Data Availability

For the sake of Open Science, we make the replication package with source code and the curated dataset MisuAPI publicly available at: https://zenodo.org/record/7684952

Notes

https://github.com/deep-learning/facenet/commit/c39be02589d9cfa08b673d2bae20cde160305c8a
https://github.com/tensorflow/models/pull/1532
https://github.com/GumTreeDiff/gumtree/tree/v3.0.0-beta1
https://github.com/deezer/spleeter
https://github.com/tensorpack/tensorpack
The Inner-Project API category contains seven incorrect-type cases and three incorrect-value cases, which is consistent with the DL Library API category, since the development of inner-project APIs in DL applications closely resembles the invocation of DL library APIs.
https://github.com/google/prettytensor/commit/01ee67d6e0cc5e9d6ae5f07045024a638564fe78
https://github.com/horovod/horovod/commit/9420ef71c197b544f122a08ccb8db5491afa3548
https://github.com/tensorflow/models/pull/1480
https://github.com/tensorflow/models/issues/1390
The identification results can be found at: https://doi.org/10.5281/zenodo.8302351. The identification results include whether each API bug is an API misuse and the reason for it being or not being an API misuse.

Acknowledgements

This research was partially supported by the National Natural Science Foundation of China (Nos. 62172214, 62272072), the Natural Science Foundation of Jiangsu Province, China (BK20210279), and the Major Key Project of PCL (No. PCL2021A06).

Author information

Authors and Affiliations

College of Computer, National University of Defense Technology, Changsha, China
Deheng Yang & Xiaoguang Mao
Huawei Software Engineering Application Technology Lab, Ningbo, China
Kui Liu
School of Big Data and Software Engineering, Chongqing University, Chongqing, China
Yan Lei, Huan Xie, Chunyan Liu & Zhenyu Wang
Peng Cheng Laboratory, ShenZhen, China
Yan Lei, Huan Xie & Chunyan Liu
School of Big Data and Software Engineering, Chongqing University, Chongqing, China
Li Li
Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, Luxembourg, Luxembourg
Tegawendé F. Bissyandé
School of Software, Beihang University, Beijing, China
Tegawendé F. Bissyandé

Corresponding author

Correspondence to Yan Lei.

Ethics declarations

Conflict of interests/Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Ethics approval

No ethics approval was required for this paper.

Additional information

Communicated by: Denys Poshyvanyk.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, D., Liu, K., Lei, Y. et al. Demystifying API misuses in deep learning applications. Empir Software Eng 29, 45 (2024). https://doi.org/10.1007/s10664-023-10413-9

Accepted: 18 October 2023
Published: 16 February 2024
DOI: https://doi.org/10.1007/s10664-023-10413-9
