DOI: 10.1145/3544902.3546246
research-article

Towards Demystifying the Impact of Dependency Structures on Bug Locations in Deep Learning Libraries

Published: 19 September 2022

Abstract

Background: Many safety-critical industrial applications have adopted deep learning systems as a fundamental component. Most of these systems rely on deep learning libraries, and bugs in such libraries can have irreparable consequences. Aims: Over the years, dependency structure has been shown to be a practical indicator of software quality and is widely used in bug prediction techniques. However, for bugs in deep learning libraries, it remains unclear whether dependency structures still correlate strongly with bug locations and which forms of dependency structure perform best. Method: In this paper, we present a systematic investigation of these questions and implement a dependency structure-centric bug analysis tool, Depend4BL, which captures the interaction between dependency structures and bug locations in deep learning libraries. Results: We employ Depend4BL to analyze the top 5 open-source deep learning libraries on GitHub in terms of stars and forks, covering 279,788 revision commits and 8,715 bug fixes. The results demonstrate significant differences among syntactic, history, and semantic structures, and their vastly different impacts on bug locations. Their combinations have the potential to further improve bug prediction for deep learning libraries. Conclusions: Our work provides a new perspective on the correlation between dependency structures and bug locations in deep learning libraries. We release a large set of benchmarks and a prototype toolkit to automatically detect various forms of dependency structures for deep learning libraries. Our study also unveils useful findings, based on quantitative and qualitative analysis, that benefit bug prediction techniques for deep learning libraries.

      Published In

      ESEM '22: Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement
      September 2022
      318 pages
      ISBN:9781450394277
      DOI:10.1145/3544902
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Bug Prediction
      2. Deep Learning System
      3. Dependency Structure

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ESEM '22

      Acceptance Rates

      Overall Acceptance Rate 130 of 594 submissions, 22%
