DOI: 10.1145/3551349.3556931
research-article

How Useful is Code Change Information for Fault Localization in Continuous Integration?

Published: 05 January 2023

Abstract

Continuous integration (CI) is the process in which code changes are automatically integrated, built, and tested in a shared repository. In CI, developers frequently merge and test code under development, which helps isolate faults using finer-grained change information. To identify faulty code, prior research has extensively studied and evaluated spectrum-based fault localization (SBFL) techniques. Although the continuous nature of CI requires code changes to be atomic and provides fine-grained information on which parts of the system are being changed, traditional SBFL techniques do not exploit this information. To overcome this limitation, we propose to integrate code change and coverage change information into fault localization in CI settings. First, code changes show how faults are introduced into the system and give developers a better understanding of the root cause. Second, coverage changes show how code coverage is affected when faults are introduced. This change information can help narrow the search space of code coverage, which offers more opportunities for improving fault localization techniques. Based on these observations, we propose three new change-based fault localization techniques and compare them with Ochiai, a commonly used SBFL technique. We evaluate these techniques on 192 real faults from seven software systems. Our results show that all three change-based techniques outperform Ochiai on the Defects4J dataset. In particular, the improvement ranges from 7% to 23% for average MAP and from 17% to 24% for average MRR. Moreover, we find that our change-based fault localization techniques can be integrated with Ochiai, boosting its performance by up to 53% and 52% for average MAP and MRR, respectively.
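The abstract compares change-based techniques against Ochiai and reports MAP and MRR. As a minimal sketch, assuming method-level program elements and a per-test coverage map, the Python below computes the standard Ochiai score ef / sqrt((ef + nf) * (ef + ep)), where ef and ep count the failing and passing tests that execute an element and nf counts the failing tests that do not. It then shows one simple, hypothetical way to use coverage change information: elements whose set of covering tests differs between the previous build and the failing build are ranked first. This re-ranking is illustrative only and is not one of the paper's three techniques. The helper names (`ochiai_scores`, `coverage_changed`, `rank`, `reciprocal_rank`, `average_precision`) and the toy data are assumptions. Reciprocal rank and average precision follow their common definitions, and averaging them over all faults gives MRR and MAP.

```python
import math
from typing import AbstractSet, Dict, List, Set

# Coverage spectrum: test name -> set of program elements (e.g., methods) it executes.
Coverage = Dict[str, Set[str]]


def ochiai_scores(coverage: Coverage, failed: Set[str]) -> Dict[str, float]:
    """Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep)).

    ef + nf is simply the total number of failing tests.
    """
    total_failed = len(failed)
    elements = set().union(*coverage.values())
    scores: Dict[str, float] = {}
    for e in elements:
        ef = sum(1 for t, cov in coverage.items() if e in cov and t in failed)
        ep = sum(1 for t, cov in coverage.items() if e in cov and t not in failed)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[e] = ef / denom if denom > 0 else 0.0
    return scores


def coverage_changed(before: Coverage, after: Coverage) -> Set[str]:
    """Hypothetical helper: elements whose set of covering tests differs between two builds."""
    def covering_tests(cov: Coverage) -> Dict[str, Set[str]]:
        index: Dict[str, Set[str]] = {}
        for test, elems in cov.items():
            for e in elems:
                index.setdefault(e, set()).add(test)
        return index

    b, a = covering_tests(before), covering_tests(after)
    return {e for e in a.keys() | b.keys() if a.get(e, set()) != b.get(e, set())}


def rank(scores: Dict[str, float], changed: AbstractSet[str] = frozenset()) -> List[str]:
    """Order by descending score; elements in `changed` (if any) are placed first.

    Ties are broken by element name so the output is deterministic.
    """
    return sorted(scores, key=lambda e: (e not in changed, -scores[e], e))


def reciprocal_rank(ranking: List[str], faulty: Set[str]) -> float:
    """1 / position of the first faulty element; averaged over faults this gives MRR."""
    for pos, e in enumerate(ranking, start=1):
        if e in faulty:
            return 1.0 / pos
    return 0.0


def average_precision(ranking: List[str], faulty: Set[str]) -> float:
    """Mean of precision@k taken at each faulty element; averaged over faults this gives MAP."""
    hits, total = 0, 0.0
    for pos, e in enumerate(ranking, start=1):
        if e in faulty:
            hits += 1
            total += hits / pos
    return total / len(faulty) if faulty else 0.0


# Toy (hypothetical) data: element D becomes covered only in the failing build.
before = {"t1": {"A", "B"}, "t2": {"B", "C"}, "t3": {"C"}, "t4": {"A"}}
after = {"t1": {"A", "B"}, "t2": {"B", "C"}, "t3": {"C", "D"}, "t4": {"A", "D"}}
failed = {"t2", "t3"}

scores = ochiai_scores(after, failed)
baseline = rank(scores)
change_aware = rank(scores, coverage_changed(before, after))
print(baseline, change_aware)
print(reciprocal_rank(baseline, {"D"}), reciprocal_rank(change_aware, {"D"}))
```

In this toy data, D is the only element whose coverage changed, so the change-aware ranking moves it from third place to first; if D is the faulty method, its reciprocal rank rises from 1/3 to 1. Such a boost only helps when the fault actually lies in changed code, which is the premise the abstract builds on.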

Cited By

  • (2024) Do not neglect what's on your hands: localizing software faults with exception trigger stream. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 982–994. DOI: 10.1145/3691620.3695479. Online publication date: 27-Oct-2024.
  • (2024) Enhancing Code Representation for Improved Graph Neural Network-Based Fault Localization. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 686–688. DOI: 10.1145/3663529.3664459. Online publication date: 10-Jul-2024.
  • (2024) Towards Better Graph Neural Network-Based Fault Localization through Enhanced Code Representation. Proceedings of the ACM on Software Engineering 1, FSE, 1937–1959. DOI: 10.1145/3660793. Online publication date: 12-Jul-2024.
  • (2024) A Systematic Exploration of Mutation-Based Fault Localization Formulae. Software Testing, Verification and Reliability. DOI: 10.1002/stvr.1905. Online publication date: 11-Nov-2024.

Information

Published In

ASE '22: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering
October 2022
2006 pages
ISBN: 9781450394758
DOI: 10.1145/3551349
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ASE '22

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%
