Mining Python fix patterns via analyzing fine-grained source code changes

Yang, Yilin; He, Tianxing; Feng, Yang; Liu, Shaoying; Xu, Baowen

doi:10.1007/s10664-021-10087-1

Published: 28 January 2022

Volume 27, article number 48, (2022)

Empirical Software Engineering

Yilin Yang¹,
Tianxing He¹,
Yang Feng ORCID: orcid.org/0000-0002-7477-3642¹,
Shaoying Liu² &
…
Baowen Xu¹

Abstract

Many code changes are inherently repetitive, and researchers exploit this repetitiveness to generate bug fix patterns. Automatic Program Repair (APR) can automatically detect and fix bugs, thus helping developers improve the quality of software products. As a critical component of APR, bug fix patterns have been shown by existing studies to be very effective in detecting and fixing bugs in different programming languages (e.g., Java/C++); yet the fix patterns proposed by these studies cannot be directly applied to Python programs because of syntactic incompatibilities and a lack of analysis of dynamic features. In this paper, we propose a mining approach to identify fix patterns of Python programs by extracting fine-grained bug-fixing code changes. We first collect bug reports from GitHub repositories and employ the abstract syntax tree edit distance to cluster similar bug-fixing code changes into fix patterns. We then evaluate the effectiveness of these fix patterns by applying them to single-hunk bugs in two benchmarks (BugsInPy and QuixBugs). The results show that 13 out of 101 real bugs can be fixed without human intervention; that is, the generated patch is identical or semantically equivalent to the developer’s patch. We also evaluate the fix patterns in the wild: for each complex bug, 15% of the bug code could be fixed, and 37% of the bug code could be matched by fix patterns.
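The clustering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: instead of a true tree edit distance (the notes point to an APTED implementation), it assumes a cheaper stand-in, namely the difflib similarity of AST node-type sequences, and a greedy threshold clustering. All function names and the threshold value are hypothetical.

```python
import ast
import difflib


def node_types(source):
    """Breadth-first list of AST node type names for a code fragment."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def change_tokens(before, after):
    """Flatten one bug-fixing change into a single token sequence."""
    return node_types(before) + ["->"] + node_types(after)


def distance(change_a, change_b):
    """Stand-in distance in [0, 1]; NOT a real tree edit distance."""
    matcher = difflib.SequenceMatcher(
        a=change_tokens(*change_a), b=change_tokens(*change_b), autojunk=False
    )
    return 1.0 - matcher.ratio()


def cluster(changes, threshold=0.2):
    """Greedy clustering: join the first group whose seed is close enough."""
    clusters = []
    for change in changes:
        for group in clusters:
            if distance(group[0], change) <= threshold:
                group.append(change)
                break
        else:
            clusters.append([change])
    return clusters


# Hypothetical (buggy, fixed) fragments used only for illustration.
changes = [
    ("d.has_key(k)", "k in d"),
    ("cfg.has_key(name)", "name in cfg"),
    ("x = a / b", "x = a // b"),
]
clusters = cluster(changes)
```

Under these assumptions the two has_key() changes share a cluster because their AST shapes are identical once identifiers are ignored, while the division change forms its own group.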

Notes

https://github.com/SATE-Lab/PyFPattern
https://guides.github.com/features/issues/
https://www.eclipse.org/jgit/
https://github.com/petr-muller/pyff
The Histogram algorithm is an enhanced version of Patience. https://alfedenzo.livejournal.com
https://github.com/JoaoFelipe/apted

Acknowledgements

We thank the anonymous reviewers for their constructive comments. This work is partially supported by the National Natural Science Foundation of China (No. 62172209), the Key Program of the National Natural Science Foundation of China (No. 61832009), and the Cooperation Fund of Huawei-Nanjing University Next Generation Programming Innovation Lab (No. YBN2019105178SW23).

Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, No. 22 Hankou Rd., Gulou District, Nanjing, Jiangsu, 210093, People’s Republic of China
Yilin Yang, Tianxing He, Yang Feng & Baowen Xu
Graduate School of Advanced Science and Engineering, Hiroshima University, Higashihiroshima, Japan
Shaoying Liu

Corresponding author

Correspondence to Yang Feng.

Additional information

Communicated by: Shaowei Wang, Tse-Hsun (Peter) Chen, Sebastian Baltes, Ivano Malavolta, Christoph Treude, Alexander Serebrenik

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Fix pattern P18 (Check Property Function by getattr()) changes 'hasattr(object, name)' to 'getattr(object, name[, default])'. In early versions of Python 2.x, hasattr() had a bug involving property functions.

Although this bug has been fixed in later versions, users should still pay attention to it when writing mixed code that must be compatible with both Python 2.x and 3.x.
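The original listing is not available here, so the following is only a sketch of the kind of problem involved. It assumes the documented behaviour that hasattr() in Python 2 treats any exception raised by a property as "attribute missing", whereas Python 3.2 and later only catch AttributeError. The class, the sentinel, and the helper name are hypothetical.

```python
_MISSING = object()  # hypothetical sentinel, not taken from the paper


class Sensor:
    """Hypothetical class whose property fails for an unrelated reason."""

    unit = "C"

    @property
    def reading(self):
        raise ValueError("sensor offline")


def has_attr_safe(obj, name):
    """getattr()-based check: only a genuinely missing attribute yields False.

    Any other error raised inside a property propagates to the caller
    instead of being silently reported as "no such attribute".
    """
    return getattr(obj, name, _MISSING) is not _MISSING


sensor = Sensor()
present = has_attr_safe(sensor, "unit")      # attribute exists
absent = has_attr_safe(sensor, "missing")    # attribute does not exist
```

Calling has_attr_safe(sensor, "reading") raises ValueError on both Python 2 and Python 3, because getattr() with a default only suppresses AttributeError; this keeps the behaviour consistent in code that targets both versions.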

About this article

Cite this article

Yang, Y., He, T., Feng, Y. et al. Mining Python fix patterns via analyzing fine-grained source code changes. Empir Software Eng 27, 48 (2022). https://doi.org/10.1007/s10664-021-10087-1

Accepted: 08 November 2021
Published: 28 January 2022
DOI: https://doi.org/10.1007/s10664-021-10087-1

Appendix A

A.1 (P11) Python 2.x Data Type Compatible with Python 3.x Data Type

A.2 (P12) Python 2.x xrange() Compatible with Python 3.x range()

A.3 (P13) Change Python 3.x map() Returned Value to List Type

A.4 (P14) Python 3.x Check Dictionary has_key()

A.5 (P15) Change Python 3.x Dictionary API Name

A.6 (P16) Change Python 3.x Float Division to Integer Division

A.7 (P17) Change Python 3.x super() Backward Compatibility

A.8 (P18) Check Property Function by getattr()
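The descriptions of these patterns are not reproduced here. Judging only from the pattern names, and assuming they refer to the standard Python 2 to Python 3 language changes, the fixed form of four of them (P12, P13, P14, P16) might look like the sketch below. The variable names and values are hypothetical.

```python
# P12 (assumed): xrange() does not exist in Python 3; range() is used instead.
squares = [i * i for i in range(5)]

# P13 (assumed): map() returns an iterator in Python 3, so it is wrapped
# in list() wherever a real list is required.
doubled = list(map(lambda x: 2 * x, [1, 2, 3]))

# P14 (assumed): dict.has_key() was removed in Python 3; the `in` operator
# works in both Python 2 and Python 3.
settings = {"debug": True}
has_debug = "debug" in settings

# P16 (assumed): `/` is true division in Python 3; `//` keeps the integer
# result that Python 2 produced for integer operands.
midpoint = 7 // 2
```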
