On the use of textual feature extraction techniques to support the automated detection of refactoring documentation

Marmolejos, Licelot; AlOmar, Eman Abdullah; Mkaouer, Mohamed Wiem; Newman, Christian; Ouni, Ali

doi:10.1007/s11334-021-00388-5

S.I. : ACITSEP
Published: 07 March 2021

Volume 18, pages 233–249 (2022)

Innovations in Systems and Software Engineering

Licelot Marmolejos¹,
Eman Abdullah AlOmar¹,
Mohamed Wiem Mkaouer ORCID: orcid.org/0000-0001-6010-7561¹,
Christian Newman¹ &
Ali Ouni²

Abstract

Refactoring is the art of improving the internal structure of a program without altering its external behavior, and it is an important task for software maintainability. While existing studies have focused on detecting refactoring operations by mining software repositories, little has been done to understand how developers document their refactoring activities. Therefore, there is a recent trend toward detecting developers' documentation of refactoring by manually analyzing their internal and external software documentation. However, these techniques are limited by their manual process, which hinders their scalability. Hence, in this study, we tackle the detection of refactoring documentation as a binary classification problem. We focus on the automatic detection of refactoring activities in commit messages by relying on text mining, natural language preprocessing, and supervised machine learning techniques. We design our tool to overcome the limitation of the manual process proposed by existing studies by exploring the transformation of commit messages into features that are used to train various models. For our evaluation, we use and compare five different binary classification algorithms, and we test the effectiveness of these models using an existing dataset of manually curated messages that are known to document refactoring activities in the source code. The experiments are carried out with different data sizes and numbers of bits. According to our results, the combinations of Chi-Squared with Bayes point machine and of Fisher score with Bayes point machine could be the most efficient at automatically identifying refactoring text patterns in commit messages, with an accuracy of 0.96 and an F-score of 0.96.
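To make the feature-extraction idea concrete, the following is a minimal Python sketch of two steps the abstract mentions: turning a commit message into a fixed-width vector and scoring a term with the Chi-Squared statistic. It is an illustration under stated assumptions, not the authors' tool. The sample messages and labels are invented, the function names are hypothetical, the choice of CRC32 as the hash is arbitrary, and reading the abstract's "number of bits" as the width of a hashed feature vector is an interpretation rather than something the abstract states.

```python
import re
import zlib


def tokenize(message):
    """Lowercase a commit message and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", message.lower())


def hash_features(message, bits=10):
    """Count tokens into 2**bits buckets (the hashing trick).

    CRC32 is used only because it is deterministic across runs;
    it is an assumption of this sketch, not a detail from the paper.
    """
    size = 1 << bits
    vec = [0] * size
    for tok in tokenize(message):
        vec[zlib.crc32(tok.encode("utf-8")) % size] += 1
    return vec


def chi_squared(term, messages, labels):
    """Chi-Squared statistic of a term's presence against a binary label."""
    a = b = c = d = 0  # cells of the 2x2 contingency table
    for msg, y in zip(messages, labels):
        present = term in tokenize(msg)
        if present and y:
            a += 1      # term present, refactoring message
        elif present:
            b += 1      # term present, other message
        elif y:
            c += 1      # term absent, refactoring message
        else:
            d += 1      # term absent, other message
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


# Invented toy data: 1 = documents a refactoring, 0 = does not.
messages = [
    "Refactor parser to extract method",
    "Fix null pointer in login",
    "Refactor duplicated code in utils",
    "Add new login page",
]
labels = [1, 0, 1, 0]

vector = hash_features(messages[0], bits=10)
score = chi_squared("refactor", messages, labels)
```

In a fuller pipeline, the highest-scoring features would be kept and the resulting vectors passed to a binary classifier. The classifier is omitted here because the abstract does not list all five algorithms.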

Notes

https://drive.google.com/drive/folders/1h-ek4lc3O2XLCdDTQpMQjos5MM7uECif?usp=sharing.

Author information

Authors and Affiliations

Rochester Institute of Technology, Rochester, NY, USA
Licelot Marmolejos, Eman Abdullah AlOmar, Mohamed Wiem Mkaouer & Christian Newman
ETS Montreal, University of Quebec, Quebec City, QC, Canada
Ali Ouni

Corresponding author

Correspondence to Mohamed Wiem Mkaouer.

About this article

Cite this article

Marmolejos, L., AlOmar, E.A., Mkaouer, M.W. et al. On the use of textual feature extraction techniques to support the automated detection of refactoring documentation. Innovations Syst Softw Eng 18, 233–249 (2022). https://doi.org/10.1007/s11334-021-00388-5

Received: 15 September 2020
Accepted: 16 February 2021
Published: 07 March 2021
Issue Date: June 2022
DOI: https://doi.org/10.1007/s11334-021-00388-5
