An Empirical Study to Investigate Data Sampling Techniques for Improving Code-Smell Prediction Using Imbalanced Data

Gupta, Himanshu; Misra, Sanjay; Kumar, Lov; Murthy, N. L. Bhanu

doi:10.1007/978-3-030-69143-1_18

Conference paper
First Online: 14 February 2021

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1350)

Abstract

A code smell is a surface indication that usually points to a deeper problem within a system. It typically takes the form of an easily traceable issue in the code. It has been observed that code containing code smells is more likely to change during the software development process. Refactoring the code at an early stage of development saves considerable time and avoids complications at later stages. This paper aims at detecting eight different types of code smells using feature engineering and sampling techniques with the purpose of handling imbalanced data. Three naive Bayes classifiers are used to find code smells over 629 different packages. The results indicate that the Gaussian naive Bayes classifier performed best of the three classifiers on all samples of the data. The results also indicate that the original data was the best data to use: all three classifiers performed better on it than on the other two data sets.
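As a rough illustration of the kind of pipeline the abstract describes, the sketch below balances an imbalanced data set and then fits a Gaussian naive Bayes classifier. It is not the authors' implementation. The abstract does not name the sampling techniques used, so random oversampling is an assumption here, and the two features, the toy values, and the names `random_oversample` and `TinyGaussianNB` are hypothetical.

```python
import math
import random
from collections import defaultdict


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    match the size of the largest one (assumed technique, see lead-in)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


class TinyGaussianNB:
    """Minimal Gaussian naive Bayes: one normal distribution per class and feature."""

    def fit(self, X, y, var_smoothing=1e-9):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.stats = {}
        for label, rows in by_class.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # Population variance plus a small constant to avoid division by zero.
            variances = [
                sum((v - m) ** 2 for v in c) / len(c) + var_smoothing
                for c, m in zip(cols, means)
            ]
            log_prior = math.log(len(rows) / len(y))
            self.stats[label] = (log_prior, means, variances)
        return self

    def _log_posterior(self, label, row):
        log_prior, means, variances = self.stats[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances)
        )
        return log_prior + log_likelihood

    def predict(self, X):
        return [
            max(self.stats, key=lambda label: self._log_posterior(label, row))
            for row in X
        ]


# Hypothetical toy data: [lines of code, number of methods]; label 1 = smelly.
X = [[10, 2], [12, 3], [11, 2], [9, 1], [13, 3], [10, 3], [50, 20], [55, 22]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
model = TinyGaussianNB().fit(X_bal, y_bal)
predictions = model.predict([[11, 2], [52, 21]])
```

Fitting the same classifier on `X, y` directly and on `X_bal, y_bal` gives a simple way to compare original and resampled data, which is the kind of comparison the abstract reports.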

Author information

Authors and Affiliations

BITS Pilani, Hyderabad Campus, Hyderabad, India
Himanshu Gupta, Lov Kumar & N. L. Bhanu Murthy
Covenant University, Ota, Nigeria
Sanjay Misra

Corresponding authors

Correspondence to Himanshu Gupta, Sanjay Misra, Lov Kumar or N. L. Bhanu Murthy.

Editor information

Editors and Affiliations

Covenant University, Ota, Nigeria
Sanjay Misra
Federal University of Technology Minna, Minna, Nigeria
Bilkisu Muhammad-Bello

About this paper

Cite this paper

Gupta, H., Misra, S., Kumar, L., Murthy, N.L.B. (2021). An Empirical Study to Investigate Data Sampling Techniques for Improving Code-Smell Prediction Using Imbalanced Data. In: Misra, S., Muhammad-Bello, B. (eds) Information and Communication Technology and Applications. ICTA 2020. Communications in Computer and Information Science, vol 1350. Springer, Cham. https://doi.org/10.1007/978-3-030-69143-1_18

DOI: https://doi.org/10.1007/978-3-030-69143-1_18
Published: 14 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69142-4
Online ISBN: 978-3-030-69143-1
eBook Packages: Computer Science, Computer Science (R0)
