
The Impact of Instance Selection Algorithms on Maintenance Effort Estimation for Open-Source Software

  • Conference paper

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 470)

Abstract

Open-source software is widely used in industry today, and accurately estimating its maintenance effort has become an interesting research topic. In this context, researchers have conducted many open-source software maintenance effort estimation (O-MEE) studies that apply statistical and machine learning (ML) techniques to improve estimation. This study focuses on the impact of instance selection on the performance of ML techniques in O-MEE, mainly for bug resolution. An empirical study was conducted using three ML techniques, k-nearest neighbor (kNN), support vector machine (SVM), and multinomial naïve Bayes (MNB), combined with the all-kNN instance selection algorithm on three datasets: Eclipse JDT, Eclipse Platform, and Mozilla Thunderbird. The study reports on a set of 18 experiments and compares their results. The results show that instance selection improved the performance of the ML techniques.
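To make the idea of all-kNN instance selection concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the flag-then-remove variant of the algorithm: an instance is flagged when the majority label of its m nearest neighbors disagrees with its own label for any m from 1 to k, and all flagged instances are dropped at the end. The function name `all_knn_select`, the Euclidean distance, the labels "fast" and "slow", and the toy points are illustrative assumptions and do not come from the paper or its datasets.

```python
import math
from collections import Counter


def all_knn_select(X, y, k=3):
    """Return the indices of the instances kept by all-kNN editing.

    Assumed variant: flag an instance if, for any m in 1..k, the majority
    label among its m nearest neighbours differs from its own label.
    """
    n = len(X)
    flagged = set()
    for i in range(n):
        # Other instances ordered by Euclidean distance to instance i.
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )
        for m in range(1, k + 1):
            votes = Counter(y[j] for j in order[:m])
            # On a tied vote, the label seen first (nearest) wins.
            predicted = votes.most_common(1)[0][0]
            if predicted != y[i]:
                flagged.add(i)
                break
    return [i for i in range(n) if i not in flagged]


# Hypothetical toy data: two clusters plus one instance (index 8) whose
# label disagrees with its nearest neighbours.
X = [(0, 0), (0, 1), (1, 0), (1, 1),
     (5, 5), (5, 6), (6, 5), (6, 6),
     (2.5, 2.5)]
y = ["fast", "fast", "fast", "fast",
     "slow", "slow", "slow", "slow",
     "slow"]

kept = all_knn_select(X, y, k=3)
print(kept)  # indices 0..7 are kept; index 8 is dropped
```

In a setup like the one the abstract describes, the retained instances would then be used to train a classifier such as kNN, SVM, or MNB, and the result compared with training on the full dataset.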


Notes

  1. https://www.bugzilla.org
  2. https://www.atlassian.com/software/jira
  3. https://github.com
  4. https://github.com/logpai/bugrepo/tree/master/JDT
  5. https://github.com/logpai/bugrepo/tree/master/EclipsePlatform
  6. https://github.com/logpai/bugrepo/tree/master/Thunderbird
  7. https://www.bugzilla.org
  8. https://rapidminer.com


Author information

Corresponding author

Correspondence to Laila Cheikhi.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Miloudi, C., Cheikhi, L., Idri, A., Abran, A. (2022). The Impact of Instance Selection Algorithms on Maintenance Effort Estimation for Open-Source Software. In: Rocha, A., Adeli, H., Dzemyda, G., Moreira, F. (eds) Information Systems and Technologies. WorldCIST 2022. Lecture Notes in Networks and Systems, vol 470. Springer, Cham. https://doi.org/10.1007/978-3-031-04829-6_17
