Abstract
Open-source software is now widely used in industry, and accurately estimating its maintenance effort has become an active research topic. In this context, researchers have conducted many open-source software maintenance effort estimation (O-MEE) studies based on statistical and machine learning (ML) techniques to improve estimation accuracy. This study focuses on the impact of instance selection on the performance of ML techniques in O-MEE, mainly for bug resolution. An empirical study was conducted using three techniques, k-nearest neighbor (kNN), support vector machine (SVM), and multinomial naïve Bayes (MNB), combined with All-kNN instance selection on three datasets: Eclipse JDT, Eclipse Platform, and Mozilla Thunderbird. This study reports on a set of 18 experiments and compares their results. The results show that instance selection improved the performance of the ML techniques.
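To make the idea of instance selection concrete, the following is a minimal sketch of All-kNN editing in the spirit of Tomek's original rule: an instance is flagged if its k nearest neighbours misclassify it for any k from 1 to k_max, and all flagged instances are removed before training. It is an illustration only, not the authors' implementation. The two-dimensional features, the "fast"/"slow" resolution labels, the toy data, and the function names `knn_vote` and `all_knn` are all assumptions introduced here.

```python
from collections import Counter
from math import dist


def knn_vote(train, query, k, skip=None):
    """Majority label among the k nearest neighbours of `query`.

    `skip` is the index of a training row to leave out (used for
    leave-one-out checks). Vote ties go to the nearer neighbour.
    """
    neighbours = sorted(
        (dist(x, query), label)
        for i, (x, label) in enumerate(train)
        if i != skip
    )[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def all_knn(data, k_max=3):
    """Drop every instance misclassified by its k nearest neighbours
    for at least one k in 1..k_max (All-kNN editing)."""
    flagged = set()
    for k in range(1, k_max + 1):
        for i, (x, label) in enumerate(data):
            if knn_vote(data, x, k, skip=i) != label:
                flagged.add(i)
    return [row for i, row in enumerate(data) if i not in flagged]


# Hypothetical toy data: two clean clusters plus one mislabeled point.
data = [
    ((0.0, 0.0), "fast"), ((1.0, 0.0), "fast"),
    ((0.0, 1.0), "fast"), ((1.0, 1.0), "fast"),
    ((3.0, 0.5), "slow"),  # noisy instance next to the "fast" cluster
    ((10.0, 10.0), "slow"), ((11.0, 10.0), "slow"),
    ((10.0, 11.0), "slow"), ((11.0, 11.0), "slow"),
]

reduced = all_knn(data, k_max=3)

# A 1-NN prediction near the noisy point, before and after editing.
query = (2.5, 0.5)
before = knn_vote(data, query, k=1)
after = knn_vote(reduced, query, k=1)
print(len(data), len(reduced), before, after)
```

In this toy setting the mislabeled point is the only instance removed, and the prediction for a nearby query changes once it is gone. On real bug-report data the reduced set would then be passed to each classifier (kNN, SVM, or MNB) in place of the full training set.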
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Miloudi, C., Cheikhi, L., Idri, A., Abran, A. (2022). The Impact of Instance Selection Algorithms on Maintenance Effort Estimation for Open-Source Software. In: Rocha, A., Adeli, H., Dzemyda, G., Moreira, F. (eds) Information Systems and Technologies. WorldCIST 2022. Lecture Notes in Networks and Systems, vol 470. Springer, Cham. https://doi.org/10.1007/978-3-031-04829-6_17
DOI: https://doi.org/10.1007/978-3-031-04829-6_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04828-9
Online ISBN: 978-3-031-04829-6
eBook Packages: Intelligent Technologies and Robotics (R0)