Abstract
Variation in software requirements, technological upgrades and the occurrence of defects necessitate changes to software for its effective use. Early detection of the classes of a software system that are prone to change is critical for software developers and project managers, as it aids the efficient allocation of limited resources. Moreover, change-prone classes should be restructured and designed carefully to prevent the introduction of defects. Recently, search-based techniques and their hybridized counterparts have been advocated for predictive modeling in software engineering, as these techniques help identify optimal solutions to a specific problem by testing the goodness of a number of candidate solutions. In this paper, we propose a novel approach for change prediction using search-based and hybridized techniques. Further, we address the following issues: (i) low repeatability of empirical studies, (ii) limited use of statistical tests for comparing the effectiveness of models, and (iii) lack of assessment of the trade-off between runtime and predictive performance of various techniques. This paper presents an empirical validation of search-based techniques and their hybridized versions that yields unbiased, accurate and repeatable results. The study analyzes and compares the predictive performance of five search-based techniques, five hybridized techniques, four widely used machine learning techniques and a statistical technique for predicting change-prone classes in six application packages of Android, a popular mobile operating system. The results of the study advocate the use of hybridized techniques for developing models to identify change-prone classes.

Appendix
1.1 Descriptive statistics
This appendix presents the descriptive statistics of each data set. Tables 12, 13, 14, 15, 16, and 17 report the minimum (Min.), maximum (Max.), mean (Mean), standard deviation (SD), 25th percentile and 75th percentile of all the OO metrics used as independent variables in the study, with one table per data set.
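As an illustration of how the six statistics listed for each metric can be computed, the following is a minimal sketch using only the Python standard library. The function name `describe`, the sample values, the choice of sample (rather than population) standard deviation, and the inclusive quartile method are all assumptions made for this example; the text here does not state which conventions the study used.

```python
import statistics


def describe(values):
    """Return the six summary statistics reported per metric.

    Assumptions (not confirmed by the source): sample standard
    deviation and the "inclusive" quartile interpolation method.
    """
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "p25": q1,
        "p75": q3,
    }


# Hypothetical metric values for a handful of classes (not data from the study).
sample = [1, 2, 3, 4, 5]
for name, value in describe(sample).items():
    print(f"{name}: {value:.2f}")
```

Different tools use different percentile interpolation rules, so the 25th and 75th percentiles computed this way may differ slightly from values reported elsewhere for the same data.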
Cite this article
Malhotra, R., Khanna, M. An exploratory study for software change prediction in object-oriented systems using hybridized techniques. Autom Softw Eng 24, 673–717 (2017). https://doi.org/10.1007/s10515-016-0203-0
DOI: https://doi.org/10.1007/s10515-016-0203-0