Abstract
Accurate software development effort estimation (SDEE) is fundamental to the efficient management of software development projects, as it helps software managers allocate their human resources efficiently. Over the last four decades, software engineering researchers have used several effort estimation techniques, including those based on statistical and machine learning methods, but no consensus has been reached on which technique performs best in all circumstances. To tackle this challenge, ensemble effort estimation, which predicts software development effort by combining more than one solo estimation technique, has recently been investigated. In this paper, heterogeneous ensembles based on four well-known machine learning techniques (K-nearest neighbor, support vector regression, multilayer perceptron and decision trees) were developed and evaluated by investigating the impact of the parameter values of the ensemble members on estimation accuracy. In particular, this paper evaluates whether setting the parameters of the ensemble members using two optimization techniques (grid search and particle swarm optimization) yields more accurate SDEE estimates. The heterogeneous ensembles of this study were built using three combination rules (mean, median and inverse ranked weighted mean) over seven datasets. The results obtained suggest that: (1) single techniques optimized using grid search or particle swarm optimization provide more accurate estimation; (2) in general, ensembles achieve higher accuracy than their single techniques, whichever optimization technique is used, even though ensembles do not dominate all single techniques; (3) heterogeneous ensembles based on optimized single techniques provide more accurate estimation; and (4) generally, particle swarm optimization and grid search generate ensembles with the same predictive capability.
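The three combination rules named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sample estimates and the rank-to-weight mapping (the best-ranked of n members receives weight n, the worst receives weight 1) are assumptions chosen for illustration, and the abstract does not specify how members are ranked.

```python
from statistics import mean, median


def combine_mean(estimates):
    """Ensemble estimate as the arithmetic mean of the members' estimates."""
    return mean(estimates)


def combine_median(estimates):
    """Ensemble estimate as the median of the members' estimates."""
    return median(estimates)


def combine_irwm(estimates, ranks):
    """Inverse ranked weighted mean (assumed weighting scheme).

    ranks[i] is the rank of member i (1 = most accurate). A member with
    rank r among n members gets weight n - r + 1, so better-ranked members
    contribute more to the combined estimate.
    """
    n = len(estimates)
    weights = [n - r + 1 for r in ranks]
    weighted_sum = sum(w * e for w, e in zip(weights, estimates))
    return weighted_sum / sum(weights)


# Hypothetical effort estimates (person-hours) from four ensemble members,
# e.g. KNN, SVR, MLP and a decision tree, with hypothetical accuracy ranks.
estimates = [100.0, 120.0, 200.0, 140.0]
ranks = [1, 2, 4, 3]

print(combine_mean(estimates))         # 140.0
print(combine_median(estimates))       # 130.0
print(combine_irwm(estimates, ranks))  # 124.0
```

In this example the inverse ranked weighted mean falls below the plain mean because the lowest-ranked member, which produced the largest estimate, receives the smallest weight.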
Ethics declarations
Conflict of interest
The authors have no conflict of interest.
Additional information
Communicated by S. Deb, T. Hanne, K.C. Wong.
Cite this article
Hosni, M., Idri, A., Abran, A. et al. On the value of parameter tuning in heterogeneous ensembles effort estimation. Soft Comput 22, 5977–6010 (2018). https://doi.org/10.1007/s00500-017-2945-4
DOI: https://doi.org/10.1007/s00500-017-2945-4