Abstract
To date, most research in software effort estimation has not taken chronology into account when selecting projects for training and validation sets. A chronological split makes use of a project’s starting and completion dates, such that any model that estimates effort for a new project p uses as its training set only those projects completed before p’s starting date. A study in 2009 (“S3”) investigated the use of chronological splits taking into account a project’s age. The research question was whether a training set containing only the most recent past projects (a “moving window” of recent projects) would lead to more accurate estimates than using the entire history of past projects completed prior to the starting date of a new project. S3 found that moving windows could improve the accuracy of estimates. The study described herein replicates S3 using three different and independent data sets. Estimation models were built using regression, and accuracy was measured using absolute residuals. The results contradict S3: they show no gain in estimation accuracy when using windows for effort estimation. This is a surprising result, as it does not support the intuition that recent data should be more helpful than old data for effort estimation. Several factors, discussed in this paper, might have contributed to these contradictory results. Our future work includes replicating this study using other data sets, to better understand when using windows is a suitable choice for software companies.
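As an illustration only, the construction of the two training sets can be sketched in R (the language used for the analyses; see Notes). The data frame and its columns (start, end, effort, size) are hypothetical placeholders, not the variables of S3 or of this study:

    # For a new project p: the "growing portfolio" is every project completed
    # before p starts; a moving window keeps only the w most recent of those.
    training_sets <- function(projects, p, w) {
      finished <- projects[projects$end < p$start, ]   # chronological split
      recent   <- finished[order(finished$end, decreasing = TRUE), ]
      list(growing = finished, window = head(recent, w))
    }

    # A regression-based estimate for p, e.g. from a window of 20 projects:
    # fit <- lm(log(effort) ~ log(size), data = training_sets(projects, p, 20)$window)
    # exp(predict(fit, newdata = p))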
Notes
Using R version 3.2.2 and relevant packages as current at January 2016.
Using the “fastbw()” function from Harrell’s “rms” package for R.
Using the “cohen.d()” function from the “effsize” package in R.
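For reference, a minimal sketch of how these functions are typically invoked; the model formula and the residual vectors are hypothetical placeholders, not the paper’s actual variables:

    library(rms)       # provides ols() and fastbw()
    library(effsize)   # provides cohen.d()

    # Backward variable elimination on a linear model fitted with ols():
    fit <- ols(log(effort) ~ log(size) + lang, data = projects)
    fastbw(fit, rule = "p", sls = 0.05)

    # Cohen's d between two paired vectors of absolute residuals:
    cohen.d(ar_window, ar_growing)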
References
Amasaki S, Lokan C (2012) The effects of moving windows to software estimation: comparative study on linear regression and estimation by analogy. IWSM/Mensura 2012, Assisi
Amasaki S, Lokan C (2013) The evaluation of weighted moving windows for software effort estimation. Product-Foc Software Process Improve, LNCS 7983:214–228, Springer
Amasaki S, Lokan C (2014a) On the effectiveness of weighted moving windows: experiment on linear regression based software effort estimation. J Software: Evol Process 27(7):488–507
Amasaki S, Lokan C (2014b) The effects of moving windows on software effort estimation: comparative study with CART. Proc 6th Int Workshop Empirical Software Eng Pract, Osaka, Japan
Amasaki S, Lokan C (2014c) The effects of gradual weighting on duration-based moving windows for software effort estimation. 15th Int Conf Product-Focused Software Eng Process Improve, Helsinki, Finland: 63–77
Amasaki S, Lokan C (2015) A replication of comparative study of moving windows on linear regression and estimation by analogy. Proc 11th Int Conf Predict Models Data Anal Software Eng, Beijing, China: 6:1–6:10
Amasaki S, Lokan C (2016a) Evaluation of moving window policies with CART. Proc 7th Int Workshop Empirical Software Eng Pract, Osaka, Japan
Amasaki S, Lokan C (2016b) A replication study on the effects of weighted moving windows for software effort estimation. Proc 20th Int Conf Eval Assessment Software Eng, Limerick, Ireland
Amasaki S, Takahara Y, Yokogawa T (2011) Performance evaluation of windowing approach on effort estimation by analogy. IWSM/Mensura 2011, Nara, pp 188–195
Azhar D, Mendes E, Riddle P (2012) A systematic review of Web resource estimation. Proc 8th Int Conf Predict Models Software Eng, Lund, Sweden: 49–58
Azzeh M, Cowling PI, Neagu D (2010) Software stage-effort estimation based on association rule mining and fuzzy set theory. Proc 10th Int Conf Comput Inform Technol, Bradford, UK: 249–256
Bibi S, Stamelos I, Angelis L (2008) Combining probabilistic models for explanatory productivity estimation. Inf Softw Technol 50(7–8):656–669
Bibi S, Stamelos I, Gerolimos G, Kollias V (2010) BBN based approach for improving the software development process of an SME—a case study. J Softw Maint Evol Res Pract 22(2):121–140
Britto R, Freitas V, Mendes E, Usman M (2014) Effort estimation in global software development: a systematic literature review. Proc 9th Int Conf Global Software Eng, Shanghai, China: 135–144
Britto R, Mendes E, Börstler J (2015) An empirical investigation on effort estimation in agile global software development. Proc 10th Int Conf Global Software Eng, Ciudad Real, Spain: 38–45
Carver J (2010) Towards reporting guidelines for experimental replications: a proposal. Proc 1st Int Workshop Replic Empirical Software Eng Res, Cape Town, South Africa
Cohen J (1992) A power primer. Psychol Bull 112:155–159
Cohn M (2005) Agile estimating and planning. Prentice Hall
Conte SD, Dunsmore HE, Shen VY (1986) Software engineering metrics and models. Benjamin-Cummings
Cook RD (1977) Detection of influential observations in linear regression. Technometrics 19:15–18
Fernández-Diego M, Martínez-Gómez M, Torralba-Martínez J-M (2010) Sensitivity of results to different data quality meta-data criteria in the sample selection of projects from the ISBSG dataset. Proc 6th Int Conf Predict Models Software Eng, Timisoara, Romania: 13:1–13:9
Forselius P (2006) Data quality criteria for Experience® data collection. STTF Oy
Foss T, Stensrud E, Kitchenham B, Myrtveit I (2003) A simulation study of the model evaluation criterion MMRE. IEEE Trans Softw Eng 29(11):985–995
Han J, Kamber M (2006) Data mining: concepts and techniques. Morgan Kaufmann
Jørgensen M (2004) A review of studies on expert estimation of software development effort. J Syst Softw 70(1):37–60
Jørgensen M (2005) Practical guidelines for expert-judgment-based software effort estimation. IEEE Softw 22(3):57–63
Jørgensen M (2013) Relative estimation of software development effort: it matters with what and how you compare. IEEE Softw 30(2):74–79
Jørgensen M, Grimstad S (2008) Avoiding irrelevant and misleading information when estimating development effort. IEEE Softw 25(3):78–83
Jørgensen M, Shepperd M (2007) A systematic review of software development cost estimation studies. IEEE Trans Softw Eng 33(1):33–53
Kitchenham BA, Mendes E (2009) Why comparative effort prediction studies may be invalid. Proc 5th Int Conf Predict Models Software Eng, Vancouver, Canada: 4:1–4:5
Kitchenham BA, Pickard LM, MacDonell SG, Shepperd MJ (2001) What accuracy statistics really measure. IEE Proc - Software 148(3):81–85
Kitchenham B, Pfleeger SL, McColl B, Eagan S (2002) An empirical study of maintenance and development estimation accuracy. J Syst Softw 64(1):57–77
Kitchenham BA, Mendes E, Travassos G (2007) Cross versus within-company cost estimation studies: a systematic review. IEEE Trans Softw Eng 33(5):316–329
Kocaguneli E, Menzies T, Mendes E (2014) Transfer learning in effort estimation. Empir Softw Eng 19:1–31
Lefley M, Shepperd MJ (2003) Using genetic programming to improve software effort estimation based on general data sets. LNCS 2724, Springer-Verlag, pp 2477–2487
Li YF, Xie M, Goh TN (2009) A study of the non-linear adjustment for analogy based software cost estimation. Empir Softw Eng 14:603–643
Lokan C, Mendes E (2008) Investigating the use of chronological splitting to compare software cross-company and single-company effort predictions. Proc 12th Int Conf Eval Assess Software Eng, Bari, Italy: 151–160
Lokan C, Mendes E (2009a) Using chronological splitting to compare cross- and single-company effort models: further investigation. Proc 32nd Austral Conf Comput Sci, Wellington, NZ: 47–54
Lokan C, Mendes E (2009b) Applying moving windows to software effort estimation. Proc 3rd Int Symp Empirical Software Eng Measure, Lake Buena Vista, Florida, USA: 111–122
Lokan C, Mendes E (2012) Investigating the use of duration-based moving windows to improve software effort estimation. Proc 19th Asia-Pacific Software Eng Conf, Hong Kong
Lokan C, Mendes E (2014) Investigating the use of duration-based moving windows to improve software effort prediction: a replicated study. Inf Softw Technol 56(9):1063–1075
Lopez-Martin C, Isaza C, Chavoya A (2012) Software development effort prediction of industrial projects applying a general regression neural network. Empir Softw Eng 17(6):738–756
MacDonell SG, Shepperd MG (2003) Using prior-phase effort records for re-estimation during software projects. Proc 9th IEEE Int Symp Software Metrics, Sydney, Australia
MacDonell SG, Shepperd MJ (2010) Data accumulation and software effort prediction. Proc 4th Int Symp Empirical Software Eng Measure, Bolzano-Bozen, Italy
Mäntylä MV, Lassenius C, Vanhanen J (2010) Rethinking replication in software engineering: can we see the forest for the trees? Proc 1st Int Workshop Replic Empirical Software Eng Res, Cape Town, South Africa
Maxwell K (2002) Applied statistics for software managers. Software Quality Institute Series, Prentice Hall
Mendes E (2014) Practitioner’s knowledge representation: a pathway to improve software effort estimation. Springer, ISBN 978-3-642-54156-8
Mendes E, Lokan C (2009) Investigating the use of chronological splitting to compare software cross-company and single-company effort predictions: a replicated study. Proc 13th Int Conf Eval Assess Software Eng, Durham, UK
Mendes E, Mosley N (2008) Bayesian network models for web effort prediction: a comparative study. IEEE Trans Softw Eng 34(6):723–737
Menzies T, Krishna R, Pryor D (2016) The promise repository of empirical software engineering data; http://openscience.us/repo. North Carolina State University, Department of Computer Science
Minku LL, Yao X (2012a) Can cross-company data improve performance in software effort estimation? Proc 8th Int Conf Predict Models Software Eng, Lund, Sweden: 69–78
Minku LL, Yao X (2012b) Using unreliable data for creating more reliable online learners. Proc Int Joint Conf Neural Networks, Brisbane, pp 1–8
Minku LL, Sarro F, Mendes E, Ferrucci F (2015) How to make best use of cross-company data for Web effort estimation? Proc 9th Int Symp Empirical Software Eng Measure, Beijing, China: 1–10
Premraj R, Shepperd MJ, Kitchenham BA, Forselius P (2005) An empirical analysis of software productivity over time. Proc 11th Int Symp Software Metrics, Como, Italy
Schmietendorf A, Kunz M, Dumke R (2008) Effort estimation for agile software development projects. Proc 5th Software Measurement European Forum, Milan, pp 113–126
Shepperd MJ, MacDonell SG (2012) Evaluating prediction systems in software project estimation. Inf Softw Technol 54(8):820–827
Shull FJ, Carver JC, Vegas S, Juristo N (2008) The role of replications in empirical software engineering. Empir Softw Eng 13:211–218
Sigweni B, Shepperd MJ, Turchi T (2016) Realistic assessment of software effort estimation models. Proc 20th Int Conf Assess Eval Software Eng, Limerick, Ireland
Song L, Minku LL, Yao X (2013) The impact of parameter tuning on software effort estimation using learning machines. Proc 9th Int Conf Predict Models Software Eng, Baltimore, USA: 9:1–9:10
Tabachnick BG, Fidell LS (1996) Using multivariate statistics. Harper-Collins
Tsunoda M, Amasaki S, Lokan C (2013) How to treat timing information for software effort estimation? Proc 2013 Int Conf Software Syst Process, San Francisco, USA: 10–19
Turhan B (2012) On the dataset shift problem in software engineering prediction models. Empir Softw Eng 17:62–74
Usman M, Mendes E, Weidt F, Britto R (2014) Effort estimation in agile software development: a systematic literature review. Proc 10th Int Conf Predict Models Software Eng, Turin, Italy: 82–91
Acknowledgments
We thank Pekka Forselius for making the Finnish data set available to us for this research.
Additional information
Communicated by: Martin Shepperd
Appendix
Tables 6, 7, and 8 present in full numerical detail the information that is plotted in Figs. 7a and b, 8a and b, and 9a and b. In each table, the first column shows the window size. The second column shows the number of projects for which the use of a window of that size could make a difference to the estimate, compared to using the growing portfolio. The third column shows the MAE across all of those projects when a window is used. The fourth column shows the MAE for the same set of projects when a window is not used and the training set instead contains all projects completed so far. The fifth column shows the difference between columns 3 and 4; a positive number means that MAE is worse when a window is used than when all data are retained, and a negative number means that MAE is better when a window is used. The sixth column expresses the difference in MAE (the fifth column) as a percentage of the MAE without a window (the fourth column). The seventh column shows the p-value from the paired-samples two-sided Wilcoxon test of the hypothesis that MAE with a window differed from MAE with the growing portfolio; values below 0.00055 indicate a statistically significant difference for that test (applying the Holm-Bonferroni correction to the overall significance level of 0.05). The final column shows the effect size r, calculated from Cohen’s d statistic (Cohen 1992): r = d / sqrt(d^2 + 4). Effect size is considered small if it is below about 0.2, medium at about 0.5, and large above about 0.8 (Cohen 1992; Shepperd and MacDonell 2012).
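The statistics reported in these columns can be computed with standard R functions. The following is a sketch under assumed variable names (ar_window and ar_growing, two vectors of absolute residuals aligned by project), not the authors’ actual scripts:

    library(effsize)

    mae_window  <- mean(ar_window)               # column 3
    mae_growing <- mean(ar_growing)              # column 4
    diff_mae    <- mae_window - mae_growing      # column 5
    diff_pct    <- 100 * diff_mae / mae_growing  # column 6

    # Column 7: paired two-sided Wilcoxon test; across all window sizes the
    # p-values are judged against Holm-Bonferroni adjusted thresholds
    # (equivalently, p.adjust(p_values, method = "holm")).
    p <- wilcox.test(ar_window, ar_growing, paired = TRUE)$p.value

    # Final column: effect size r derived from Cohen's d
    d <- cohen.d(ar_window, ar_growing)$estimate
    r <- d / sqrt(d^2 + 4)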
Cite this article
Lokan, C., Mendes, E. Investigating the use of moving windows to improve software effort prediction: a replicated study. Empir Software Eng 22, 716–767 (2017). https://doi.org/10.1007/s10664-016-9446-4