Abstract
In 2001 the ISBSG database was used by Jeffery et al. (Using public domain metrics to estimate software development effort. Proceedings Metrics’01, London, pp 16–27, 2001; S1) to compare the effort prediction accuracy of cross-company and single-company effort models. Given that more than 2,000 projects were later volunteered to this database, in 2005 Mendes et al. (A replicated comparison of cross-company and within-company effort estimation models using the ISBSG Database, in Proceedings of Metrics’05, Como, 2005; S2) replicated S1 but obtained different results. The difference in results could have arisen from legitimate differences in data set patterns; however, it could also have arisen from differences in experimental procedure, because S1’s procedure was not fully documented and S2 was therefore unable to follow it exactly. Recently, we applied S2’s experimental procedure to the ISBSG database version used in S1 (release 6) to assess whether differences in experimental procedure contributed to the different results (Lokan and Mendes, Cross-company and single-company effort models using the ISBSG Database: a further replicated study, Proceedings of the ISESE’06, pp 75–84, 2006; S3). Our results corroborated those from S1, suggesting that the differences in the results obtained by S2 were likely caused by legitimate differences in data set patterns. We have since been able to reconstruct the experimental procedure of S1, and in this paper we therefore present both S3 and another study (S4), which applied the experimental procedure of S1 to the data set used in S2. By applying the experimental procedure of S2 to the data set used in S1 (study S3), and the experimental procedure of S1 to the data set used in S2 (study S4), we investigate the effect of all the variations between S1 and S2.
Our results for S4 support those of S3, suggesting that differences in data preparation and analysis procedures did not affect the outcome of the analysis. Thus, the different results of S1 and S2 are very likely due to fundamental differences in the data sets.
References
Briand LC, El-Emam K, Maxwell K, Surmann D, Wieczorek I (1999) An assessment and comparison of common cost estimation models. Proceedings of the 21st International Conference on Software Engineering, ICSE 99, pp 313–322
Briand LC, Langley T, Wieczorek I (2000) A replicated assessment of common software cost estimation techniques. Proceedings of the 22nd International Conference on Software Engineering, ICSE 2000, pp 377–386
Conte SD, Dunsmore HE, Shen VY (1986) Software engineering metrics and models. Benjamin/Cummings
Cook RD (1977) Detection of influential observations in linear regression. Technometrics 19:15–18
Jeffery R, Ruhe M, Wieczorek I (2000) A comparative study of two software development cost modeling techniques using multi-organizational and company-specific data. Inf Softw Technol 42:1009–1016
Jeffery R, Ruhe M, Wieczorek I (2001) Using public domain metrics to estimate software development effort. Proceedings Metrics’01, London, pp 16–27
Kemerer CF (1987) An empirical validation of software cost estimation models. Communications ACM, 30(5)
Kitchenham BA, Mendes E (2004) A comparison of cross-company and single-company effort estimation models for Web applications, Proceedings EASE 2004, pp 47–55
Kitchenham BA, Mendes E, Travassos G (2006) A systematic review of cross- vs within-company cost estimation studies, Proceedings of EASE’06, BCS. (Available at http://ewic.bcs.org/conferences/2006/ease06/index.htm)
Kitchenham BA, Taylor NR (1984) Software cost models. ICL Tech J 73–102, May
Kirsopp C, Shepperd M (2002) Making inferences with small numbers of training sets. IEE Proc Softw 149:123–130
Lefley M, Shepperd MJ (2003) Using genetic programming to improve software effort estimation based on general data sets, Proceedings of GECCO 2003, LNCS 2724. Springer, New York, pp 2477–2487
Lokan CJ (2005) Function points. In: Zelkowitz MV (ed) Advances in computers, vol 65. Elsevier, pp 298–347
Lokan C, Mendes E (2006) Cross-company and single-company effort models using the ISBSG Database: a further replicated study, Proceedings of the ISESE’06, pp 75–84
Maxwell K (2002) Applied statistics for software managers. Software Quality Institute Series, Prentice-Hall, Englewood Cliffs, NJ
Maxwell K, Wassenhove LV, Dutta S (1999) Performance evaluation of general and company specific models in software development effort estimation. Manag Sci 45(6):787–803, June
Mendes E, Kitchenham BA (2004) Further comparison of cross-company and within company effort estimation models for Web applications. Proceedings Metrics’04, Chicago, Illinois September 11–17th 2004, IEEE Computer Society, pp 348–357
Mendes E, Lokan C, Harrison R, Triggs C (2005) A replicated comparison of cross-company and within-company effort estimation models using the ISBSG Database, in Proceedings of Metrics’05, Como
Tabachnick BG, Fidell LS (1996) Using multivariate statistics. Harper Collins, New York
Wieczorek I, Ruhe M (2002) How valuable is company-specific data compared to multi-company data for software cost estimation? Proceedings Metrics’02, Ottawa, pp 237–246
Acknowledgments
We would like to thank the ISBSG group for making releases 6 and 9 available for our research and all those companies that have volunteered data on their projects.
Additional information
Editor: José Carlo Maldonado
About this article
Cite this article
Mendes, E., Lokan, C. Replicating studies on cross- vs single-company effort models using the ISBSG Database. Empir Software Eng 13, 3–37 (2008). https://doi.org/10.1007/s10664-007-9045-5
DOI: https://doi.org/10.1007/s10664-007-9045-5