Abstract
Parametric software effort estimation models usually consist of only a single mathematical relationship. With the advent of software repositories containing data from heterogeneous projects, these types of models suffer from poor adjustment and predictive accuracy. One possible way to alleviate this problem is to use a set of mathematical equations obtained by dividing the historical project dataset, according to different parameters, into subdatasets called partitions. In turn, partitions are divided into clusters, which serve as a basis for more accurate models. In this paper, we describe the process, tool and results of such an approach through a case study using a publicly available repository, ISBSG. Results suggest the adequacy of the technique as an extension of existing single-expression models, without making the estimation process much more complex than one that uses a single estimation model. A tool to support the process is also presented.
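The idea of replacing one global equation with one equation per cluster can be illustrated with a minimal sketch. It is not the tool described in the paper: the abstract does not state the clustering algorithm or the functional form, so the sketch assumes a power-law relationship effort = a * size^b and uses a plain one-dimensional k-means on log(size) as a stand-in. All function names and the synthetic projects (generated from two known power laws) are hypothetical and serve only to show the mechanics.

```python
import math

def fit_power_law(points):
    """Least-squares fit of effort = a * size**b in log-log space.
    Needs at least two points with distinct sizes."""
    xs = [math.log(s) for s, _ in points]
    ys = [math.log(e) for _, e in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b

def kmeans_1d(values, k, iterations=50):
    """Plain 1-D k-means (an assumed stand-in); returns sorted centres."""
    ordered = sorted(values)
    # Spread the initial centres across the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Keep the old centre if a group ends up empty.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return sorted(centres)

def nearest_cluster(size, centres):
    """Index of the cluster whose centre is closest to log(size)."""
    return min(range(len(centres)), key=lambda i: abs(math.log(size) - centres[i]))

def fit_segmented(projects, k):
    """Cluster projects by log(size), then fit one power law per cluster."""
    centres = kmeans_1d([math.log(s) for s, _ in projects], k)
    clusters = [[] for _ in range(k)]
    for p in projects:
        clusters[nearest_cluster(p[0], centres)].append(p)
    models = [fit_power_law(c) for c in clusters]
    return centres, models

def estimate(size, centres, models):
    """Estimate effort with the equation of the cluster the size falls into."""
    a, b = models[nearest_cluster(size, centres)]
    return a * size ** b

# Synthetic (size, effort) pairs, NOT taken from ISBSG: small projects follow
# effort = 10 * size, large projects follow effort = 2 * size**1.2.
projects = [(s, 10.0 * s) for s in (50, 80, 100, 150)]
projects += [(s, 2.0 * s ** 1.2) for s in (1000, 1500, 2000, 3000)]

centres, models = fit_segmented(projects, k=2)
print(models)
print(estimate(120, centres, models), estimate(2500, centres, models))
```

A single equation fitted to all eight projects would have to compromise between the two regimes, whereas the per-cluster fits recover each regime separately. Estimating a new project only adds one step, choosing the cluster, which is in line with the abstract's claim that the process does not become much more complex.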
Additional information
This work is supported by the Spanish Ministry of Science and Technology under Grant No. CICYT TIN2004-06689-C03.
Cite this article
Gallego, J.J.C., Rodríguez, D., Sicilia, M.Á. et al. Software Project Effort Estimation Based on Multiple Parametric Models Generated Through Data Clustering. J Comput Sci Technol 22, 371–378 (2007). https://doi.org/10.1007/s11390-007-9043-5
DOI: https://doi.org/10.1007/s11390-007-9043-5