DOI: 10.1145/1370788.1370796
research-article

An empirical analysis of software effort estimation with outlier elimination

Published: 12 May 2008

Abstract

Accurate software effort estimation has always been a challenge for the software engineering community. To improve estimation accuracy, many studies have focused on effort estimation methods without considering data quality, although data quality is one of the important factors affecting estimation accuracy. In this paper, we investigate the influence of outlier elimination on the accuracy of software effort estimation through empirical studies that combine two outlier elimination methods (least trimmed squares and k-means clustering) with three effort estimation methods (least squares, neural network, and Bayesian network). The empirical studies are performed on two industry data sets (ISBSG Release 9 and the Bank data set, which consists of data from projects carried out at a bank in Korea), with and without outlier elimination.
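
To make the idea of outlier elimination before estimation concrete, the following is a minimal sketch of one of the combinations named in the abstract: least trimmed squares (LTS) used to drop outliers, followed by an ordinary least squares fit. It is not the authors' implementation, which this page does not describe. The single predictor (project size), the synthetic data, the function names, the `keep_fraction` parameter, and the use of simple concentration steps to approximate LTS are all assumptions made for illustration. MMRE (mean magnitude of relative error) is used here only as a familiar accuracy measure.

```python
import random


def fit_line(points):
    """Ordinary least squares for effort = a + b * size on (size, effort) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx  # assumes the sizes are not all identical
    a = mean_y - b * mean_x
    return a, b


def lts_fit(points, keep_fraction=0.8, n_starts=50, n_steps=20, seed=0):
    """Approximate LTS: search for the line that minimises the sum of the
    h smallest squared residuals, using random starts and concentration steps.
    Returns (a, b, kept_points); points outside kept_points are treated as outliers.
    """
    rng = random.Random(seed)
    h = max(2, int(keep_fraction * len(points)))
    best = None
    for _ in range(n_starts):
        subset = rng.sample(points, 2)
        if subset[0][0] == subset[1][0]:
            continue  # a vertical start cannot be fitted; try another start
        for _ in range(n_steps):
            a, b = fit_line(subset)
            # Concentration step: refit on the h points closest to the current line.
            ranked = sorted(points, key=lambda p: (p[1] - a - b * p[0]) ** 2)
            new_subset = ranked[:h]
            if new_subset == subset:
                break
            subset = new_subset
        a, b = fit_line(subset)
        cost = sum((y - a - b * x) ** 2 for x, y in subset)
        if best is None or cost < best[0]:
            best = (cost, a, b, subset)
    if best is None:
        raise ValueError("no usable starting pair found")
    return best[1], best[2], best[3]


def mmre(points, a, b):
    """Mean magnitude of relative error of the line on the given points."""
    return sum(abs(y - (a + b * x)) / y for x, y in points) / len(points)


# Synthetic (made-up) projects: effort = 10 + 2 * size, plus two gross outliers.
projects = [(size, 10 + 2 * size) for size in range(10, 90, 10)]
projects += [(45, 400), (65, 20)]

a_all, b_all = fit_line(projects)         # without outlier elimination
a_lts, b_lts, kept = lts_fit(projects)    # with outlier elimination

print("MMRE on retained projects, plain least squares:", mmre(kept, a_all, b_all))
print("MMRE on retained projects, after LTS trimming: ", mmre(kept, a_lts, b_lts))
```

In this toy setting the trimmed fit is evaluated on the retained projects only, so the comparison shows how much the two outliers distort the plain least squares line rather than how either model would perform on unseen projects.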


Published In

PROMISE '08: Proceedings of the 4th international workshop on Predictor models in software engineering
May 2008
108 pages
ISBN:9781605580364
DOI:10.1145/1370788

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. effort estimation
  2. outlier elimination
  3. software data quality

Qualifiers

  • Research-article

Conference

ICSE '08

Acceptance Rates

PROMISE '08 Paper Acceptance Rate 13 of 16 submissions, 81%;
Overall Acceptance Rate 98 of 213 submissions, 46%
