Abstract
Data quality is a prime concern for many tasks in learning and induction. In a previous paper we proposed a noise correction mechanism called polishing, which exploits the interdependence between the different components of a data set to identify noisy values and their appropriate replacements. Designing a sound and informative metric for evaluating the effectiveness of a noise correction scheme turned out to be non-trivial. Here we motivate a number of classifier-dependent measures and proximity measures, each focusing on a different aspect of the corrected data and the associated classifier. We report on extended experiments with polishing, evaluated using the proposed metrics. The results suggest that polishing can repair a corrupted data set to some extent, and that the metrics we devised appear to be reasonable.
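The abstract does not define the proximity measures. Purely as an illustration, here is a minimal sketch of one hypothetical proximity summary. It assumes that the clean, corrupted and corrected versions of a data set are all available and aligned instance by instance and attribute by attribute. It counts attribute values that disagree with the clean data before and after correction, and how many of the corrector's changes restored the clean value. The function names are illustrative assumptions, not the paper's definitions. The classifier-dependent measures, which would instead compare classifiers trained on each version of the data, are not shown.

```python
from typing import Dict, Hashable, Sequence

Row = Sequence[Hashable]


def cell_disagreements(a: Sequence[Row], b: Sequence[Row]) -> int:
    """Count attribute values that differ between two aligned data sets."""
    if len(a) != len(b):
        raise ValueError("data sets must have the same number of instances")
    count = 0
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            raise ValueError("instances must have the same number of attributes")
        count += sum(1 for x, y in zip(row_a, row_b) if x != y)
    return count


def proximity_report(clean: Sequence[Row], noisy: Sequence[Row],
                     corrected: Sequence[Row]) -> Dict[str, int]:
    """Hypothetical proximity summary for one noise-correction run.

    Assumes the clean (uncorrupted) data is known, as in a controlled
    experiment where noise is injected artificially.
    """
    errors_before = cell_disagreements(clean, noisy)      # also checks alignment
    errors_after = cell_disagreements(clean, corrected)   # also checks alignment
    changed = restored = 0
    for c_row, n_row, k_row in zip(clean, noisy, corrected):
        for c, n, k in zip(c_row, n_row, k_row):
            if n != k:          # the corrector modified this value
                changed += 1
                if k == c:      # ... and the modification restored the clean value
                    restored += 1
    return {
        "errors_before": errors_before,
        "errors_after": errors_after,
        "changed": changed,
        "restored": restored,
    }


clean = [("a", "x", 1), ("b", "y", 0)]
noisy = [("a", "z", 1), ("c", "y", 0)]
corrected = [("a", "x", 1), ("c", "y", 1)]
print(proximity_report(clean, noisy, corrected))
```

In this toy run the corrector fixes one corrupted value but leaves another in place and introduces a new error. The total disagreement therefore stays at 2 even though one repair succeeded. A single count would hide this, which is one reason to look at several measures side by side.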
References
Carla E. Brodley and Mark A. Friedl. Identifying and eliminating mislabeled training instances. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, 1996.
P. Clark and T. Niblett. The CN2 induction algorithm. Machine Learning, 3(4):261–283, 1989.
George Drastal. Informed pruning in constructive induction. In Proceedings of the Eighth International Workshop on Machine Learning, pages 132–136, 1991.
Dragan Gamberger, Nada Lavrač, and Sašo Džeroski. Noise elimination in inductive concept learning: A case study in medical diagnosis. In Proceedings of the Seventh International Workshop on Algorithmic Learning Theory, pages 199–212, 1996.
George H. John. Robust decision trees: Removing outliers from databases. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining, pages 174–179, 1995.
P. M. Murphy and D. W. Aha. UCI repository of machine learning databases. University of California, Irvine, Department of Information and Computer Science, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html.
J. Ross Quinlan. Simplifying decision trees. International Journal of Man-Machine Studies, 27(3):221–234, 1987.
J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
Peter J. Rousseeuw and Annick M. Leroy. Robust Regression and Outlier Detection. John Wiley & Sons, 1987.
Alen D. Shapiro. Structured Induction in Expert Systems. Addison-Wesley, 1987.
Choh Man Teng. Correcting noisy data. In Proceedings of the Sixteenth International Conference on Machine Learning, pages 239–248, 1999.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Teng, C.M. (2000). Evaluating Noise Correction. In: Mizoguchi, R., Slaney, J. (eds) PRICAI 2000 Topics in Artificial Intelligence. PRICAI 2000. Lecture Notes in Computer Science, vol 1886. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44533-1_22
DOI: https://doi.org/10.1007/3-540-44533-1_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67925-7
Online ISBN: 978-3-540-44533-3
eBook Packages: Springer Book Archive