Abstract
Data sets in Bioinformatics usually present a high level of noise. Various processes involved in collecting and preparing biological data may introduce this noise, such as the imprecision inherent to the laboratory experiments that generate these data. Using noisy data to induce classifiers with Machine Learning techniques may harm the classifiers' predictive performance. Conversely, because noisy examples are often the ones these classifiers mispredict, their predictions may be used to guide noise detection and removal. This work compares three approaches for eliminating noisy data from Bioinformatics data sets using Machine Learning classifiers: the first removes the examples detected as noisy, the second tries to reclassify (relabel) them, and the third, named hybrid, combines the two previous approaches.
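The abstract does not say which classifiers or decision rules are used, so the following is only a minimal sketch of the three strategies under stated assumptions. A leave-one-out k-nearest-neighbour classifier stands in for the Machine Learning classifiers, and an example is flagged as noisy when its predicted label differs from its recorded one. The hybrid rule here (relabel when all k neighbours agree, otherwise remove) is a hypothetical illustration, not the authors' rule. All function names (`detect_noise`, `remove_noisy`, `relabel_noisy`, `hybrid`) and the toy data are invented for this example.

```python
from collections import Counter
from math import dist

# Hypothetical helpers: a leave-one-out k-NN classifier stands in for the
# Machine Learning classifiers used as noise detectors (an assumption).


def loo_knn_votes(X, y, i, k=3):
    """Count the labels of the k nearest neighbours of X[i], excluding X[i] itself."""
    others = sorted((j for j in range(len(X)) if j != i),
                    key=lambda j: dist(X[i], X[j]))
    return Counter(y[j] for j in others[:k])


def detect_noise(X, y, k=3):
    """Flag example i as noisy when its leave-one-out prediction differs from y[i].

    Returns {index: (predicted_label, agreement)}, where agreement is the
    fraction of the k neighbours that voted for the predicted label.
    """
    flagged = {}
    for i in range(len(X)):
        votes = loo_knn_votes(X, y, i, k)
        label, count = votes.most_common(1)[0]
        if label != y[i]:
            flagged[i] = (label, count / sum(votes.values()))
    return flagged


def remove_noisy(X, y, flagged):
    """Approach 1: drop every flagged example."""
    keep = [i for i in range(len(X)) if i not in flagged]
    return [X[i] for i in keep], [y[i] for i in keep]


def relabel_noisy(X, y, flagged):
    """Approach 2: replace each flagged label with the predicted one."""
    return list(X), [flagged[i][0] if i in flagged else y[i] for i in range(len(y))]


def hybrid(X, y, flagged, threshold=1.0):
    """Approach 3 (hypothetical rule): relabel a flagged example when the
    neighbour agreement reaches `threshold`, otherwise remove it."""
    new_X, new_y = [], []
    for i in range(len(X)):
        if i not in flagged:
            new_X.append(X[i])
            new_y.append(y[i])
        elif flagged[i][1] >= threshold:
            new_X.append(X[i])
            new_y.append(flagged[i][0])
    return new_X, new_y


# Toy data: two clusters. (0.5, 0.5) is a clearly mislabelled point inside
# cluster "a"; (2.5, 2.5) is a borderline point labelled "b".
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (5, 5), (5, 6), (6, 5), (6, 6), (2.5, 2.5)]
y = ["a", "a", "a", "a", "b", "b", "b", "b", "b", "b"]

flagged = detect_noise(X, y)
print(flagged)
print(remove_noisy(X, y, flagged)[1])   # ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
print(relabel_noisy(X, y, flagged)[1])  # ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'a']
print(hybrid(X, y, flagged)[1])         # ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
```

In this sketch the `threshold` parameter moves the hybrid strategy between the other two: with `threshold=0` every flagged example is relabelled (as in approach 2), and with any value above 1 every flagged example is removed (as in approach 1).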
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Miranda, A.L.B., Garcia, L.P.F., Carvalho, A.C.P.L.F., Lorena, A.C. (2009). Use of Classification Algorithms in Noise Detection and Elimination. In: Corchado, E., Wu, X., Oja, E., Herrero, Á., Baruque, B. (eds) Hybrid Artificial Intelligence Systems. HAIS 2009. Lecture Notes in Computer Science, vol 5572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02319-4_50
DOI: https://doi.org/10.1007/978-3-642-02319-4_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02318-7
Online ISBN: 978-3-642-02319-4
eBook Packages: Computer Science, Computer Science (R0)