Ensembles of Pre-processing Techniques for Noise Detection in Gene Expression Data

Libralon, Giampaolo L.; Carvalho, André C. Ponce Leon Ferreira; Lorena, Ana C.

doi:10.1007/978-3-642-02490-0_60

Giampaolo L. Libralon, André C. Ponce Leon Ferreira Carvalho & Ana C. Lorena

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5506)

Included in the following conference series:

International Conference on Neural Information Processing


Abstract

Due to the imprecise nature of biological experiments, biological data often contain redundant and noisy instances, usually caused by errors in data collection, such as contamination of laboratory samples. Gene expression data are one example of biological data affected by this problem. Machine Learning algorithms have been used successfully in gene expression analysis. Although many Machine Learning algorithms can tolerate noise, detecting and removing noisy instances from the data can help in inducing the target hypothesis. This paper evaluates distance-based pre-processing techniques, applied individually and in combination, for removing noisy data from gene expression datasets. Their effectiveness is measured by the accuracy that different Machine Learning classifiers achieve on the pre-processed data. The results indicate that the pre-processing techniques employed were effective for noise detection.
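The abstract does not say which distance-based techniques were combined, so the following Python sketch is illustrative only and is not the authors' implementation. It assumes two classic distance-based noise filters, Wilson's Edited Nearest Neighbour (ENN) rule and Tomek links, Euclidean distance between expression profiles, and a simple vote-count rule for combining the filters. The function names (`enn_noise`, `tomek_noise`, `ensemble_noise`, `remove_instances`) and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two profiles (assumed metric, for illustration)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def enn_noise(X: List[Point], y: List[int], k: int = 3) -> Set[int]:
    """Wilson's Edited Nearest Neighbour rule: flag an instance when the
    majority label of its k nearest neighbours differs from its own label."""
    flagged = set()
    for i in range(len(X)):
        # Sort the other instances by distance; the index breaks ties.
        neighbours = sorted((euclidean(X[i], X[j]), j) for j in range(len(X)) if j != i)
        votes = Counter(y[j] for _, j in neighbours[:k])
        majority_label, _ = votes.most_common(1)[0]
        if majority_label != y[i]:
            flagged.add(i)
    return flagged


def tomek_noise(X: List[Point], y: List[int]) -> Set[int]:
    """Tomek links: two instances of different classes that are each other's
    nearest neighbour. Both members of every link are flagged."""
    nearest = [
        min((j for j in range(len(X)) if j != i), key=lambda j: euclidean(X[i], X[j]))
        for i in range(len(X))
    ]
    flagged = set()
    for i, j in enumerate(nearest):
        if nearest[j] == i and y[i] != y[j]:
            flagged.update((i, j))
    return flagged


def ensemble_noise(detections: List[Set[int]], min_votes: int) -> Set[int]:
    """Hypothetical combination rule: keep an instance as noisy when at least
    min_votes filters flagged it (1 = union, len(detections) = consensus)."""
    votes = Counter(i for flagged in detections for i in flagged)
    return {i for i, count in votes.items() if count >= min_votes}


def remove_instances(X: List[Point], y: List[int], flagged: Set[int]) -> Tuple[List[Point], List[int]]:
    """Return a copy of the dataset without the flagged instances."""
    keep = [i for i in range(len(X)) if i not in flagged]
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    # Toy 2-D data: two tight clusters plus the point (0.5, 0.5),
    # which lies inside the class-0 cluster but is labelled 1.
    X = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 10), (10, 11), (11, 10), (11, 11),
         (0.5, 0.5)]
    y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

    detections = [enn_noise(X, y, k=3), tomek_noise(X, y)]
    print([sorted(d) for d in detections])                # [[8], [0, 8]]
    print(sorted(ensemble_noise(detections, min_votes=1)))  # [0, 8] (union)
    print(sorted(ensemble_noise(detections, min_votes=2)))  # [8] (consensus)
    X_clean, y_clean = remove_instances(X, y, ensemble_noise(detections, 2))
    print(len(X_clean))                                   # 8
```

With `min_votes=1` the ensemble takes the union of the filters' flags; with `min_votes` equal to the number of filters it keeps only their consensus. In the toy data, Tomek links also flag the correctly labelled point (0, 0) because it forms a link with the mislabelled point, so the consensus rule is the more conservative one and removes only the mislabelled instance. The quadratic neighbour search is kept for clarity and is only suitable for small datasets.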



Author information

Authors and Affiliations

ICMC/USP - São Carlos, Caixa Postal 668, 13560-970, São Carlos, SP, Brazil
Giampaolo L. Libralon & André C. Ponce Leon Ferreira Carvalho
Center of Mathematics, Computation and Cognition (CMCC), ABC Fed. Univ. (UFABC), Santo André, SP, Brazil
Ana C. Lorena


Editor information

Editors and Affiliations

Kyushu Institute of Technology, Network Design and Research Center, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
Mario Köppen
Knowledge Engineering and Discovery Research Institute (KEDRI), School of Computing and Mathematical Sciences, Auckland University of Technology, 350 Queen Street, 10110, Auckland, New Zealand
Nikola Kasabov
Department of Electrical and Computer Engineering, Robotics Laboratory, Auckland University of Technology, 38 Princes Street, 1142, Auckland, New Zealand
George Coghill


About this paper

Cite this paper

Libralon, G.L., Carvalho, A.C.P.L.F., Lorena, A.C. (2009). Ensembles of Pre-processing Techniques for Noise Detection in Gene Expression Data. In: Köppen, M., Kasabov, N., Coghill, G. (eds) Advances in Neuro-Information Processing. ICONIP 2008. Lecture Notes in Computer Science, vol 5506. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02490-0_60


DOI: https://doi.org/10.1007/978-3-642-02490-0_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02489-4
Online ISBN: 978-3-642-02490-0
eBook Packages: Computer Science, Computer Science (R0)
