Ensembles of Pre-processing Techniques for Noise Detection in Gene Expression Data

  • Conference paper
Advances in Neuro-Information Processing (ICONIP 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5506)

Abstract

Because biological experiments are inherently imprecise, biological data often contain redundant and noisy instances, which usually arise from errors in data collection, such as contamination of laboratory samples. Gene expression data are one example of biological data affected by this kind of noise. Machine Learning algorithms have been used successfully in gene expression analysis. Although many Machine Learning algorithms can tolerate noise, detecting and removing noisy instances from the data can help the induction of the target hypothesis. This paper evaluates distance-based pre-processing techniques on gene expression data, analyzing how effectively these techniques, and combinations of them, remove noisy data. Effectiveness is measured by the accuracy that different Machine Learning classifiers obtain on the pre-processed data. The results indicate that the pre-processing techniques employed were effective for noise detection.
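The abstract does not say which distance-based filters were used or how they were combined, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a k-nearest-neighbour editing rule (an instance is flagged when the majority label of its k nearest neighbours disagrees with its own label) and an ensemble formed by voting over several values of k. The function names, the choice of k values, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def knn_labels(X, y, i, k):
    """Labels of the k nearest neighbours of sample i (Euclidean distance, i excluded)."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [y[j] for _, j in dists[:k]]


def knn_edit_flags(X, y, k):
    """Flag sample i as suspect when its neighbours' majority label differs from y[i]."""
    flags = []
    for i in range(len(X)):
        majority = Counter(knn_labels(X, y, i, k)).most_common(1)[0][0]
        flags.append(majority != y[i])
    return flags


def vote(flag_lists, min_votes):
    """Combine several filters: flag a sample when at least min_votes filters flag it."""
    return [sum(votes) >= min_votes for votes in zip(*flag_lists)]


# Toy data (assumption): two well-separated groups; the sample at index 4
# sits next to the "a" group but carries the label "b".
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0.5),
     (10, 10), (11, 10), (10, 11), (11, 11)]
y = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]

# Ensemble members (assumption): the same editing rule with k = 1, 3, 5.
members = [knn_edit_flags(X, y, k) for k in (1, 3, 5)]

majority_flags = vote(members, min_votes=2)              # majority vote
consensus_flags = vote(members, min_votes=len(members))  # all filters must agree

# Keep only the samples that the consensus ensemble does not flag.
X_clean = [x for x, f in zip(X, consensus_flags) if not f]
y_clean = [label for label, f in zip(y, consensus_flags) if not f]
```

A consensus vote removes fewer instances and is therefore the more conservative choice, while a majority vote removes more at the risk of discarding clean instances. In an evaluation of the kind the abstract describes, the filtered data would then be passed to a classifier and its accuracy compared with that obtained on the unfiltered data.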

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Libralon, G.L., Carvalho, A.C.P.L.F., Lorena, A.C. (2009). Ensembles of Pre-processing Techniques for Noise Detection in Gene Expression Data. In: Köppen, M., Kasabov, N., Coghill, G. (eds) Advances in Neuro-Information Processing. ICONIP 2008. Lecture Notes in Computer Science, vol 5506. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02490-0_60

  • DOI: https://doi.org/10.1007/978-3-642-02490-0_60

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02489-4

  • Online ISBN: 978-3-642-02490-0

  • eBook Packages: Computer Science, Computer Science (R0)
