A Machine Learning Approach to Mass Spectra Classification with Unsupervised Feature Selection

  • Conference paper
Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2008)

Abstract

Mass spectrometry spectra are recognized as a screening tool for detecting discriminatory protein patterns. Mass spectra, however, are high-dimensional data, and a large number of local maxima (peaks) have to be analyzed. To tackle this problem we have developed a three-step strategy. After data pre-processing, we perform an unsupervised feature selection phase aimed at detecting salient parts of the spectra that could be useful for the subsequent classification phase. The main contribution of the paper is the development of this feature selection and extraction procedure, grounded on the theory of multi-scale spaces. We then use support vector machines for classification. On a data set of tumor/healthy samples, the approach correctly classified more than 95% of samples. ROC analysis was also performed.
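The abstract does not spell out the feature selection procedure, so the following is only a minimal sketch of one common scale-space idea, not the authors' method: smooth a spectrum with Gaussians of increasing width and keep the peaks that can still be tracked at every scale. All function names, the choice of scales, the tracking tolerance `tol`, and the synthetic spectrum are assumptions made for illustration.

```python
import math


def gaussian_kernel(sigma):
    """Discrete Gaussian truncated at 3 sigma, normalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve with a Gaussian; indices past the ends are clamped."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out


def local_maxima(signal):
    """Indices of strict interior local maxima."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] > signal[i + 1]]


def persistent_peaks(signal, sigmas, tol=2):
    """Finest-scale peaks that can be tracked through every coarser scale.

    A peak survives a scale change if the coarser signal has a maximum
    within `tol` samples of its last tracked position (assumed rule).
    """
    sigmas = sorted(sigmas)
    tracked = [(p, p) for p in local_maxima(smooth(signal, sigmas[0]))]
    for sigma in sigmas[1:]:
        maxima = local_maxima(smooth(signal, sigma))
        survivors = []
        for origin, current in tracked:
            near = [m for m in maxima if abs(m - current) <= tol]
            if near:
                closest = min(near, key=lambda m: abs(m - current))
                survivors.append((origin, closest))
        tracked = survivors
    return [origin for origin, _ in tracked]


def synthetic_spectrum(n=100, ripple=0.0):
    """Hypothetical test signal: two Gaussian peaks at 30 and 70,
    optionally with an alternating ripple standing in for noise."""
    return [10.0 * math.exp(-((i - 30) ** 2) / 32.0)
            + 6.0 * math.exp(-((i - 70) ** 2) / 32.0)
            + ripple * (1.0 if i % 2 == 0 else -1.0)
            for i in range(n)]
```

Calling `persistent_peaks(synthetic_spectrum(), [1.0, 2.0, 4.0])` returns the positions of the surviving peaks. In a pipeline like the one the abstract describes, intensities at such positions could then form the feature vector passed to a classifier.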

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ceccarelli, M., d’Acierno, A., Facchiano, A. (2009). A Machine Learning Approach to Mass Spectra Classification with Unsupervised Feature Selection. In: Masulli, F., Tagliaferri, R., Verkhivker, G.M. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2008. Lecture Notes in Computer Science, vol 5488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02504-4_22

  • DOI: https://doi.org/10.1007/978-3-642-02504-4_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02503-7

  • Online ISBN: 978-3-642-02504-4

  • eBook Packages: Computer Science, Computer Science (R0)
