Abstract
Mass spectrometry spectra are recognized as a screening tool for detecting discriminatory protein patterns. Mass spectra, however, are high-dimensional data in which a large number of local maxima (peaks) have to be analyzed; to tackle this problem we have developed a three-step strategy. After data pre-processing, we perform an unsupervised feature selection phase aimed at detecting salient parts of the spectra that could be useful for the subsequent classification phase. The main contribution of the paper is the development of this feature selection and extraction procedure, grounded in the theory of multi-scale spaces. We then use support vector machines for classification. On a data set of tumor/healthy samples, this approach correctly classified more than 95% of the samples. ROC analysis has also been performed.
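The abstract does not spell out the feature selection procedure, so the following is only a minimal sketch of one common multi-scale idea, not the authors' method: smooth the spectrum with Gaussians of increasing width and keep the fine-scale peaks that are still local maxima at every coarser scale. The function names, the scale values in `sigmas`, and the `tolerance` parameter are all assumptions made for illustration.

```python
import math


def gaussian_kernel(sigma):
    """Discrete Gaussian truncated at 3*sigma and normalised to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve the signal with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out


def local_maxima(signal):
    """Indices of interior samples that rise from the left and do not rise to the right."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]


def persistent_peaks(spectrum, sigmas=(1.0, 2.0, 4.0), tolerance=1):
    """Keep finest-scale maxima that have a maximum within `tolerance`
    samples at every coarser scale (hypothetical selection rule)."""
    maxima_per_scale = [local_maxima(smooth(spectrum, s)) for s in sigmas]
    finest, coarser = maxima_per_scale[0], maxima_per_scale[1:]
    return [p for p in finest
            if all(any(abs(p - q) <= tolerance for q in level)
                   for level in coarser)]
```

The intensities at the indices returned by `persistent_peaks` could then serve as a reduced feature vector for a classifier such as a support vector machine. A real pipeline would also need the pre-processing step (baseline correction, normalisation, alignment), which this sketch omits.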
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ceccarelli, M., d’Acierno, A., Facchiano, A. (2009). A Machine Learning Approach to Mass Spectra Classification with Unsupervised Feature Selection. In: Masulli, F., Tagliaferri, R., Verkhivker, G.M. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2008. Lecture Notes in Computer Science, vol 5488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02504-4_22
DOI: https://doi.org/10.1007/978-3-642-02504-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02503-7
Online ISBN: 978-3-642-02504-4
eBook Packages: Computer Science, Computer Science (R0)