Robust SVM-Based Biomarker Selection with Noisy Mass Spectrometric Proteomic Data

Marchiori, Elena; Jimenez, Connie R.; West-Nielsen, Mikkel; Heegaard, Niels H. H.

doi:10.1007/11732242_8

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3907)

Included in the following conference series:

Workshops on Applications of Evolutionary Computation

Abstract

Computational analysis of mass spectrometric (MS) proteomic data from sera is of potential relevance for diagnosis, prognosis, choice of therapy, and study of disease activity. To this end, feature selection techniques based on machine learning can be applied to detect potential biomarkers and biomarker patterns. A key issue concerns the interpretability and robustness of the results produced by such techniques. In this paper we propose a robust method for feature selection with MS proteomic data. The method consists of the sequential application of a filter feature selection algorithm, RELIEF, followed by multiple runs of a wrapper feature selection technique based on support vector machines (SVM), where each run is obtained by changing the class label of one support vector. The frequencies with which features are selected over the runs are used to identify features that are robust with respect to perturbations of the data. The method is tested on a dataset produced by a specific MS technique, MALDI-TOF MS. Two classes have been artificially generated by spiking. Moreover, the samples have been collected at different storage durations. Leave-one-out cross-validation (LOOCV) applied to the resulting dataset indicates that the proposed feature selection method is capable of identifying highly discriminatory proteomic patterns.
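The pipeline described in the abstract (a RELIEF filter, then repeated SVM-based wrapper runs that each flip the label of one support vector, then selection frequencies) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the RELIEF variant visits every sample instead of a random subset, the wrapper is taken to be recursive feature elimination (RFE) on a linear SVM, the SVM is trained by a simple dual coordinate descent, and the function names, parameter values, and synthetic data are all hypothetical.

```python
import random

def relief_scores(X, y):
    """RELIEF-style weights: a feature gains when it differs on the nearest
    miss and loses when it differs on the nearest hit (all samples visited)."""
    n, d = len(X), len(X[0])
    lo = [min(r[j] for r in X) for j in range(d)]
    span = [(max(r[j] for r in X) - lo[j]) or 1.0 for j in range(d)]
    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]
    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))
    scores = [0.0] * d
    for i in range(n):
        hit = min((k for k in range(n) if k != i and y[k] == y[i]),
                  key=lambda k: dist(X[i], X[k]))
        miss = min((k for k in range(n) if y[k] != y[i]),
                   key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            scores[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return scores

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def train_svm(X, y, C=1.0, passes=50):
    """Linear soft-margin SVM via dual coordinate descent (assumed solver).
    The bias is absorbed by a constant feature. Returns (weights, SV indices)."""
    Z = [row + [1.0] for row in X]
    w, alpha = [0.0] * len(Z[0]), [0.0] * len(Z)
    for _ in range(passes):
        for i, (z, yi) in enumerate(zip(Z, y)):
            grad = yi * dot(w, z) - 1.0
            new = min(max(alpha[i] - grad / dot(z, z), 0.0), C)
            w = [wj + (new - alpha[i]) * yi * zj for wj, zj in zip(w, z)]
            alpha[i] = new
    return w[:-1], [i for i, a in enumerate(alpha) if a > 0.0]

def svm_rfe(X, y, n_select):
    """Assumed wrapper: drop the feature with the smallest |weight| until
    n_select columns remain. Returns the surviving column indices."""
    active = list(range(len(X[0])))
    while len(active) > n_select:
        w, _ = train_svm([[r[j] for j in active] for r in X], y)
        del active[min(range(len(active)), key=lambda t: abs(w[t]))]
    return active

def robust_frequencies(X, y, n_filter, n_select):
    """Filter with RELIEF, then run the wrapper once per support vector with
    that vector's label flipped; return selection frequency per kept feature."""
    scores = relief_scores(X, y)
    kept = sorted(range(len(scores)), key=lambda j: -scores[j])[:n_filter]
    Xf = [[row[j] for j in kept] for row in X]
    _, svs = train_svm(Xf, y)
    counts = {j: 0 for j in kept}
    for i in svs:
        y_flip = y[:i] + [-y[i]] + y[i + 1:]  # perturb one support vector
        for col in svm_rfe(Xf, y_flip, n_select):
            counts[kept[col]] += 1
    return {j: c / len(svs) for j, c in counts.items()}

# Hypothetical toy data: feature 0 carries the class, features 1..7 are noise.
random.seed(0)
y = [1] * 10 + [-1] * 10
X = [[2.0 * yi] + [random.uniform(-1, 1) for _ in range(7)] for yi in y]
freq = robust_frequencies(X, y, n_filter=4, n_select=2)
```

In this sketch, `freq` maps each feature that survives the filter to the fraction of perturbed runs in which the wrapper retained it. Features with a frequency close to 1 would be the candidates regarded as robust to label perturbations.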

Author information

Authors and Affiliations

Department of Computer Science, Vrije Universiteit Amsterdam, The Netherlands
Elena Marchiori
Department of Molecular and Cellular Neurobiology, Vrije Universiteit Amsterdam, The Netherlands
Connie R. Jimenez
Department of Autoimmunology, Statens Serum Institut, Copenhagen, Denmark
Mikkel West-Nielsen & Niels H. H. Heegaard

Editor information

Editors and Affiliations

Johannes Gutenberg University, Mainz, Germany
Franz Rothlauf
Institute AIFB, University of Karlsruhe, 76128, Karlsruhe, Germany
Jürgen Branke
Dipartimento di Ingegneria dell’Informazione, Università di Parma
Stefano Cagnoni
Centre of Informatics and Systems of the University of Coimbra
Ernesto Costa
Dept. LCC, Universidad de Málaga, Spain
Carlos Cotta
Institute of Computer Science, University of Bremen, 28359, Bremen, Germany
Rolf Drechsler
INRIA Saclay - Ile-de-France, Parc Orsay Université, 4, rue Jacques Monod, 91893, ORSAY Cedex, France
Evelyne Lutton
CISUC, Department of Informatics Engineering, University of Coimbra, Polo II of the University of Coimbra, 3030, Coimbra, Portugal
Penousal Machado
Dartmouth College, Lebanon, NH, USA
Jason H. Moore
Universidade de A Coruña, CP 15071, A Coruña, Spain
Juan Romero
School of Computing Sciences, UEA Norwich, University of East Anglia, NR4 7TJ, Norwich, UK
George D. Smith
Dipartimento di Automatica e Informatica, Politecnico di Torino, Italy
Giovanni Squillero
Kyushu University, Japan
Hideyuki Takagi

About this paper

Cite this paper

Marchiori, E., Jimenez, C.R., West-Nielsen, M., Heegaard, N.H.H. (2006). Robust SVM-Based Biomarker Selection with Noisy Mass Spectrometric Proteomic Data. In: Rothlauf, F., et al. Applications of Evolutionary Computing. EvoWorkshops 2006. Lecture Notes in Computer Science, vol 3907. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732242_8

DOI: https://doi.org/10.1007/11732242_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33237-4
Online ISBN: 978-3-540-33238-1
eBook Packages: Computer Science, Computer Science (R0)
