Abstract
Computational analysis of mass spectrometric (MS) proteomic data from sera is of potential relevance for diagnosis, prognosis, choice of therapy, and study of disease activity. To this end, feature selection techniques based on machine learning can be applied to detect potential biomarkers and biomarker patterns. A key issue concerns the interpretability and robustness of the results produced by such techniques. In this paper we propose a robust method for feature selection with MS proteomic data. The method consists of the sequential application of a filter feature selection algorithm, RELIEF, followed by multiple runs of a wrapper feature selection technique based on support vector machines (SVM), where each run is obtained by changing the class label of one support vector. The frequencies with which features are selected over the runs are used to identify features that are robust with respect to perturbations of the data. The method is tested on a dataset produced by a specific MS technique, called MALDI-TOF MS. Two classes were artificially generated by spiking. Moreover, the samples were stored for different durations. Leave-one-out cross-validation (LOOCV) applied to the resulting dataset indicates that the proposed feature selection method is capable of identifying highly discriminatory proteomic patterns.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Marchiori, E., Jimenez, C.R., West-Nielsen, M., Heegaard, N.H.H. (2006). Robust SVM-Based Biomarker Selection with Noisy Mass Spectrometric Proteomic Data. In: Rothlauf, F., et al. Applications of Evolutionary Computing. EvoWorkshops 2006. Lecture Notes in Computer Science, vol 3907. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732242_8
DOI: https://doi.org/10.1007/11732242_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33237-4
Online ISBN: 978-3-540-33238-1
eBook Packages: Computer Science, Computer Science (R0)