Summary
Clinical proteomics based on mass spectrometry has gained tremendous visibility in the scientific and clinical community. Machine learning methods are key to the efficient processing of the complex data. One major class is that of prototype-based algorithms. Prototype-based vector quantizers or classifiers are intuitive approaches that realize the principle of characteristic representatives for data subsets or for the decision regions between them. Examples of such tools are Support Vector Machines (SVM) [1], Kohonen's Learning Vector Quantization (LVQ) [2], Self-Organizing Maps (SOMs) [2], Supervised Relevance Neural Gas (SRNG) [3], and their respective variants. Depending on the task, one can distinguish between unsupervised methods for data representation and supervised methods for classification. New developments include the use of non-standard metrics (functional norms, scaled Euclidean), task-dependent automatic metric adaptation (feature selection), fuzzy classification, and similarity-based visualization of data. These properties offer new possibilities for the analysis of mass spectrometric data. In this contribution we concentrate on recent extensions of SOMs as universal tools in the light of clinical proteomics. We focus on non-standard metrics and the discovery of biomarker patterns. We consider extensions of the standard SOM and LVQ that handle more general metrics. In particular, we demonstrate applications of the weighted Euclidean metric, the weighted functional norm (based on the weighted Lp-norm), and kernelized metrics that take the specific nature of mass spectra into account. This allows efficient feature selection, which may be used for biomarker identification. Adapting the algorithms to these specific requirements leads to effective tools for knowledge discovery that keep the robustness of the original simple approaches. Further, we consider fuzzy classification and regression in the determination of clinical proteomics models.
This topic deals with the wide-ranging problem of uncertainty in data. Particularly in medicine, the classification of mass spectra may be subject to individual human assessment (based on some expert knowledge), multi-impairment diseases, and incomplete patient/proband information. This leads to uncertainty in the training data held in machine learning databases. We developed a semi-supervised approach based on SOM to process such data. As a result, the algorithm provides a prototype-based fuzzy classification scheme for spectra (Fuzzy Labeled SOM - FLSOM).
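To make the idea of fuzzy-labeled prototypes concrete, the following is a minimal sketch in plain Python. It is not the FLSOM update rule of the chapter: it assumes a winner-only update without a SOM neighborhood function, a fixed learning rate, and made-up toy data. The names `nearest`, `update_fuzzy_label`, and `rate` are hypothetical and introduced only for illustration.

```python
from typing import List, Sequence


def nearest(x: Sequence[float], prototypes: Sequence[Sequence[float]]) -> int:
    """Index of the prototype with the smallest squared Euclidean distance to x."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, w)) for w in prototypes]
    return dists.index(min(dists))


def update_fuzzy_label(label: Sequence[float], target: Sequence[float],
                       rate: float) -> List[float]:
    """Move a prototype's fuzzy label a fraction `rate` towards the sample's label."""
    return [old + rate * (t - old) for old, t in zip(label, target)]


# Toy, made-up data: two prototypes, two classes (healthy, diseased).
prototypes = [[0.0, 0.0], [1.0, 1.0]]
labels = [[0.5, 0.5], [0.5, 0.5]]          # every prototype starts undecided
samples = [([0.1, 0.0], [1.0, 0.0]),       # clearly healthy
           ([0.9, 1.0], [0.0, 1.0]),       # clearly diseased
           ([0.8, 0.9], [0.3, 0.7])]       # uncertain expert assessment

for x, y in samples:
    s = nearest(x, prototypes)             # winner prototype for this sample
    labels[s] = update_fuzzy_label(labels[s], y, rate=0.5)
```

Because each update is a convex combination of two label vectors that sum to one, the prototype labels remain valid fuzzy memberships. A sample with an uncertain label pulls its winner towards a graded assignment instead of forcing a crisp class.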
We demonstrate the usefulness of the above extensions of basic prototype-based data analysis by SOMs for the analysis of mass spectra in proteomics and for related knowledge discovery. In particular, we give application examples for biomarker detection based on feature selection and for fuzzy classification of spectra combined with similarity-based class visualization.
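As an illustration of how a weighted Euclidean metric connects classification and feature selection, here is a minimal sketch in plain Python. It assumes the common form d(x, w) = sum_i lambda_i (x_i - w_i)^2 with non-negative relevance weights lambda_i. The relevance values, prototypes, and helper names (`weighted_sq_euclidean`, `classify`, `top_features`) are hypothetical; in a relevance-learning method the weights would be adapted during training rather than fixed by hand.

```python
from typing import List, Sequence, Tuple


def weighted_sq_euclidean(x: Sequence[float], w: Sequence[float],
                          relevance: Sequence[float]) -> float:
    """Squared Euclidean distance with one non-negative weight per feature."""
    return sum(r * (xi - wi) ** 2 for xi, wi, r in zip(x, w, relevance))


def classify(x: Sequence[float],
             prototypes: Sequence[Tuple[Sequence[float], str]],
             relevance: Sequence[float]) -> str:
    """Return the label of the closest prototype under the weighted metric."""
    label, _ = min(
        ((lab, weighted_sq_euclidean(x, w, relevance)) for w, lab in prototypes),
        key=lambda pair: pair[1],
    )
    return label


def top_features(relevance: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest relevance weights (candidate biomarker positions)."""
    order = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    return order[:k]


# Toy, made-up data: three intensity features per spectrum.
prototypes = [([1.0, 5.0, 0.0], "control"), ([1.0, 0.0, 4.0], "cancer")]
relevance = [0.0, 0.2, 0.8]   # hypothetical weights, normalised to sum to 1
spectrum = [9.0, 3.0, 3.5]

predicted = classify(spectrum, prototypes, relevance)
selected = top_features(relevance, 2)
```

A feature with weight zero, such as the first one here, has no influence on the distance, so a large deviation in it is ignored. Ranking the weights then yields a short list of spectral positions to inspect as biomarker candidates.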
References
Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
Kohonen, T. (ed.): Self-Organizing Maps, Springer Series in Information Sciences, vol. 30. Springer, Berlin (1995) (2nd Ext. Ed. 1997)
Hammer, B., Strickert, M., Villmann, T.: Supervised neural gas with general similarity measure. Neural Proc. Letters 21(1), 21–44 (2005)
Pusch, W., Flocco, M., Leung, S., Thiele, H., Kostrzewa, M.: Mass spectrometry-based clinical proteomics. Pharmacogenomics 4, 463–476 (2003)
Petricoin, E., Ardekani, A., Hitt, B., et al.: Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359, 572–577 (2002)
Wulfkuhle, J., Petricoin, E., Liotta, L.: Proteomic applications for the early detection of cancer. Nat. Rev. Cancer 3, 267–275 (2003)
Ransohoff, D.: Lessons from controversy: ovarian cancer screening and serum proteomics, J. Natl. Cancer Inst. 97, 315–319 (2005)
Morris, J.S., Coombes, K.R., Koomen, J., Baggerly, K.A., Kobayashi, R.: Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum. Bioinformatics 21(9), 1764–1775 (2005)
Vannucci, M., Sha, N., Brown, P.J.: NIR and mass spectra classification: Bayesian methods for wavelet-based feature selection. Chem. and Int. Lab. Systems 77, 139–148 (2005)
Yu, J.S., Ongarello, S., Fiedler, R., et al.: Ovarian cancer identification based on dimensionality reduction for high-throughput mass spectrometry data. Bioinformatics 21(10), 2200–2209 (2005)
de Noo, M., Deelder, A., van der Werff, M., Özalp, A., Martens, B.: MALDI-TOF serum protein profiling for detection of breast cancer. Onkologie 29, 501–506 (2006)
Fiedler, G., Baumann, S., Leichtle, A., Oltmann, A., Kase, J., Thiery, J., Ceglarek, U.: Standardized peptidome profiling of human urine by magnetic bead separation and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Clinical Chemistry 53(3), 421–428 (2007)
Schäffeler, E., Zanger, U., Schwab, M., et al.: Magnetic bead based human plasma profiling discriminate acute lymphatic leukaemia from non-diseased samples. In: 52nd ASMS Conference. TPV 420 (2004)
Schipper, R., Loof, A., de Groot, J., Harthoorn, L., van Heerde, W., Dransfield, E.: Salivary protein/peptide profiling with SELDI-TOF-MS. Annals of the New York Academy of Sciences 1098, 498–503 (2007)
Guerreiro, N., Gomez-Mancilla, B., Charmont, S.: Optimization and evaluation of SELDI-TOF mass spectrometry for protein profiling of cerebrospinal fluid. Proteome Science 4, 7 (2006)
Villmann, T., Der, R., Herrmann, M., Martinetz, T.: Topology Preservation in Self–Organizing Feature Maps: Exact Definition and Measurement. IEEE Transactions on Neural Networks 8(2), 256–266 (1997)
Schleif, F.M., Elssner, T., Kostrzewa, M., Villmann, T., Hammer, B.: Analysis and Visualization of Proteomic Data by Fuzzy labeled Self Organizing Maps. In: Proc. of CBMS 2006, pp. 919–924 (2006)
Wang, J., Bo, T.H., Jonassen, I., Myklebost, O., Hovig, E.: Tumor classification and marker gene prediction by feature selection and fuzzy c-means clustering using microarray data. BMC Bioinformatics 4, 60 (2003)
Arima, C., Hanai, T., Okamoto, M.: Gene expression analysis using fuzzy k-means clustering. Genome Informatics 14, 334–335 (2003)
Bishop, C.: Pattern Recognition and Machine Learning. Springer Science+Business Media, LLC, New York (2006)
Pudil, P., Novovicova, J.: Floating search methods in feature selection. Pattern Recognition Letters 15, 1119–1125 (1994)
Somol, P., Pudil, P.: Adaptive floating search methods in feature selection. Pattern Recognition Letters 20, 1157–1163 (1999)
Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L.A.: Feature Extraction - Foundations and Applications. Springer, Heidelberg (2006)
Hecht-Nielsen, R.: Counterpropagation networks. Appl. Opt. 26(23), 4979–4984 (1987)
Vuorimaa, P.: Fuzzy self-organizing map. Fuzzy Sets and Systems 66(2), 223–231 (1994)
Erwin, E., Obermayer, K., Schulten, K.: Self-organizing maps: Ordering, convergence properties and energy functions. Biol. Cyb. 67(1), 47–55 (1992)
Heskes, T.: In: Oja, E., Kaski, S. (eds.) Kohonen Maps, pp. 303–316. Elsevier, Amsterdam (1999)
Hastie, T., Stuetzle, W.: Principal curves. J. Am. Stat. Assn. 84, 502–516 (1989)
Bauer, H.U., Pawelzik, K.R.: Quantifying the neighborhood preservation of Self-Organizing Feature Maps. IEEE Trans on Neural Networks 3(4), 570–579 (1992)
Schleif, F.M., Hammer, B., Villmann, T.: Supervised Neural Gas for Functional Data and its Application to the Analysis of Clinical Proteom Spectra. In: Sandoval, F., Prieto, A.G., Cabestany, J., Graña, M. (eds.) IWANN 2007. LNCS, vol. 4507, pp. 1036–1044. Springer, Heidelberg (2007)
Ketterlinus, R., Hsieh, S.Y., Teng, S.H., Lee, H., Pusch, W.: Fishing for biomarkers: analyzing mass spectrometry data with the new ClinProTools software. BioTechniques 38(6), 37–40 (2005)
Schleif, F.M.: Prototype based Machine Learning for Clinical Proteomics. Ph.D. Thesis, Technical University Clausthal, Clausthal-Zellerfeld, Germany (2006)
Daubechies, I.: Ten lectures on wavelets. In: CBMS-NSF Regional Conference Series in Applied Mathematics, Philadelphia, PA. Society for Industrial and Applied Mathematics (SIAM), vol. 61 (1992)
Mallat, S.: A wavelet tour of signal processing. Academic Press, San Diego (1998)
Louis, A.K., Maaß, A.P.: Wavelets: Theory and Applications. Wiley, Chichester (1998)
Lio, P.: Wavelets in bioinformatics and computational biology: state of art and perspectives. Bioinformatics 19(1), 2–9 (2003)
Zhu, H., Yu, C.Y., Zhang, H.: Tree-based disease classification using protein data. Proteomics 3, 1673–1677 (2003)
Waagen, D., Cassabaum, M., Scott, C., Schmitt, H.: Exploring alternative wavelet base selection techniques with application to high resolution radar classification. In: Proc. of the 6th Int. Conf. on Inf. Fusion (ISIF 2003), pp. 1078–1085. IEEE Press, Los Alamitos (2003)
Leung, A., Chau, F., Gao, J.: A review on applications of wavelet transform techniques in chemical analysis: 1989–1997. Chem. and Int. Lab. Sys. 43(1), 165–184 (1998)
Cohen, A., Daubechies, I., Feauveau, J.C.: Biorthogonal bases of compactly supported wavelets. Comm. Pure Appl. Math. 45(5), 485–560 (1992)
Villmann, T., Strickert, M., Brüß, C., Schleif, F.M., Seiffert, U.: Visualization of fuzzy information in fuzzy-classification for image segmentation using MDS. In: Proc. of ESANN 2007, pp. 103–108 (2007)
Hammer, B., Villmann, T.: Generalized relevance learning vector quantization. Neural Netw 15(8-9), 1059–1068 (2002)
Lee, J., Verleysen, M.: Generalizations of the Lp Norm for time series and its application to Self-Organizing Maps. In: Cottrell, M. (ed.) 5th Workshop on Self-Organizing Maps, vol. 1, pp. 733–740 (2005)
Hammer, B., Schleif, F.M., Villmann, T.: On the generalization ability of prototype-based classifiers with local relevance determination, Technical Reports University of Clausthal IfI-05-14, p. 18 (2005)
Schneider, P., Biehl, M., Hammer, B.: Relevance Matrices in LVQ. In: Proc. of ESANN 2007, pp. 37–42 (2007)
Baumann, S., Ceglarek, U., Fiedler, G., Lembcke, J., Leichtle, A., Thiery, J.: Standardized approach to proteomic profiling of human serum based on magnetic bead separation and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Clinical Chemistry 51, 973–980 (2005)
Check, E.: Proteomics and cancer: Running before we can walk? Nature 429, 496–497 (2004)
Villmann, T., Schleif, F.M., Merenyi, E., Hammer, B.: Fuzzy Labeled Self Organizing Map for Classification of Spectra. In: Sandoval, F., Prieto, A.G., Cabestany, J., Graña, M. (eds.) IWANN 2007. LNCS, vol. 4507, pp. 556–563. Springer, Heidelberg (2007)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2001)
Zhang, Z., Page, G., Zhang, H.: Fishing Expedition - A supervised approach to extract patterns from a compendium of expression profiles. In: Lin, S.M., Johnson, K.F. (eds.) Methods of Microarray Data Analysis II. Kluwer Academic Publishers, Dordrecht (papers from CAMDA 2001) (2002)
Lee, Y., Lee, C.K.: Classification of multiple cancer types by multicategory support vector machines using gene expression data. Bioinformatics 19(9), 1132–1139 (2003)
Villmann, T., Bauer, H.U., Villmann, T.: Proceedings of WSOM 1997, Workshop on Self-Organizing Maps, Helsinki University of Technology Neural Networks Research Centre, June 4-6, pp. 286–291 (1997)
Bauer, H.U., Villmann, T.: Growing a Hypercubical Output Space in a Self–Organizing Feature Map. IEEE Transactions on Neural Networks 8(2), 218–226 (1997)
Carpenter, G., Grossberg, S.: The Handbook of Brain Theory and Neural Networks, 2nd edn., pp. 87–90. MIT Press, Cambridge (2003)
Villmann, T., Hammer, B., Schleif, F.M., Geweniger, T.: Fuzzy classification by fuzzy labeled neural gas. Neural Networks 19(6-7), 772–779 (2006)
Der, R., Herrmann, M.: Instabilities in Self-Organized Feature Maps with Short Neighborhood Range. In: Verleysen, M. (ed.) Proc. ESANN 1994, European Symp. on Artificial Neural Networks, pp. 271–276. D facto conference services, Brussels, Belgium (1994)
Molinaro, A., Simon, R., Pfeiffer, R.: Prediction error estimation: A comparison of resampling methods. Bioinformatics 21(15), 3301–3307 (2005)
Kearns, M.J., Mansour, Y., Ng, A., Ron, D.: An experimental and theoretical comparison of model selection methods. Machine Learning 27, 7–50 (1997)
Bartlett, P.L., Boucheron, S., Lugosi, G.: Model selection and error estimation. Machine Learning 48, 85–113 (2002)
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Schleif, FM., Villmann, T., Hammer, B., van der Werff, M., Deelder, A., Tollenaar, R. (2008). Analysis of Spectral Data in Clinical Proteomics by Use of Learning Vector Quantizers. In: Smolinski, T.G., Milanova, M.G., Hassanien, AE. (eds) Computational Intelligence in Biomedicine and Bioinformatics. Studies in Computational Intelligence, vol 151. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70778-3_6
DOI: https://doi.org/10.1007/978-3-540-70778-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70776-9
Online ISBN: 978-3-540-70778-3
eBook Packages: Engineering, Engineering (R0)