Feature extraction and dimensionality reduction for mass spectrometry data

https://doi.org/10.1016/j.compbiomed.2009.06.012

Abstract

Mass spectrometry is being used to generate protein profiles from human serum, and proteomic data obtained from mass spectrometry have attracted great interest for the detection of early stage cancer. However, the high dimensionality of mass spectrometry data poses considerable challenges. In this paper we propose a feature extraction algorithm based on wavelet analysis for high dimensional mass spectrometry data. A set of wavelet detail coefficients at different scales is used to detect the transient changes in mass spectrometry data. The experiments are performed on two datasets. The proposed method achieves a highly competitive accuracy compared with the best performance of other kinds of classification models. Experimental results show that the wavelet detail coefficients are an efficient way to characterize the features of high dimensional mass spectra and to reduce their dimensionality.

Section snippets

Background

Mass spectrometry is being used to generate protein profiles from human serum, and proteomic data obtained from mass spectrometry have attracted great interest for the detection of early stage cancer. Surface enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS), in combination with advanced data mining algorithms, is used to detect protein patterns associated with diseases [1], [2], [3], [4], [5]. As a kind of MS-based protein chip technology, SELDI-TOF-MS has

Methods

In this research we develop a new application of the wavelet feature extraction method to mass spectrometry data. The high frequency part of the wavelet decomposition (the detail coefficients) is extracted to characterize the features of mass spectrometry data. The extracted features are then used to build the SVM classification model. Fig. 1 shows the general framework of the proposed method.
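To make the role of the detail coefficients concrete, here is a minimal sketch of a single decomposition level. It assumes the Haar wavelet purely for simplicity; the snippet above does not say which wavelet family the authors use, and the function name `haar_step` and the toy spectrum are hypothetical. The point is only that a detail coefficient is non-zero where the intensity changes abruptly and zero where the signal is flat.

```python
import math

def haar_step(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail); each list is half the input length.
    Assumption: the input length is even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Low-frequency part: scaled sums of neighbouring samples.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # High-frequency part: scaled differences of neighbouring samples.
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

# Hypothetical toy "spectrum": flat baseline with one abrupt intensity jump.
spectrum = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0, 5.0]
approx, detail = haar_step(spectrum)

# Only the pair straddling the jump yields a non-zero detail coefficient.
for index, value in enumerate(detail):
    print(index, round(value, 3))
```

In a pipeline of the kind described, coefficients like `detail` would form the feature vector passed to the classifier; the classifier itself is not sketched here.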

Experiments and Results

In this study we use classification accuracy, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) to evaluate the performance of the proposed method. Let TP, TN, FP, and FN be the numbers of true positive (cancer), true negative (control), false positive and false negative samples, respectively. Sensitivity is defined as TP/(TP+FN); specificity is defined as TN/(TN+FP); positive predictive value is defined as TP/(TP+FP); negative predictive value is defined as TN/(TN+
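The evaluation measures listed in this snippet can be computed from the four counts as in the sketch below. Sensitivity, specificity and PPV follow the definitions quoted above; accuracy and NPV use the standard textbook definitions, which is an assumption here because the snippet does not give the first and is cut off before completing the second. The function name and the counts are hypothetical and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the five evaluation measures from confusion-matrix counts.

    Assumption: every denominator is non-zero.
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # standard definition (not in the snippet)
        "sensitivity": tp / (tp + fn),   # as defined in the text
        "specificity": tn / (tn + fp),   # as defined in the text
        "ppv": tp / (tp + fp),           # as defined in the text
        "npv": tn / (tn + fn),           # standard definition (snippet is truncated)
    }

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=45, tn=40, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```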

Discussion and conclusions

In this paper we propose a feature extraction algorithm based on multilevel wavelet decomposition for high dimensional mass spectra. A set of wavelet detail coefficients at different levels is used to reduce the dimensionality of mass spectra and to characterize their transient changes, in order to detect the difference between cancer tissue and normal tissue.
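The dimensionality reduction can be illustrated with a short multilevel decomposition. This is a sketch under stated assumptions, not the authors' implementation: it uses the Haar wavelet, a hypothetical helper `haar_wavedec`, a synthetic 16-point signal, and an arbitrary choice of level 3 as the feature set. Each level halves the number of detail coefficients, so deeper levels give shorter feature vectors.

```python
import math

def haar_wavedec(signal, levels):
    """Multilevel Haar decomposition.

    Returns (approx, details), where details[k] holds the detail
    coefficients of level k + 1. Assumption: the working length is
    even at every level.
    """
    approx = [float(x) for x in signal]
    details = []
    s = math.sqrt(2.0)
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2 != 0:
            raise ValueError("length must be even at every level")
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / s for a, b in pairs])  # high-frequency part
        approx = [(a + b) / s for a, b in pairs]         # passed to next level
    return approx, details

# Hypothetical 16-point signal standing in for a (much longer) spectrum.
spectrum = [float(i % 4) for i in range(16)]
approx, details = haar_wavedec(spectrum, levels=3)

# Illustrative choice: use the level-3 detail coefficients as features.
features = details[2]
print([len(d) for d in details], len(features))
```

Because the transform is orthonormal, the sum of squared coefficients across `approx` and all entries of `details` equals the sum of squares of the input, so the shorter representation at each level is a projection of the signal rather than an arbitrary subsample.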

Feature extraction using wavelet detail coefficients is a novel application to mass spectrometry data. A set of orthogonal

Conflict of interest statement

None declared.

Acknowledgements

This work was supported by SRF for ROCS, SEM, and Natural Science Foundation of Shandong Province (Y2008G30), China.

References (21)

  • E.F. Petricoin et al. Use of proteomic patterns in serum to identify ovarian cancer. The Lancet (2002)
  • C.M. Michener et al. Genomics and proteomics: application of novel technology to early detection and prevention of cancer. Cancer Detection and Prevention (2002)
  • J.M. Sorace et al. A data review and re-assessment of ovarian cancer serum proteomic profiling. BMC Bioinformatics (2003)
  • E.F. Petricoin et al. Clinical proteomics: translating benchside promise into bedside reality. Nature Reviews Drug Discovery (2002)
  • P.R. Srinivas et al. Proteomics for cancer biomarker discovery. Clinical Chemistry (2002)
  • P.C. Herrmann et al. Cancer proteomics: the state of the art. Disease Markers (2001)
  • G.W. Jr et al. Proteinchip surface enhanced laser desorption/ionization (SELDI) mass spectrometry: a novel protein biochip technology for detection of prostate cancer biomarkers in complex protein mixtures. Prostate Cancer and Prostatic Disease (1999)
  • A. Vlahou et al. Development of a novel proteomic approach for the detection of transitional cell carcinoma of the bladder in urine. American Journal of Pathology (2001)
  • R.H. Lilien et al. Probabilistic disease classification of expression-dependent proteomic data from mass spectrometry of human serum. Journal of Computational Biology (2003)
  • H. Park et al. Lower dimensional representation of text data based on centroids and least squares. BIT (2003)
There are more references available in the full text version of this article.
