Elsevier

Applied Soft Computing

Volume 48, November 2016, Pages 59-79

Optimized spatial filters as a new method for mass spectrometry-based cancer diagnosis

https://doi.org/10.1016/j.asoc.2016.06.035

Highlights

  • This study proposes a new method, optimized spatial filters (OSF), for the classification of mass spectrometry data in cancer diagnosis applications.

  • The OSF method is based on the theory of common spatial patterns (CSP), which is widely used for the classification of motor imagery EEG signals in brain-computer interface (BCI) applications.

  • The OSF method considers both the between-class and within-class distributions of the spatially filtered samples, which is not the case for the CSP method.

  • The OSF method also highlights the importance of certain parts of the spectra, which is highly valuable in the identification of biomarkers that lie outside the pathological pathway of the disease.

  • The OSF method can also be utilized for the classification of other types of spectroscopic data, such as NMR and Raman spectroscopy data.

Abstract

In the past two decades, mass spectrometry-based identification of serum proteomic patterns has emerged as a new diagnostic tool for the early detection of various types of cancer. However, the high dimensionality of mass spectrometry data makes its analysis challenging, and existing methods usually chain together several processing steps. In this study, a comparatively simple but efficient method, the optimized spatial filter (OSF) method, is proposed for the classification of mass spectrometry data. The proposed method is based on the theory of common spatial patterns (CSPs), which is widely used to classify motor imagery EEG signals in brain-computer interface (BCI) applications. The CSP method aims to find spatial filters that project the data into a new space in which optimal discrimination between classes is achieved. Although the CSP method has been shown to perform quite well in classifying motor imagery EEG signals, it has a major drawback: the between-class variance is maximized, but the minimization of the within-class variance is ignored. As a result, the projected data may have large within-class variances. To overcome this problem, the optimal filters in this study are found by using the differential evolution (DE) algorithm, with a divergence analysis as the fitness function. The divergence analysis considers both the between-class and within-class distributions of the projected data. Experiments on publicly available mass spectrometry datasets showed that, compared with existing methods, the proposed OSF method is quite simple and achieves the minimum classification error for each dataset. Furthermore, the proposed OSF method highlights the importance of certain parts of the spectra, which is highly valuable for the identification of biomarkers that lie outside the pathological pathway of the disease.

Introduction

Cancer remains one of the leading causes of death across the globe. To reduce the death rate, new methods for the early detection of cancer are needed. When the disease is detected early, death can often be prevented by a relatively minor treatment administered during the early stages of the disease.

In the past two decades, mass spectrometry-based identification of serum proteomic patterns (or biomarkers) has emerged as a new diagnostic tool for the early detection of various types of cancers. In mass spectrometry-based methods, biological fluids, such as serum, plasma and urine, are analyzed by mass spectrometry to obtain a mass spectrum identifying m/z (mass to charge) ratios and peak intensities of peptides/proteins within that particular fluid. The obtained spectral data from pathological and normal control groups are then classified by pattern recognition methods.

Raw mass spectrometry data consist of tens of thousands of m/z ratios per specimen and an intensity level for each m/z ratio. Currently, a low-resolution SELDI-TOF MS (surface-enhanced laser desorption/ionization time-of-flight mass spectrometer) can measure up to 15,500 data points that can be used to form datasets including 500–20,000 m/z ratios. With a high-resolution mass spectrometer (MS), the number of data points can be increased to 400,000 [1]. The analysis of such an immense amount of data poses considerable challenges. Therefore, to improve the performance of classification algorithms, feature filtering or dimension reduction methods are widely applied after the preprocessing stages (resampling, baseline correction, alignment and normalization). Dimension reduction methods utilized in different studies are usually grouped into three categories: filtering, wrapper and embedded methods. Filtering methods [2], [3], [4], [5] use statistical tests, such as t-tests, Wilcoxon tests, Mann-Whitney tests and Kolmogorov-Smirnov tests, to evaluate whether the data points are redundant. According to the obtained scores, statistically insignificant points are removed from the data by setting a threshold value. In wrapper methods, the dimension reduction process is integrated into the classification stage. In these methods, a subset of features is first selected with an algorithm and then classified with a classification method. According to the obtained classification error, the feature selection algorithm updates its parameters until the optimum subset of features is found [6]. Because the dimensionality is high, a stochastic algorithm, such as a genetic algorithm, particle swarm optimization or ant colony optimization, is usually used for this purpose [1]. Like the wrapper methods, embedded methods integrate the feature selection process with the classification stage. However, these methods are computationally more efficient than the wrapper methods. In some other studies, the discrete wavelet transform (DWT) is also utilized for both dimension reduction and signal enhancement [5], [6], [7], [8].
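As an illustration of the filtering category described above, the following minimal Python sketch scores every m/z bin with a two-sample t-test and keeps the bins that fall below a significance threshold. It is only a sketch under stated assumptions: the data are synthetic, the function name `ttest_filter` and the threshold `alpha=0.01` are hypothetical, and Welch's variant of the t-test is chosen here for convenience rather than taken from the cited studies.

```python
import numpy as np
from scipy.stats import ttest_ind


def ttest_filter(X, y, alpha=0.01):
    """Return indices of features whose class means differ at level alpha.

    X: (n_samples, n_features) intensity matrix; y: binary labels (0/1).
    alpha is a hypothetical threshold, not a value from the cited studies.
    """
    # Welch's t-test per column (equal_var=False avoids assuming equal variances).
    _, p_values = ttest_ind(X[y == 0], X[y == 1], axis=0, equal_var=False)
    return np.flatnonzero(p_values < alpha)


# Synthetic stand-in for spectra: 40 samples, 200 m/z bins, 5 informative bins.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 200))
X[y == 1, :5] += 3.0  # shift the first five bins in the second class

selected = ttest_filter(X, y)
X_reduced = X[:, selected]  # keep only the bins that passed the test
```

A wrapper method would instead re-run a classifier on each candidate subset, which is why it is typically more expensive than a single pass of per-feature tests like this one.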

The classification of mass spectrometry data often requires multiple processing steps (including multiple dimension reduction or feature extraction steps) because of the high-dimensional nature of the data. In Ref. [1], two filtering algorithms, the between-group to within-group sum of squares (BWSS) algorithm and the χ²-test, were used for filtering. Then, a k-means algorithm was used to reduce feature correlation and redundancy. After the k-means step, the authors used a genetic ensemble-based feature selection step to further reduce the feature set by selecting highly discriminative m/z features in a combinatorial way. Their method utilizes a multi-objective genetic algorithm as the feature space exploring engine, while an ensemble of classifiers is used as the feature subset evaluator. The ensemble combines five individual classifiers (decision tree, 1-nearest neighbor, 3-nearest neighbor, 7-nearest neighbor and naïve Bayes). In Ref. [5], the authors used a four-step dimension reduction strategy (binning, Kolmogorov-Smirnov test, restriction of coefficient of variation and wavelet analysis) for ovarian cancer identification. Even after the four-step dimension reduction strategy, they still had to classify 3382-dimensional vectors with an SVM classifier. In Ref. [8], the authors first refined the MS data by removing data points that lacked certain desired properties. After this process, they obtained 39,905-dimensional vectors, which they called dataset A. A filtering method (t-test) then reduced the dimension of the vectors from 39,905 to 24,545 to form dataset B. Next, they divided the MS data into several intervals (windows) and selected variables that could represent the characteristics of each waveform segment. After several experiments, statistical moments (mean, variance, skewness and kurtosis) were selected for further analysis. After transformation based on statistical moments, sets A and B were transformed to sets C and D, which reduced the dimensionality to 3992 and 1964, respectively. In the classification step, a kernel partial least squares (KPLS) algorithm was used.
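The window-based moment features described for Ref. [8] can be sketched as follows. This is an illustrative reading of the idea, not the cited authors' code: the function name `window_moments`, the non-overlapping windows, the window length of 50 and the handling of the trailing partial window are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def window_moments(spectrum, window):
    """Summarize each non-overlapping window by four statistical moments."""
    spectrum = np.asarray(spectrum, dtype=float)
    n_windows = len(spectrum) // window  # an incomplete trailing window is dropped
    segments = spectrum[: n_windows * window].reshape(n_windows, window)
    feats = np.column_stack([
        segments.mean(axis=1),
        segments.var(axis=1),
        skew(segments, axis=1),
        kurtosis(segments, axis=1),  # SciPy returns excess kurtosis by default
    ])
    return feats.ravel()  # four values per window, concatenated


# Synthetic spectrum with 1000 intensity values and a hypothetical window of 50.
rng = np.random.default_rng(0)
spectrum = rng.gamma(shape=2.0, scale=1.0, size=1000)
features = window_moments(spectrum, window=50)
```

Each window contributes four numbers, so the output length is four times the number of windows. The result depends directly on the chosen window length, which is the kind of parameter sensitivity that multi-step pipelines introduce.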

The above-mentioned multi-step dimension reduction and feature extraction methodology complicates the analysis of mass spectrometry data. Moreover, the choice of method and parameters (such as the window length in window-based methods) at each step can strongly affect the performance of the classification algorithms. In this study, to overcome these drawbacks, a new method, the optimized spatial filter (OSF) method, is proposed for the classification of mass spectrometry data. In the proposed method, after the preprocessing stage, only one dimension reduction method (the discrete wavelet transform) is applied, and the data are then classified without any further feature extraction or dimension reduction steps. The proposed method is based on the theory of common spatial patterns [9], which is widely used to classify motor imagery EEG signals in brain-computer interface (BCI) applications. Motor imagery can be defined as a dynamic state during which an individual mentally simulates a predefined action. The EEG signal acquired from the brain during this mental simulation process is known as a “motor imagery EEG signal,” and the classification of the signals acquired from different mentally simulated actions is known as motor imagery signal classification. For motor imagery EEG signal classification, CSP methods aim to find spatial filters that provide optimal discrimination between two different classes (or mental actions). Computationally, CSPs are solved by simultaneously diagonalizing the covariance matrices of the two classes [10]. A computed CSP filter projects the multi-dimensional EEG time domain signal to a one-dimensional time domain signal in which the power (variance) of one class is maximized while the power of the other class is minimized [11]. Here, the same concept is used to find the optimum filters that project a multi-dimensional mass spectrometry signal to a one-dimensional signal in which the variance between two classes (normal control group and cancerous samples) is maximized. Although the CSP method has been shown to perform quite well on EEG data, it also has some shortcomings. In the CSP method, the between-class variance is maximized, but the minimization of the within-class variances is ignored. As a result, the projected data may have large within-class variances. To overcome this problem, in this study, optimal filters are found by using the differential evolution (DE) algorithm [12]. For the fitness function of the differential evolution algorithm, a divergence analysis is used in which both the between-class and within-class distributions are considered. Experiments performed on publicly available mass spectrometry datasets showed that, when compared to existing methods, the proposed method is quite simple and achieves the minimum classification error for each dataset.
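To make the two ideas in this paragraph concrete, the Python sketch below first computes CSP filters through a generalized eigenvalue problem (one standard way to diagonalize two covariance matrices simultaneously) and then searches for a single filter with SciPy's `differential_evolution`. Several things here are assumptions rather than the paper's method: the data are synthetic multi-channel trials, since this excerpt does not describe how spectra are arranged into channels. The fitness is a simple Fisher-style ratio on log-variance features, standing in for the divergence score, whose definition is not given in this excerpt. All function names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.optimize import differential_evolution


def mean_cov(trials):
    """Average trace-normalized covariance; trials is (n_trials, n_channels, n_samples)."""
    covs = []
    for t in trials:
        c = t @ t.T
        covs.append(c / np.trace(c))
    return np.mean(covs, axis=0)


def csp_filters(trials_a, trials_b):
    """Solve C_a w = lambda (C_a + C_b) w; columns of the result are spatial filters."""
    c_a, c_b = mean_cov(trials_a), mean_cov(trials_b)
    eigvals, eigvecs = eigh(c_a, c_a + c_b)  # eigenvalues come back in ascending order
    return eigvals, eigvecs


def log_var(w, trials):
    """Log-variance of every trial after projection onto the filter w."""
    projected = trials.transpose(0, 2, 1) @ w  # shape (n_trials, n_samples)
    return np.log(np.var(projected, axis=1) + 1e-12)


def neg_fisher(w, trials_a, trials_b):
    """Assumed stand-in fitness: between-class gap over within-class spread (negated)."""
    fa, fb = log_var(w, trials_a), log_var(w, trials_b)
    return -((fa.mean() - fb.mean()) ** 2) / (fa.var() + fb.var() + 1e-12)


# Synthetic data: class a is strong on channel 0, class b on channel 3.
rng = np.random.default_rng(0)
scale_a = np.array([3.0, 1.0, 1.0, 1.0])[:, None]
scale_b = np.array([1.0, 1.0, 1.0, 3.0])[:, None]
trials_a = rng.normal(size=(30, 4, 100)) * scale_a
trials_b = rng.normal(size=(30, 4, 100)) * scale_b

eigvals, filters = csp_filters(trials_a, trials_b)
w_csp = filters[:, -1]  # largest eigenvalue: most variance for class a relative to b

# DE minimizes, so the negated ratio is passed as the objective.
result = differential_evolution(
    neg_fisher, bounds=[(-1.0, 1.0)] * 4, args=(trials_a, trials_b),
    seed=0, maxiter=50,
)
w_de = result.x
```

The CSP solution depends only on the class covariance matrices, whereas the DE search can optimize any criterion computed from the projected samples. That flexibility is what allows a within-class term to enter the objective.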

The remainder of this paper is organized as follows. In Section 2, the preprocessing steps of the mass spectrometry data are briefly described. Then, details of the CSP-based method and the proposed OSF-based method for mass spectrometry data analysis are introduced. Section 3 covers the experimental results and discussion, and finally, Section 4 concludes the work.

Section snippets

Methodology

In this study, two different datasets obtained from Ref. [13] are used in the experiments. The first dataset was generated from a set of ovarian cancers and control specimens using a SELDI-TOF spectrometer. The dataset includes mass spectrometry of 95 control and 121 ovarian cancer samples. The initial dimension of high-resolution mass spectrometry data in the first set is 368,750. The second dataset used in this study was generated again using a SELDI-TOF spectrometer from 259 prostate cancer

Experimental results and discussion

In the classification phase of the study, the optimal spatial filter found by the CSP or OSF method is used to form feature spaces by using Eqs. (11) and (12). The formed feature space is then classified by linear discriminant analysis (LDA). Again, in the classification phase of the study, a 5-fold cross-validation process is used. However, in the experiments, the 5-fold cross-validation process is repeated 50 times, and the average results are considered. The experiments are performed for
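The evaluation protocol named above (LDA with 5-fold cross-validation repeated 50 times) can be sketched in plain NumPy as follows. This is a hedged illustration only: Eqs. (11) and (12) are not reproduced in this excerpt, so the features are synthetic two-dimensional points. The two-class LDA uses a pooled covariance and equal priors as a simplification, the folds are stratified by assumption, and the names `fit_lda` and `repeated_cv_error` are hypothetical.

```python
import numpy as np


def fit_lda(X, y):
    """Two-class LDA with pooled covariance and equal priors (a simplification)."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    centered = np.vstack([X[y == 0] - m0, X[y == 1] - m1])
    pooled = centered.T @ centered / (len(X) - 2)
    w = np.linalg.solve(pooled, m1 - m0)
    b = -0.5 * w @ (m0 + m1)  # decision threshold at the midpoint of the class means
    return w, b


def repeated_cv_error(X, y, n_splits=5, n_repeats=50, seed=0):
    """Mean misclassification rate over repeated stratified k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_repeats):
        # Stratify: shuffle each class separately, then deal its indices into folds.
        folds = [[] for _ in range(n_splits)]
        for label in (0, 1):
            idx = rng.permutation(np.flatnonzero(y == label))
            for k, part in enumerate(np.array_split(idx, n_splits)):
                folds[k].extend(part.tolist())
        folds = [np.array(f, dtype=int) for f in folds]
        for k in range(n_splits):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_splits) if j != k])
            w, b = fit_lda(X[train], y[train])
            pred = (X[test] @ w + b > 0).astype(int)
            errors.append(np.mean(pred != y[test]))
    return float(np.mean(errors))


# Synthetic two-dimensional features standing in for the filtered spectra.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 60)
X = rng.normal(size=(120, 2))
X[y == 1] += 2.0  # separate the class means

cv_error = repeated_cv_error(X, y)
```

Repeating the 5-fold split with fresh shuffles averages out the dependence of the error estimate on any single partition of the samples.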

Conclusion

This study proposes a new method, an optimized spatial filter (OSF) method, for the classification of mass spectrometry data for cancer diagnosis. The proposed method is based on the theory of common spatial patterns (CSPs), which is widely used in brain-computer interface (BCI) applications. In contrast to the CSP method, the proposed OSF method utilizes an optimization algorithm (DE in our case) to find the optimal spatial filter coefficients. The OSF method uses the divergence score as the

References (32)

  • Haixian Wang et al., L1-norm-based common spatial patterns, IEEE Trans. Biomed. Eng. (2012)
  • Ayhan Yüksel et al., A neural network-based optimal spatial filter design method for motor imagery classification, PLoS One (2015)
  • Rainer Storn et al., Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces, J. Global Optim. (1997)
  • FDA-NCI Clinical Proteomics Program Databank website. http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp...
  • Emanuel F. Petricoin, Serum proteomic patterns for detection of prostate cancer, J. Natl. Cancer Inst. (2002)
  • Dariya I. Malyarenko, Enhancement of sensitivity and resolution of surface-enhanced laser desorption/ionization time-of-flight mass spectrometric records for serum peptides using time-series analysis techniques, Clin. Chem. (2005)