Optimized spatial filters as a new method for mass spectrometry-based cancer diagnosis
Introduction
Cancer remains one of the leading causes of death worldwide. New methods for early detection are needed to reduce the death rate: when the disease is detected in its early stages, death can often be prevented by relatively minor treatment.
In the past two decades, mass spectrometry-based identification of serum proteomic patterns (or biomarkers) has emerged as a new diagnostic tool for the early detection of various types of cancer. In these methods, biological fluids such as serum, plasma and urine are analyzed by mass spectrometry to obtain a mass spectrum, which records the m/z (mass-to-charge) ratios and peak intensities of the peptides and proteins in that fluid. The spectra obtained from pathological and normal control groups are then classified by pattern recognition methods.
Raw mass spectrometry data consist of tens of thousands of m/z ratios per specimen, each with an intensity level. Currently, a low-resolution SELDI-TOF MS (Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometer) can measure up to 15,500 data points, which can be used to form datasets of 500–20,000 m/z ratios. With a high-resolution mass spectrometer (MS), the number of data points can increase to 400,000 [1]. Analyzing such an immense amount of data poses considerable challenges. Therefore, to improve the performance of classification algorithms, feature filtering or dimension reduction methods are widely applied after the preprocessing stages (resampling, baseline correction, alignment and normalization).
Dimension reduction methods used in different studies are usually grouped into three categories: filtering, wrapper and embedded methods. Filtering methods [2], [3], [4], [5] use statistical tests, such as t-tests, Wilcoxon tests, Mann-Whitney tests and Kolmogorov-Smirnov tests, to evaluate whether each data point is redundant. Based on the resulting scores and a threshold value, statistically insignificant points are removed from the data. In wrapper methods, the dimension reduction process is integrated into the classification stage: a subset of features is first selected by an algorithm and then classified with a classification method, and, based on the resulting classification error, the feature selection algorithm updates its parameters until the optimum subset of features is found [6]. Because the dimensionality is high, a stochastic algorithm such as a genetic algorithm, particle swarm optimization or ant colony optimization is usually used for this purpose [1]. Like wrapper methods, embedded methods integrate feature selection with the classification stage; however, they are computationally more efficient than wrapper methods.
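The filtering approach described above can be sketched as follows. This is a minimal NumPy example, assuming a samples × m/z intensity matrix `X` and 0/1 class labels `y`, that scores every m/z point with a Welch two-sample t-statistic and keeps the points whose absolute score exceeds a threshold. The function names, the threshold value and the synthetic data are hypothetical choices for illustration, not details taken from the cited studies.

```python
import numpy as np


def welch_t_scores(X, y):
    """Welch two-sample t-statistic for every m/z column of X.

    X: (n_samples, n_mz) intensity matrix; y: array of 0/1 class labels.
    """
    a, b = X[y == 0], X[y == 1]
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    # Unbiased (ddof=1) variances, as in the usual two-sample t-test
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return mean_diff / np.maximum(se, 1e-12)  # guard against constant columns


def t_test_filter(X, y, threshold):
    """Drop m/z points whose |t| does not exceed the (hypothetical) threshold."""
    keep = np.abs(welch_t_scores(X, y)) > threshold
    return X[:, keep], np.flatnonzero(keep)


# Synthetic data: 1000 m/z points, only points 10..14 differ between classes
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)
X = rng.normal(size=(80, 1000))
X[y == 1, 10:15] += 2.0
X_reduced, kept = t_test_filter(X, y, threshold=5.0)
print(X_reduced.shape, kept)
```

A fixed threshold on |t| is the simplest choice; keeping the top-ranked N points is a common alternative when the number of retained features must be controlled.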
In other studies, the discrete wavelet transform (DWT) is used for both dimension reduction and signal enhancement [5], [6], [7], [8].
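To illustrate how a DWT reduces dimension, the following sketch keeps only the approximation coefficients of a multi-level Haar decomposition, which roughly halves the signal length at each level. The Haar wavelet, the number of levels and the edge padding are assumptions chosen for simplicity; the wavelet family and decomposition level used in the cited studies may differ, and libraries such as PyWavelets provide many more wavelets and boundary modes.

```python
import numpy as np


def haar_approximation(signal, levels):
    """Level-`levels` Haar DWT approximation coefficients of a 1-D signal.

    Each level maps x to a[k] = (x[2k] + x[2k+1]) / sqrt(2); the detail
    coefficients (x[2k] - x[2k+1]) / sqrt(2) are discarded, which is what
    reduces the dimension by a factor of about 2 per level.
    """
    a = np.asarray(signal, dtype=float)
    for _ in range(levels):
        if len(a) % 2:  # assumed edge handling: repeat the last sample
            a = np.append(a, a[-1])
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a


spectrum = np.random.default_rng(1).random(15_000)
coeffs = haar_approximation(spectrum, levels=4)
print(len(coeffs))  # 15000 -> 7500 -> 3750 -> 1875 -> 938 (1875 is padded to 1876)
```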
Because of the high dimensionality of the data, classifying mass spectrometry data often requires multiple processing steps, including several dimension reduction or feature extraction steps. In Ref. [1], two filtering algorithms, the between-group to within-group sum of squares (BWSS) ratio and the t-test, were used for filtering. Then, a k-means algorithm was used to reduce feature correlation and redundancy. After the k-means step, the authors used a genetic ensemble-based feature selection step to further reduce the feature size by selecting highly discriminative m/z features in a combinatorial way. Their method uses a multi-objective genetic algorithm as the feature space exploration engine and an ensemble of classifiers as the feature subset evaluator; the ensemble combines five individual classifiers (decision tree, 1-nearest neighbor, 3-nearest neighbor, 7-nearest neighbor and naïve Bayes). In Ref. [5], the authors used a four-step dimension reduction strategy (binning, Kolmogorov-Smirnov test, restriction of the coefficient of variation and wavelet analysis) for ovarian cancer identification. Even after these four steps, they still had to classify 3382-dimensional vectors with an SVM classifier. In Ref. [8], the authors first refined the MS data and removed the data points that lacked certain desired properties, obtaining 39,905-dimensional vectors, which they called dataset A. This dataset was then analyzed with a filtering method (t-test), which reduced the dimension from 39,905 to 24,545 and formed dataset B. Next, they divided the MS data into several intervals (windows) and selected variables that could represent the characteristics of each waveform segment.
After several experiments, statistical moments (mean, variance, skewness and kurtosis) were selected for further analysis. The moment-based transformation turned sets A and B into sets C and D, reducing the dimensionality to 3992 and 1964, respectively. A kernel partial least squares (KPLS) algorithm was then used for classification.
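The two per-feature computations mentioned above can be sketched in NumPy: a BWSS score for each m/z point (the ratio of between-group to within-group sums of squares, used as a filter in Ref. [1]) and the window-wise statistical moments used in Ref. [8]. Both functions are hypothetical illustrations; the window length, the treatment of the last incomplete window and the use of population (rather than sample) moments are assumptions, not details from the cited papers.

```python
import numpy as np


def bwss_scores(X, y):
    """Between-group / within-group sum of squares for every column of X."""
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xk = X[y == label]
        mk = Xk.mean(axis=0)
        between += len(Xk) * (mk - overall) ** 2
        within += ((Xk - mk) ** 2).sum(axis=0)
    return between / np.maximum(within, 1e-12)  # larger = more discriminative


def window_moments(spectrum, window):
    """Mean, variance, skewness and excess kurtosis of consecutive windows.

    The last window may be shorter than `window`, so the output length is
    4 * ceil(len(spectrum) / window).
    """
    x = np.asarray(spectrum, dtype=float)
    features = []
    for start in range(0, len(x), window):
        seg = x[start:start + window]
        centered = seg - seg.mean()
        var = np.mean(centered ** 2)
        sd = np.sqrt(var)
        skew = np.mean(centered ** 3) / sd ** 3 if sd > 0 else 0.0
        kurt = np.mean(centered ** 4) / sd ** 4 - 3.0 if sd > 0 else 0.0
        features.extend([seg.mean(), var, skew, kurt])
    return np.array(features)


# Column 0 separates the two groups perfectly; column 1 carries no group signal
X = np.array([[0.0, 1.0], [0.0, -1.0], [10.0, 1.0], [10.0, -1.0]])
y = np.array([0, 0, 1, 1])
print(bwss_scores(X, y))

# With a 40-point window, a 39,905-point vector gives 998 windows x 4 moments
print(window_moments(np.arange(39_905.0), window=40).shape)  # (3992,)
```

A 40-point window happens to reproduce the 3992 dimension reported for set C (998 windows, the last one partial). This is only an arithmetic check; the window length actually used in Ref. [8] is not stated here.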
The multi-step dimension reduction and feature extraction methodology described above complicates the analysis of mass spectrometry data. Moreover, the choice of method and parameters (such as the window length in window-based methods) at each step can strongly affect the performance of the classification algorithms. To overcome these drawbacks, this study proposes a new method, the optimized spatial filter (OSF) method, for the classification of mass spectrometry data. In the proposed method, only one dimension reduction method (the discrete wavelet transform) is applied after the preprocessing stage, and the data are then classified effectively without any further feature extraction or dimension reduction steps. The proposed method is based on the theory of common spatial patterns (CSP) [9], which is widely used to classify motor imagery EEG signals in brain-computer interface (BCI) applications. Motor imagery can be defined as a dynamic state during which an individual mentally simulates a predefined action. The EEG signal acquired from the brain during this mental simulation is known as a “motor imagery EEG signal,” and the classification of signals acquired during different mentally simulated actions is known as motor imagery signal classification. For this task, CSP methods aim to find spatial filters that provide optimal discrimination between two classes (or mental actions). Computationally, CSP filters are obtained by simultaneously diagonalizing the covariance matrices of the two classes [10]. A computed CSP filter projects the multi-dimensional EEG time-domain signal onto a one-dimensional time-domain signal in which the power (variance) of one class is maximized while the power of the other class is minimized [11].
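The simultaneous diagonalization behind CSP can be sketched with NumPy alone: whiten the composite covariance of the two classes, then eigendecompose one whitened class covariance. This follows the standard two-class CSP construction for trials stored as channels × samples matrices, with trace-normalized covariances (a common convention in the EEG literature) and log-variance features. How mass spectra are arranged into such matrices is not given in this excerpt, so the synthetic 3-channel data and all function names below are hypothetical.

```python
import numpy as np


def mean_covariance(trials):
    """Average trace-normalized spatial covariance of (channels x samples) trials."""
    covs = []
    for X in trials:
        Xc = X - X.mean(axis=1, keepdims=True)
        C = Xc @ Xc.T
        covs.append(C / np.trace(C))
    return np.mean(covs, axis=0)


def csp_filters(trials_a, trials_b):
    """Return W (one spatial filter per row) and eigenvalues lam, sorted so the
    first filter maximizes class-a variance relative to class b."""
    Ca, Cb = mean_covariance(trials_a), mean_covariance(trials_b)
    # Whitening transform P with P (Ca + Cb) P^T = I
    evals, U = np.linalg.eigh(Ca + Cb)
    P = np.diag(1.0 / np.sqrt(evals)) @ U.T
    # After whitening, Sa + Sb = I, so one eigendecomposition diagonalizes both
    lam, B = np.linalg.eigh(P @ Ca @ P.T)
    order = np.argsort(lam)[::-1]
    W = B[:, order].T @ P
    return W, lam[order]


def log_variance_features(trial, W, n_pairs=1):
    """Log-variance of the trial projected on the first and last n_pairs filters."""
    rows = np.r_[0:n_pairs, len(W) - n_pairs:len(W)]
    Z = W[rows] @ trial
    return np.log(Z.var(axis=1))


# Hypothetical 3-channel data: class a is strong on channel 0, class b on channel 1
rng = np.random.default_rng(0)


def make_trials(scales, n=20):
    return [rng.normal(size=(3, 200)) * np.array(scales)[:, None] for _ in range(n)]


trials_a, trials_b = make_trials([3.0, 1.0, 1.0]), make_trials([1.0, 3.0, 1.0])
W, lam = csp_filters(trials_a, trials_b)
print(lam)  # eigenvalues near 1 favor class a, near 0 favor class b
print(log_variance_features(trials_a[0], W), log_variance_features(trials_b[0], W))
```

Because W Ca Wᵀ = diag(λ) and W Cb Wᵀ = I − diag(λ), the first filter gives the largest class-a to class-b variance ratio and the last filter the smallest.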
Here, the same concept is used to find the optimum filters that project the multi-dimensional mass spectrometry signal onto a one-dimensional signal in which the difference in variance between the two classes (normal control group and cancerous samples) is maximized. Although the CSP method has been shown to perform quite well on EEG data, it has some shortcomings. In the CSP method, the between-class variance is maximized, but the minimization of the within-class variances is ignored, so the projected data may have large within-class variances. To overcome this problem, this study finds the optimal filters with the differential evolution (DE) algorithm [12]. The fitness function of the DE algorithm is based on a divergence analysis that takes both the between-class and within-class distributions into account. Experiments on publicly available mass spectrometry datasets showed that, compared with existing methods, the proposed method is quite simple and achieves the lowest classification error on each dataset.
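The optimization step can be illustrated with a minimal sketch of differential evolution in its classic DE/rand/1/bin form [12], searching over spatial filter coefficients w. In this sketch the fitness is the symmetric Kullback-Leibler (J) divergence between Gaussians fitted to the two classes' log-variance features, which grows when the class means are far apart relative to the within-class variances. The feature, this particular divergence, the initial search range and the DE settings (F = 0.8, CR = 0.9, population 20) are assumptions for illustration and may differ from the fitness function defined in the paper's methodology; the data and function names are hypothetical.

```python
import numpy as np


def log_var_features(trials, w):
    """Log-variance of each (channels x samples) trial projected by filter w."""
    return np.array([np.log(np.var(w @ X) + 1e-12) for X in trials])


def j_divergence(f1, f2):
    """Symmetric KL divergence between 1-D Gaussians fitted to f1 and f2.

    The second term grows with the squared mean gap divided by the
    within-class variances, so compact, well-separated classes score high.
    """
    m1, m2 = f1.mean(), f2.mean()
    v1, v2 = f1.var() + 1e-12, f2.var() + 1e-12
    return 0.5 * (v1 / v2 + v2 / v1 - 2.0) + 0.5 * (m1 - m2) ** 2 * (1.0 / v1 + 1.0 / v2)


def de_maximize(fitness, dim, pop_size=20, F=0.8, CR=0.9, generations=60, seed=0):
    """Maximize `fitness` over R^dim with DE/rand/1/bin."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, dim))
    scores = np.array([fitness(p) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct members, none equal to the target i
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = a + F * (b - c)
            # Binomial crossover, with at least one coordinate from the mutant
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            # Greedy selection: keep the trial if it is at least as fit
            score = fitness(trial)
            if score >= scores[i]:
                pop[i], scores[i] = trial, score
    best = int(np.argmax(scores))
    return pop[best], scores[best]


# Hypothetical 3-channel data, as in a CSP setting
rng = np.random.default_rng(1)


def make_trials(scales, n=20):
    return [rng.normal(size=(3, 200)) * np.array(scales)[:, None] for _ in range(n)]


class_1, class_2 = make_trials([3.0, 1.0, 1.0]), make_trials([1.0, 3.0, 1.0])


def fitness(w):
    return j_divergence(log_var_features(class_1, w), log_var_features(class_2, w))


w_best, j_best = de_maximize(fitness, dim=3)
print(w_best / np.linalg.norm(w_best), j_best)
```

Since rescaling w shifts every log-variance by the same constant, this fitness is scale-invariant, so only the direction of w matters and no bound handling is needed in the sketch.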
The remainder of this paper is organized as follows. Section 2 first briefly describes the preprocessing of the mass spectrometry data and then introduces the CSP-based method and the proposed OSF-based method for mass spectrometry data analysis. Section 3 presents the experimental results and discussion, and Section 4 concludes the work.
Methodology
In this study, two datasets obtained from Ref. [13] are used in the experiments. The first dataset was generated from ovarian cancer and control specimens using a SELDI-TOF spectrometer and includes the mass spectra of 95 control and 121 ovarian cancer samples. The initial dimension of the high-resolution mass spectrometry data in the first set is 368,750. The second dataset used in this study was also generated using a SELDI-TOF spectrometer, from 259 prostate cancer
Experimental results and discussion
In the classification phase of the study, the optimal spatial filter found by the CSP or OSF method is used to form the feature spaces with Eqs. (11) and (12). The resulting feature space is then classified by linear discriminant analysis (LDA). Performance is estimated with 5-fold cross-validation, which is repeated 50 times, and the results are averaged. The experiments are performed for
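The evaluation protocol described above (an LDA classifier scored by 5-fold cross-validation repeated 50 times and averaged) can be sketched as follows. It assumes 0/1 class labels and a feature matrix standing in for the CSP/OSF feature space of Eqs. (11) and (12), which are not reproduced here. The equal-prior LDA threshold, the stratified folds, the synthetic data and the function names are all assumptions for illustration.

```python
import numpy as np


def fit_lda(X, y):
    """Two-class LDA: pooled covariance, equal priors. Returns (w, b)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = ((len(X0) - 1) * np.cov(X0, rowvar=False)
              + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X) - 2)
    pooled = np.atleast_2d(pooled)  # np.cov gives a 0-d array for one feature
    # Small ridge term keeps the solve stable if the covariance is singular
    w = np.linalg.solve(pooled + 1e-9 * np.eye(len(pooled)), m1 - m0)
    b = -0.5 * w @ (m0 + m1)  # threshold midway between the class means
    return w, b


def repeated_cv_error(X, y, k=5, repeats=50, seed=0):
    """Average LDA error over `repeats` runs of stratified k-fold CV."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(repeats):
        # Shuffle each class separately so every fold keeps the class ratio
        parts0 = np.array_split(rng.permutation(np.flatnonzero(y == 0)), k)
        parts1 = np.array_split(rng.permutation(np.flatnonzero(y == 1)), k)
        wrong = 0
        for p0, p1 in zip(parts0, parts1):
            test = np.concatenate([p0, p1])
            train = np.setdiff1d(np.arange(len(y)), test)
            w, b = fit_lda(X[train], y[train])
            predicted = (X[test] @ w + b > 0).astype(int)
            wrong += int(np.sum(predicted != y[test]))
        errors.append(wrong / len(y))
    return float(np.mean(errors))


# Hypothetical 2-D feature space with 95 control (0) and 121 cancer (1) samples
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, size=(95, 2)),
               rng.normal(3.0, 1.0, size=(121, 2))])
y = np.r_[np.zeros(95, dtype=int), np.ones(121, dtype=int)]
print(repeated_cv_error(X, y))
```

Stratifying the folds is a common choice when class sizes differ; plain random folds would also match the description above.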
Conclusion
This study proposes a new method, an optimized spatial filter (OSF) method, for the classification of mass spectrometry data for cancer diagnosis. The proposed method is based on the theory of common spatial patterns (CSPs), which is widely used in brain-computer interface (BCI) applications. In contrast to the CSP method, the proposed OSF method utilizes an optimization algorithm (DE in our case) to find the optimal spatial filter coefficients. The OSF method uses the divergence score as the
References (32)
A clustering based hybrid system for biomarker selection and sample classification of mass spectrometry data. Neurocomputing (2010).
Feature extraction and dimensionality reduction for mass spectrometry data. Comput. Biol. Med. (2009).
Dimension reduction by a novel unified scheme using divergence analysis and genetic search. Digit. Signal Process. (2010).
Detection of cancer-specific markers amid massive mass spectral data. Proc. Natl. Acad. Sci. (2003).
Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data. Bioinformatics (2003).
Ovarian cancer detection by partial least squares method using mass spectrometry data. National Conference on Biomedical Engineering (BIYOMUT) (2012).
Ovarian cancer identification based on dimensionality reduction for high-throughput mass spectrometry data. Bioinformatics (2005).
Prostate cancer classification from mass spectrometry data by using wavelet analysis and Kernel partial least squares algorithm. Int. J. Biosci. Biochem. Bioinform. (2013).
Ovarian cancer classification based on dimensionality reduction for SELDI-TOF data. BMC Bioinform. (2010).
Optimal spatial filtering of single trial EEG during imagined hand movement. IEEE Trans. Rehabil. Eng. (2000).
L1-norm-based common spatial patterns. IEEE Trans. Biomed. Eng.
A neural network-based optimal spatial filter design method for motor imagery classification. PLoS One.
Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim.
Serum proteomic patterns for detection of prostate cancer. J. Natl. Cancer Inst.
Enhancement of sensitivity and resolution of surface-enhanced laser desorption/ionization time-of-flight mass spectrometric records for serum peptides using time-series analysis techniques. Clin. Chem.