Feature and model selection with discriminatory visualization for diagnostic classification of brain tumors

doi:10.1016/j.neucom.2009.07.018

Neurocomputing

Volume 73, Issues 4–6, January 2010, Pages 622-632

https://doi.org/10.1016/j.neucom.2009.07.018

Abstract

Machine Learning (ML) and related methods have of late made significant contributions to solving multidisciplinary problems in the field of oncology diagnosis. Human brain tumor diagnosis, in particular, often relies on the use of non-invasive techniques such as Magnetic Resonance Imaging (MRI) and Spectroscopy (MRS). In this paper, MRS data of human brain tumors are analyzed in detail.

The high dimensionality of MR spectra makes both their classification and the interpretation of the results difficult, which limits their usability in practical medical settings. The use of dimensionality reduction techniques is therefore advisable. In this work, we apply feature selection methods and several off-the-shelf classifiers to various ¹H-MRS modalities: long and short echo times and an ad hoc combination of both. Bootstrap resampling makes it possible to obtain mean performance estimates together with their variability. Our experimental findings indicate that feature selection improves classification performance compared with using the full set of features. We also show that combining information from the different echo times is the better strategy when few spectral frequencies are used; with ever greater numbers of short echo time frequencies, however, many models with similar performance can be obtained. The final induced models offer very attractive solutions in terms of both prediction accuracy and number of spectral frequencies involved, and the selected frequencies are also amenable to metabolic interpretation. A linear dimensionality-reduction technique that preserves class discrimination capabilities is used to visualize the data corresponding to the selected frequencies.

Introduction

Over the last decade, ML has made significant inroads in the fields of bioinformatics and biomedicine. One particular application area that has attracted the attention of both medical practitioners and data analysts is that of human oncology [1]. In this work we are specifically interested in quantitative information in the form of patients’ biological signals. We analyze data corresponding to different types of human brain tumors, obtained by single-voxel proton magnetic resonance spectroscopy (¹H-MRS), with the purpose of developing reliable tools for the support of medical expert diagnostic decision making. Decisions in this area are extremely sensitive and are usually based on information obtained by non-invasive measurement techniques.

The analyzed data belong to a multi-center international database that contains cases of a number of brain tumor pathologies [2]. MRS provides a detailed metabolic fingerprint of the tumor-affected tissue that varies according to the echo time of the acquisition and can be used to characterize these pathologies. The echo time is a relevant parameter of ¹H-MRS measurement: at short echo times (SET), some of the metabolites are better resolved, although numerous overlapping resonances exist, making the spectra difficult to interpret [3]. The use of a long echo time (LET) yields less clearly resolved metabolites but also less baseline distortion, resulting in a more readable spectrum.

The available data have been acquired at both SET and LET and bundled into three groups or super-classes as described below; they are scarce and of high dimensionality, making their discrimination a non-trivial undertaking. Dimensionality reduction methods (feature selection and/or extraction) are therefore needed to reduce the overall complexity of the problem. We use an entropic filtering algorithm for feature selection as a fast method to generate relevant subsets of spectral frequencies. An in-depth feature selection study is performed, not only on LET and SET data, but also on a combination of both echo times. Bootstrap resampling techniques are used to yield mean performance estimates and their variability, and thus a more reliable measure of predictive ability. The combination of feature selection and classification aims at obtaining simple models (with low numbers of features) capable of good generalization.
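The bootstrap idea described in this paragraph can be sketched as follows. This is a minimal illustration only, assuming a plain out-of-bag bootstrap, a nearest-centroid classifier and synthetic three-class data; the function names, the classifier and the number of resamples are hypothetical choices for the example and are not taken from the paper.

```python
import numpy as np


def nearest_centroid_fit(X, y):
    """Return the class labels and the per-class mean vectors."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def nearest_centroid_predict(model, X):
    """Assign each row of X to the class with the closest centroid."""
    classes, centroids = model
    # Squared Euclidean distance from every sample to every centroid.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]


def bootstrap_accuracy(X, y, n_boot=200, seed=0):
    """Train on each bootstrap resample and test on the rows it left out."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_classes = np.unique(y).size
    accs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # draw n rows with replacement
        oob = np.setdiff1d(np.arange(n), idx)   # rows never drawn (out-of-bag)
        if oob.size == 0 or np.unique(y[idx]).size < n_classes:
            continue                            # skip degenerate resamples
        model = nearest_centroid_fit(X[idx], y[idx])
        pred = nearest_centroid_predict(model, X[oob])
        accs.append(np.mean(pred == y[oob]))
    accs = np.asarray(accs)
    # Mean performance estimate and its variability across resamples.
    return accs.mean(), accs.std(ddof=1), accs


# Synthetic stand-in for three super-classes (NOT the MRS data of the study).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 1.0, size=(30, 5)) for m in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 2], 30)

mean_acc, sd_acc, accs = bootstrap_accuracy(X, y)
print(f"mean accuracy {mean_acc:.3f}, standard deviation {sd_acc:.3f}")
```

Reporting the spread of the per-resample accuracies alongside their mean is what gives a sense of how stable a model is when, as here, the data are scarce.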

We report experimental results that support the practical advantage of combining robust feature selection and classification in this application, as accurate classification is obtained with parsimonious and interpretable subsets of spectral frequencies. We also aim to progress in the comparison of performances for MRS data acquired at different echo times, as well as in the comparison of these with data that combine both echo times.

Of special importance in a practical medical setting is the interpretability of the solutions in terms of these spectral frequencies, something that limits the applicability of methods such as PCA or ICA (whose solutions involve weighted combinations of frequencies, instead of individual frequencies). Moreover, even if interpretable by mere inspection of the involved features, the final selection of spectral frequencies may still provide few clues about the structure of the classes (tumor types). In this medical context, data visualization in a low-dimensional representation space may become extremely important, as it would help radiologists to gain insights into this complex and highly sensitive domain. A linear dimensionality reduction technique that provides a data projection—while preserving the class discrimination achieved by a classifier—is also used in our study. The goal of combining feature selection and visualization is to increase the intuitive interpretability of the classifier results.
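To make the idea of a class-preserving linear projection concrete, the sketch below uses classical Fisher discriminant analysis built from within-class and between-class scatter matrices. This is a stand-in chosen for illustration, not necessarily the technique used in the study; the function name, the ridge term and the synthetic data are assumptions.

```python
import numpy as np


def fisher_projection(X, y, n_components=2):
    """Project X onto the leading Fisher discriminant directions."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    # Small ridge keeps S_w invertible when samples are scarce (assumed value).
    S_w += 1e-6 * np.trace(S_w) / d * np.eye(d)
    # Directions that maximize between-class relative to within-class scatter.
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(evals.real)[::-1][:n_components]
    W = evecs[:, order].real
    return (X - mean_all) @ W, W


# Synthetic three-class data in 10 dimensions (NOT the MRS data of the study).
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 10))
y = np.repeat([0, 1, 2], 30)
X[y == 1, 0] += 5.0   # class 1 is shifted along feature 0
X[y == 2, 1] += 5.0   # class 2 is shifted along feature 1

Z, W = fisher_projection(X, y)
print(Z.shape)  # two coordinates per sample, ready for a scatter plot
```

With three classes the between-class scatter has rank two, so a two-dimensional plot of `Z` can retain the linear class separation available in the selected features.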

Section snippets

Literature review

Early attempts to study ¹H-MRS data in assessing human brain tumors in vivo can be traced back two decades [4]. This pioneering research showed that spectra corresponding to normal brain tissue and to tumors differ significantly in terms of the presence/absence of different metabolites. Even though no ML analysis of spectra was done in establishing these differences, it was concluded that ¹H-MRS may help to differentiate tumors for diagnostic and therapeutic purposes, limiting the need for

The ¹H-MRS data

¹H-MRS is by no means a novel technique for the exploration of the brain, but its use for the routine diagnostic examination of abnormal brain tissue is far from standard in clinical practice. One of the reasons for this situation is that a simple visual interpretation of ¹H spectra does not easily lead to a clear diagnosis. Moreover, few radiologists (to whom the diagnostic decision pertains) are trained to use and make sense of this technique [17]. Instead, they often resort to magnetic

Methodological setup

Feature selection can often be considered part of model selection and becomes an important step, especially when the number of observations roughly matches the number of features. Performing model selection in the joint space of features and parameters in this situation is at best a delicate task that entails a very high risk of overfitting. In this work feature selection and classifier selection are carried out in an interleaved way. First, feature selection is done in a classifier-independent

Experimental results

The frequency distributions of spectral points selected by the EFA in the bootstrap samples (Section 4.1) are shown in Figs. 2, 3 and 4.
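How such selection frequencies can be tallied is sketched below. The EFA itself is not described in this excerpt, so a simple mutual-information ranking on binned features is used as a hypothetical stand-in filter; the function names, the number of bins, the value of `k` and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def mutual_information(x, y, n_bins=5):
    """Mutual information (in nats) between a quantile-binned feature and labels."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    xb = np.digitize(x, edges)                    # bin index in 0..n_bins-1
    classes, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((n_bins, classes.size))
    np.add.at(joint, (xb, yi), 1.0)               # contingency counts
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())


def selection_frequency(X, y, k=3, n_boot=100, seed=0):
    """Fraction of bootstrap samples in which each feature ranks in the top k."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    counts = np.zeros(d, dtype=int)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample rows with replacement
        scores = np.array(
            [mutual_information(X[idx, j], y[idx]) for j in range(d)]
        )
        counts[np.argsort(scores)[::-1][:k]] += 1
    return counts / n_boot


# Synthetic data: 20 noise features, of which features 2 and 7 carry class signal.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 20))
X[:, 2] += 3.0 * y
X[:, 7] -= 3.0 * y

freq = selection_frequency(X, y, k=2)
print(np.round(freq, 2))
```

Features that are selected in nearly every resample are the stable candidates; plotting `freq` against the feature index gives a histogram of the kind the text refers to.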

The results on classifier construction using the previously selected sets of features (also developed in the bootstrap samples, see Section 4.2) are displayed in Table 1. These are the bootstrap estimates of prediction error as given by formula (7), translated to accuracy percentages for ease of reading. Additionally, 95% CIs for the mean are reported in

Conclusions and future work

MRS is yet to become a standard method for day-to-day clinical diagnosis of brain tumors. This is despite it being a non-invasive technique and one that provides rich information about the biochemistry of the tumor pathology. Instead, MRI is often the method of choice for diagnosis in practice, in spite of its limitations. To become mainstream, the diagnosis based on MRS must be sufficiently robust and, for that, reliable tools for spectral data analysis are required. In this study, algorithms

Acknowledgments

The authors gratefully acknowledge the former INTERPRET partners (INTERPRET, EU-IST-1999-10310) and, from 1st January 2003, Generalitat de Catalunya (CIRIT SGR2001-194, XT 2002-48 and XT 2004-51 grants); data providers: Dr. C. Majós (IDI), Dr. A. Moreno-Torres (CDP), Dr. F.A. Howe and Prof. J. Griffiths (SGUL), Prof. A. Heerschap (RU), Dr. W. Gajewicz (MUL) and Dr. J. Calvar (FLENI); data curators: Dr. A.P. Candiota, Ms. T. Delgado, Ms. J. Martín, Mr. I. Olier, Mr. A. Pérez and Prof. Carles Arús

Félix F. González-Navarro is an Associate Professor at the Engineering Institute at Baja California State University, Mexicali, México. Currently, he is a Ph.D. student in the Departament de Llenguatges i Sistemes Informàtics at the Universitat Politècnica de Catalunya (UPC), where he conducts research in the areas of Pattern Recognition, Feature Selection Algorithms and Information Theory.

References (39)

P.J.G. Lisboa et al.
Cluster-based visualisation with scatter matrices
Pattern Recognition Lett.
(2008)
N. Sibtain
The clinical value of proton magnetic resonance spectroscopy in adult brain tumours
Clin. Radiol.
(2007)
A. Vellido et al.
Handling outliers in brain tumour MRS data analysis through robust topographic mapping
Comput. Biol. Med.
(2006)
A. Vellido, E. Biganzoli, P.J.G. Lisboa, Machine learning in cancer research: implications for personalised medicine,...
M. Julià-Sapé et al.
The interpret consortium: a multi-centre, web-accessible and quality control-checked database of in vivo MR spectra of brain tumour patients
Magn. Reson. Mater. Phys.
(2006)
C. Majós et al.
Brain tumor classification by proton MR spectroscopy: comparison of diagnostic accuracy at short and long TE
Am. J. Neuroradiol.
(2004)
H. Bruhn et al.
Noninvasive differentiation of tumors with use of localized ¹H-MR spectroscopy in vivo: initial experience in patients with cerebral tumors
Radiology
(1989)
H. Kugel et al.
Human brain tumors: spectral patterns detected with localized ¹H-MR spectroscopy
Radiology
(1992)
D. Ott et al.
Human brain tumors: assessment with in vivo proton MR spectroscopy
Radiology
(1993)
Y. Kinoshita et al.
Proton magnetic resonance spectroscopy of brain tumors: an in vitro study
Neurosurgery
(1994)

H. Shimizu et al.
Noninvasive evaluation of malignancy of brain tumors with proton MR spectroscopy
Am. J. Neuroradiol.
(1996)
A. Tate
Development of a decision support system for diagnosis and grading of brain tumours using in vivo magnetic resonance single voxel spectra
NMR Biomed.
(2006)
C. Ladroue, Pattern recognition techniques for the study of magnetic resonance spectra of brain tumours, Ph.D. Thesis,...
A. Devos, Quantification and classification of MRS data and applications to brain tumour recognition, Ph.D. Thesis,...
J. García, S. Tortajada, C. Vidal, M. Julià-Sapé, J. Luts, S. Van Huffel, C. Arús, M. Robles, On the use of long TE and...
J.M. García-Gómez et al.
The influence of combining two echo times in automatic brain tumor classification by Magnetic Resonance Spectroscopy
NMR Biomed.
(2008)
K. Kira, L. Rendell, The feature selection problem: traditional methods and a new algorithm, in: Proceedings of the...
F. González, Ll. Belanche, Feature Selection in in vivo 1H-MRS single voxel spectra, in: Proceedings of the KES 2008...
INTERPRET: International Network for Pattern Recognition of Tumours Using Magnetic Resonance project...


Lluís A. Belanche is an Associate Professor in the Software Department at the Universitat Politècnica de Catalunya (UPC). He received his B.Sc. in Computer Science from the UPC in 1990 and an M.Sc. in Artificial Intelligence from the UPC in 1991. He joined the Computer Science Faculty shortly after, where he completed his doctoral dissertation in 2000. He has been doing research in neural networks and support vector machines for pattern recognition and function approximation, as well as in feature selection algorithms and their collective application to workable artificial learning systems.

Enrique Romero received his B.Sc. degree in Mathematics in 1989 from the Universitat Autònoma de Barcelona. In 1994, he received his B.Sc. in Computer Science from the Universitat Politècnica de Catalunya (UPC). In 1996, he joined the Software Department at the UPC, as an Assistant Professor. He received his M.Sc. degree in Artificial Intelligence and Ph.D. degree in Computer Science from the UPC in 2000 and 2004, respectively. His research interests include neural networks, support vector machines and feature selection.

Alfredo Vellido received his degree in Physics from the Department of Electronics and Automatic Control of the University of the Basque Country (Spain), in 1996. He completed his Ph.D. at Liverpool John Moores University (UK), in 2000. After a few years of experience in the private sector, he briefly joined Liverpool John Moores University again as research officer in a project in the field of computational neurosciences. He is now a Ramón y Cajal research fellow for the Technical University of Catalonia. Research interests include, but are not limited to, pattern recognition, machine learning and data mining, as well as their application in medicine, market analysis, ecology and e-learning, on which subjects he has published widely.

Margarida Julià-Sapé holds a B.Sc. Hon. in Biology from the Universitat de Barcelona (UB), Spain, 1994, as well as an M.Sc. in Biotechnology (1995) from the UB. She was awarded her Ph.D. in 2006 by the Universitat Autònoma de Barcelona (UAB), Spain. The author is currently a postdoctoral researcher with the Networking Research Center on Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), at UAB, Cerdanyola del Vallès, Spain.

Carles Arús was born in Barcelona (Spain), in 1954. B.Sc. in Biology from the Universitat Autònoma de Barcelona (UAB), Spain, in 1976. Ph.D. in Chemistry from UAB, in 1981 (Ph.D. advisor Prof. Claudi M. Cuchillo) on the subject of the sub-site structure of bovine pancreatic RNase A (enzyme kinetics, NMR spectroscopy). Best thesis award in the Faculty of Sciences of UAB in 1982. Postdoctoral work in the USA (1982-1985) on biomedical NMR with Prof. Michael Bárány (University of Illinois at Chicago, IL) and Prof. John L. Markley (Purdue University, IN). Since 1985, tenured assistant professor, and, since 2002, full Professor at the Department of Biochemistry and Molecular Biology of the UAB. His research group has carried out work on the application of NMR spectroscopy of tumours for diagnostic purposes and has also contributed to the investigation of human muscle bioenergetics by 31P MRS. His present interests in the field of tumour spectroscopy target the use of 1H MRS of human brain tumours, biopsies and cell models for diagnosis, prognosis and therapy planning. He has published 66 PubMed accessible articles since 1977.
