Elsevier

Neurocomputing

Volume 73, Issues 4–6, January 2010, Pages 622-632
Feature and model selection with discriminatory visualization for diagnostic classification of brain tumors

https://doi.org/10.1016/j.neucom.2009.07.018

Abstract

Machine Learning (ML) and related methods have of late made significant contributions to solving multidisciplinary problems in the field of oncology diagnosis. Human brain tumor diagnosis, in particular, often relies on the use of non-invasive techniques such as Magnetic Resonance Imaging (MRI) and Spectroscopy (MRS). In this paper, MRS data of human brain tumors are analyzed in detail.

The high dimensionality of MR spectra makes both their classification and the interpretation of the results difficult, which limits their usability in practical medical settings. The use of dimensionality reduction techniques is therefore advisable. In this work, we apply feature selection methods and several off-the-shelf classifiers to various 1H-MRS modalities: long echo time, short echo time, and an ad hoc combination of both. Bootstrap resampling makes it possible to obtain mean performance estimates together with their variability. Our experimental findings indicate that feature selection improves classification performance compared with using the full set of features. We also show that combining information from the different echo times is the better strategy when the number of spectral frequencies is small; however, using progressively larger numbers of short echo time frequencies yields many models with similar performance. The final induced models offer very attractive solutions in terms of both prediction accuracy and the number of spectral frequencies involved, which are also amenable to metabolic interpretation. A linear dimensionality-reduction technique that preserves class discrimination capabilities is used for visualizing the data corresponding to the selected frequencies.

Introduction

Over the last decade, ML has made significant inroads in the fields of bioinformatics and biomedicine. One particular application area that has attracted the attention of both medical practitioners and data analysts is that of human oncology [1]. In this work we are specifically interested in quantitative information in the form of patients’ biological signals. We analyze data corresponding to different types of human brain tumors, obtained by single-voxel proton magnetic resonance spectroscopy (1H-MRS), with the purpose of developing reliable tools for the support of medical expert diagnostic decision making. Decisions in this area are extremely sensitive and are usually based on information obtained by non-invasive measurement techniques.

The analyzed data belong to a multi-center international database that contains cases of a number of brain tumor pathologies [2]. MRS provides a detailed metabolic fingerprint of the tumor-affected tissue that varies according to the echo time of the acquisition and can be used to characterize these pathologies. The echo time is a relevant parameter of 1H-MRS measurement: at short echo times (SET), some of the metabolites are better resolved, although numerous overlapping resonances exist, making the spectra difficult to interpret [3]. The use of a long echo time (LET) yields less clearly resolved metabolites but also less baseline distortion, resulting in a more readable spectrum.

The available data have been acquired at both SET and LET and bundled into three groups or super-classes as described below; they are scarce and of high dimensionality, which makes their discrimination a non-trivial undertaking. Dimensionality reduction methods (feature selection and/or extraction) are therefore needed to reduce the overall complexity of the problem. We use an entropic filtering algorithm for feature selection as a fast method to generate relevant subsets of spectral frequencies. An in-depth feature selection study is performed not only on LET and SET data but also on a combination of both echo times. Bootstrap resampling techniques are used to yield mean performance estimates and their variability, and thus a more reliable measure of predictive ability. The combination of feature selection and classification aims at obtaining simple models (in terms of low numbers of features) capable of good generalization.
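To make the idea of an entropy-based filter concrete, the following is a minimal sketch of ranking features by their mutual information with the class label. It is not the entropic filtering algorithm used in the paper, whose details are not given in this excerpt; the function names, the equal-width discretization, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(feature, labels):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    joint = list(zip(feature, labels))
    return entropy(feature) + entropy(labels) - entropy(joint)


def discretize(column, n_bins=3):
    """Equal-width binning of a numeric column (assumed preprocessing)."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / n_bins or 1.0  # guard against constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in column]


def rank_features(X, y, n_bins=3):
    """Return feature indices sorted by decreasing relevance to y."""
    scores = []
    for j in range(len(X[0])):
        col = discretize([row[j] for row in X], n_bins)
        scores.append((mutual_information(col, y), j))
    # Highest score first; ties broken by feature index.
    return [j for _, j in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 5.0], [1.0, 1.0]]
y = [0, 0, 1, 1]
ranking = rank_features(X, y)
print(ranking)
```

A filter of this kind scores each frequency independently of any classifier, which is what makes it fast; it does not account for redundancy between frequencies.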

We report experimental results that support the practical advantage of combining robust feature selection and classification in this application, as accurate classification is obtained with parsimonious and interpretable subsets of spectral frequencies. We also aim to progress in the comparison of performances for MRS data acquired at different echo times, as well as in the comparison of these with data that combine both echo times.

Of special importance in a practical medical setting is the interpretability of the solutions in terms of these spectral frequencies, something that limits the applicability of methods such as PCA or ICA (whose solutions involve weighted combinations of frequencies, instead of individual frequencies). Moreover, even if interpretable by mere inspection of the involved features, the final selection of spectral frequencies may still provide few clues about the structure of the classes (tumor types). In this medical context, data visualization in a low-dimensional representation space may become extremely important, as it would help radiologists to gain insights into this complex and highly sensitive domain. A linear dimensionality reduction technique that provides a data projection—while preserving the class discrimination achieved by a classifier—is also used in our study. The goal of combining feature selection and visualization is to increase the intuitive interpretability of the classifier results.
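As an illustration of a linear projection that keeps classes apart, the sketch below computes a two-class Fisher discriminant direction from the within-class scatter matrix. This is a stand-in chosen for brevity: the excerpt does not specify the projection technique used in the paper, the study involves three super-classes rather than two, and the helper names, the small `ridge` term, and the toy data are assumptions.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def within_scatter(groups):
    """Sum over classes of the scatter of each class around its own mean."""
    d = len(groups[0][0])
    S = [[0.0] * d for _ in range(d)]
    for rows in groups:
        m = mean_vector(rows)
        for r in rows:
            diff = [r[j] - m[j] for j in range(d)]
            for a in range(d):
                for b in range(d):
                    S[a][b] += diff[a] * diff[b]
    return S


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def fisher_direction(class0, class1, ridge=1e-6):
    """Direction w solving (Sw + ridge*I) w = m1 - m0."""
    Sw = within_scatter([class0, class1])
    for i in range(len(Sw)):
        Sw[i][i] += ridge  # assumed regularizer for near-singular scatter
    m0, m1 = mean_vector(class0), mean_vector(class1)
    return solve(Sw, [b - a for a, b in zip(m0, m1)])


def project(rows, w):
    """Project each row onto the direction w."""
    return [sum(wi * xi for wi, xi in zip(w, r)) for r in rows]


# Toy data (hypothetical): two well-separated groups in two dimensions.
class0 = [[1.0, 2.0], [2.0, 1.0], [2.0, 3.0], [3.0, 2.0]]
class1 = [[5.0, 6.0], [6.0, 5.0], [6.0, 7.0], [7.0, 6.0]]
w = fisher_direction(class0, class1)
p0, p1 = project(class0, w), project(class1, w)
```

Because the projection is a weighted sum of the selected frequencies only, the resulting low-dimensional view stays tied to the features a clinician would inspect.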

Literature review

Early attempts to study 1H-MRS data in assessing human brain tumors in vivo can be traced back two decades [4]. This pioneering research showed that spectra of normal brain tissue and of tumors differ significantly in terms of the presence or absence of different metabolites. Even though no ML analysis of spectra was done in establishing these differences, it was concluded that 1H-MRS may help to differentiate tumors for diagnostic and therapeutic purposes, limiting the need for

The 1H-MRS data

1H-MRS is by no means a novel technique for the exploration of the brain, but its use for the routine diagnostic examination of abnormal brain tissue is far from standard in clinical practice. One reason for this situation is that a simple visual interpretation of 1H spectra does not easily lead to a clear diagnosis. Moreover, few radiologists (to whom the diagnostic decision pertains) are trained to use and make sense of this technique [17]. Instead, they often resort to magnetic

Methodological setup

Feature selection can often be considered part of model selection and becomes an important step, especially when the number of observations roughly matches the number of features. Performing model selection in the joint space of features and parameters in this situation is at best a delicate task that entails a very high risk of overfitting. In this work feature selection and classifier selection are carried out in an interleaved way. First, feature selection is done in a classifier-independent

Experimental results

The frequency distributions of spectral points selected by the EFA in the bootstrap samples (Section 4.1) are shown in Fig. 2, Fig. 3, Fig. 4.

The results on classifier construction using the previously selected sets of features (also developed in the bootstrap samples, see Section 4.2) are displayed in Table 1. These are the bootstrap estimates of prediction error as given by formula (7), translated to accuracy percentages for ease of reading. Additionally, 95% CIs for the mean are reported in
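The sketch below shows one common way to obtain a bootstrap accuracy estimate with a 95% confidence interval for the mean. It does not reproduce formula (7), which is not included in this excerpt; the out-of-bag evaluation, the normal-approximation interval, the nearest-centroid classifier, and the toy data are assumptions for illustration only.

```python
import math
import random
import statistics


def bootstrap_indices(n, rng):
    """Draw n indices with replacement; return (in-bag, out-of-bag) lists."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    oob = [i for i in range(n) if i not in chosen]
    return in_bag, oob


def mean_ci95(values):
    """Mean and normal-approximation 95% CI for the mean."""
    m = statistics.mean(values)
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half, m + half


def fit_centroids(X, y):
    """Toy classifier (assumed): store the mean vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_centroid(model, row):
    """Assign the class whose centroid is nearest in squared distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], row)))


def bootstrap_accuracy(X, y, fit, predict, n_resamples=200, seed=0):
    """Train on each bootstrap sample, score on its out-of-bag cases."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_resamples):
        in_bag, oob = bootstrap_indices(len(X), rng)
        if not oob:
            continue  # nothing left to evaluate on in this resample
        model = fit([X[i] for i in in_bag], [y[i] for i in in_bag])
        hits = sum(predict(model, X[i]) == y[i] for i in oob)
        accuracies.append(100.0 * hits / len(oob))
    return mean_ci95(accuracies)


# Toy data (hypothetical): one feature, two well-separated classes.
X = [[0.0], [0.1], [0.2], [0.3], [0.4], [5.0], [5.1], [5.2], [5.3], [5.4]]
y = [0] * 5 + [1] * 5
mean_acc, ci_low, ci_high = bootstrap_accuracy(X, y, fit_centroids, predict_centroid)
print(f"accuracy {mean_acc:.1f}% (95% CI {ci_low:.1f} to {ci_high:.1f})")
```

Reporting the interval alongside the mean is what allows models built on different numbers of frequencies to be compared without over-reading small differences in accuracy.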

Conclusions and future work

MRS is yet to become a standard method for day-to-day clinical diagnosis of brain tumors. This is despite it being a non-invasive technique and one that provides rich information about the biochemistry of the tumor pathology. Instead, MRI is often the method of choice for diagnosis in practice, in spite of its limitations. To become mainstream, the diagnosis based on MRS must be sufficiently robust and, for that, reliable tools for spectral data analysis are required. In this study, algorithms

Acknowledgments

The authors gratefully acknowledge the former INTERPRET partners (INTERPRET, EU-IST-1999-10310) and, from 1st January 2003, Generalitat de Catalunya (CIRIT SGR2001-194, XT 2002-48 and XT 2004-51 grants); data providers: Dr. C. Majós (IDI), Dr. A. Moreno-Torres (CDP), Dr. F.A. Howe and Prof. J. Griffiths (SGUL), Prof. A. Heerschap (RU), Dr. W. Gajewicz (MUL) and Dr. J. Calvar (FLENI); data curators: Dr. A.P. Candiota, Ms. T. Delgado, Ms. J. Martín, Mr. I. Olier, Mr. A. Pérez and Prof. Carles Arús

Félix F. González-Navarro is an Associate Professor at the Engineering Institute at Baja California State University, Mexicali, México. Currently, he is a Ph.D. student in the Departament de Llenguatges i Sistemes Informàtics at the Universitat Politècnica de Catalunya (UPC), where he conducts research in the areas of Pattern Recognition, Feature Selection Algorithms and Information Theory.

References (39)

  • P.J.G. Lisboa et al., Cluster-based visualisation with scatter matrices, Pattern Recognition Lett. (2008)
  • N. Sibtain, The clinical value of proton magnetic resonance spectroscopy in adult brain tumours, Clin. Radiol. (2007)
  • A. Vellido et al., Handling outliers in brain tumour MRS data analysis through robust topographic mapping, Comput. Biol. Med. (2006)
  • A. Vellido, E. Biganzoli, P.J.G. Lisboa, Machine learning in cancer research: implications for personalised medicine,...
  • M. Julià-Sapé et al., The interpret consortium: a multi-centre, web-accessible and quality control-checked database of in vivo MR spectra of brain tumour patients, Magn. Reson. Mater. Phys. (2006)
  • C. Majós et al., Brain tumor classification by proton MR spectroscopy: comparison of diagnostic accuracy at short and long TE, Am. J. Neuroradiol. (2004)
  • H. Bruhn et al., Noninvasive differentiation of tumors with use of localized 1H-MR spectroscopy in vivo: initial experience in patients with cerebral tumors, Radiology (1989)
  • H. Kugel et al., Human brain tumors: spectral patterns detected with localized 1H-MR spectroscopy, Radiology (1992)
  • D. Ott et al., Human brain tumors: assessment with in vivo proton MR spectroscopy, Radiology (1993)
  • Y. Kinoshita et al., Proton magnetic resonance spectroscopy of brain tumors: an in vitro study, Neurosurgery (1994)
  • H. Shimizu et al., Noninvasive evaluation of malignancy of brain tumors with proton MR spectroscopy, Am. J. Neuroradiol. (1996)
  • A. Tate, Development of a decision support system for diagnosis and grading of brain tumours using in vivo magnetic resonance single voxel spectra, NMR Biomed. (2006)
  • C. Ladroue, Pattern recognition techniques for the study of magnetic resonance spectra of brain tumours, Ph.D. Thesis,...
  • A. Devos, Quantification and classification of MRS data and applications to brain tumour recognition, Ph.D. Thesis,...
  • J. García, S. Tortajada, C. Vidal, M. Julià-Sapé, J. Luts, S. Van Huffel, C. Arús, M. Robles, On the use of long TE and...
  • J.M. García-Gómez et al., The influence of combining two echo times in automatic brain tumor classification by Magnetic Resonance Spectroscopy, NMR Biomed. (2008)
  • K. Kira, L. Rendell, The feature selection problem: traditional methods and a new algorithm, in: Proceedings of the...
  • F. González, Ll. Belanche, Feature Selection in in vivo 1H-MRS single voxel spectra, in: Proceedings of the KES 2008...
  • INTERPRET: International Network for Pattern Recognition of Tumours Using Magnetic Resonance project...
    Lluís A. Belanche is an Associate Professor in the Software Department at the Universitat Politècnica de Catalunya (UPC). He received his B.Sc. in Computer Science from the UPC in 1990 and an M.Sc. in Artificial Intelligence from the UPC in 1991. He joined the Computer Science Faculty shortly after, where he completed his doctoral dissertation in 2000. He has been doing research in neural networks and support vector machines for pattern recognition and function approximation, as well as in feature selection algorithms and their collective application to workable artificial learning systems.

    Enrique Romero received his B.Sc. degree in Mathematics in 1989 from the Universitat Autònoma de Barcelona. In 1994, he received his B.Sc. in Computer Science from the Universitat Politècnica de Catalunya (UPC). In 1996, he joined the Software Department at the UPC, as an Assistant Professor. He received his M.Sc. degree in Artificial Intelligence and Ph.D. degree in Computer Science from the UPC in 2000 and 2004, respectively. His research interests include neural networks, support vector machines and feature selection.

    Alfredo Vellido received his degree in Physics from the Department of Electronics and Automatic Control of the University of the Basque Country (Spain), in 1996. He completed his Ph.D. at Liverpool John Moores University (UK), in 2000. After a few years of experience in the private sector, he briefly joined Liverpool John Moores University again as research officer in a project in the field of computational neurosciences. He is now a Ramón y Cajal research fellow for the Technical University of Catalonia. Research interests include, but are not limited to, pattern recognition, machine learning and data mining, as well as their application in medicine, market analysis, ecology and e-learning, on which subjects he has published widely.

    Margarida Julià-Sapé holds a B.Sc. Hon. in Biology from the Universitat de Barcelona (UB), Spain, 1994, as well as an M.Sc. in Biotechnology (1995) from the UB. She was awarded her Ph.D. in 2006 by the Universitat Autònoma de Barcelona (UAB), Spain. The author is currently a postdoctoral researcher with the Networking Research Center on Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), at UAB, Cerdanyola del Vallès, Spain.

    Carles Arús was born in Barcelona (Spain), in 1954. B.Sc. in Biology from the Universitat Autònoma de Barcelona (UAB), Spain, in 1976. Ph.D. in Chemistry from UAB, in 1981 (Ph.D. advisor Prof. Claudi M. Cuchillo) on the subject of the sub-site structure of bovine pancreatic RNase A (enzyme kinetics, NMR spectroscopy). Best thesis award in the Faculty of Sciences of UAB in 1982. Postdoctoral work in the USA (1982-1985) on biomedical NMR with Prof. Michael Bárány (University of Illinois at Chicago, IL) and Prof. John L. Markley (Purdue University, IN). Since 1985, tenured assistant professor, and, since 2002, full Professor at the Department of Biochemistry and Molecular Biology of the UAB. His research group has carried out work on the application of NMR spectroscopy of tumours for diagnostic purposes and has also contributed to the investigation of human muscle bioenergetics by 31P MRS. His present interests in the field of tumour spectroscopy target the use of 1H MRS of human brain tumours, biopsies and cell models for diagnosis, prognosis and therapy planning. He has published 66 PubMed accessible articles since 1977.
