A Review on Metabolomics Data Analysis for Cancer Applications

Cardoso, Sara; Baptista, Delora; Santos, Rebeca; Rocha, Miguel

doi:10.1007/978-3-319-98702-6_19

Conference paper
First Online: 17 August 2018

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 803)

Abstract

Cancer cells undergo metabolic changes that contribute to tumorigenesis. These changes can be characterized using metabolomics data produced by techniques such as nuclear magnetic resonance and mass spectrometry, and analyzed through statistical and machine learning methods. Because these data closely reflect the metabolic phenotype of the cells, they are highly relevant in cancer research, both to better understand tumour cell metabolism and to support biomarker and drug target discovery. This mini-review focuses on data analysis methods that are commonly used to extract knowledge from cancer metabolomics data, such as univariate analysis and supervised and unsupervised multivariate data analysis, including clustering and machine learning.

1 Introduction

Cancer, a label applied to a variety of diseases featuring excessive cell proliferation, is driven by changes at the genomic level, which define a distinct metabolic profile that supports the tumorigenic process. A common alteration, usually referred to as the Warburg effect [1], is the observation that cancer cells resort to glycolysis with subsequent lactate fermentation to produce energy, even under aerobic conditions. Many other metabolic changes have since been documented, and a recent review has identified six cancer metabolism hallmarks [2].

These changes in intracellular, extracellular, and circulating metabolites can be assessed by applying one of two approaches. Targeted studies focus on a selected subset of known metabolites, while untargeted studies attempt to profile the metabolome in a non-predefined manner. The metabolomics data can be obtained using techniques such as Nuclear Magnetic Resonance (NMR) spectroscopy and Mass Spectrometry (MS), the latter normally coupled to Gas or Liquid Chromatography (GC-MS or LC-MS).

Table 1. Data analysis methods used in a selection of recent cancer NMR and MS studies.

NMR has been extensively used for several purposes in cancer studies, such as the distinction between tumor and normal samples [3], prediction of patient survival [4] and tumor recurrence [5], and monitoring tumor drug response [6]. Applications of MS in cancer research, in turn, include the characterization of metabolite signatures in lung cancer patients undergoing treatment [7], and several cases of metabolic profiling to find diagnostic/prognostic biomarkers of lung, colorectal, ovarian and hepatocellular tumors [8,9,10,11,12].

Univariate and multivariate statistical methods can be applied either directly to NMR and MS peak data or to the metabolites identified from these data, together with their respective concentrations. Table 1 shows a selection of the most relevant studies in cancer metabolomics using NMR and MS. The data analysis strategies are presented in the following sections.

2 Univariate Analysis

Univariate analysis studies one data variable at a time, relating its values to those of metadata variables. It is easy to perform and interpret, and relies on methods such as t-tests (TT), one-way and multifactor analysis of variance (ANOVA), Mann-Whitney (MW), Kruskal-Wallis (KW) and Kolmogorov-Smirnov (KS) tests, fold change (FC), regression and correlation analysis (CA). These methods can provide sets of (ranked) variables that are candidates for better discriminating a clinical variable. They are therefore quite useful for biomarker prediction, as well as a first step in classification or regression with machine learning.
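As a minimal illustration of how univariate statistics can yield a ranked list of variables, the sketch below computes Welch's t statistic and a log2 fold change per metabolite and sorts metabolites by the absolute t value. The metabolite names and intensity values are hypothetical, and only the Python standard library is assumed; a real analysis would also compute p-values and correct them for multiple testing.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def log2_fold_change(a, b):
    """log2 ratio of the group means; assumes strictly positive intensities."""
    return math.log2(mean(a) / mean(b))

# Hypothetical intensities for two metabolites in tumour and control samples.
data = {
    "lactate": {"tumour": [8.0, 9.0, 10.0, 9.0], "control": [4.0, 5.0, 4.0, 5.0]},
    "alanine": {"tumour": [3.0, 3.5, 2.5, 3.0], "control": [3.0, 2.5, 3.5, 3.0]},
}

# One row per metabolite: (name, t statistic, log2 fold change),
# ranked by the absolute value of the t statistic.
ranking = sorted(
    (
        (name,
         welch_t(groups["tumour"], groups["control"]),
         log2_fold_change(groups["tumour"], groups["control"]))
        for name, groups in data.items()
    ),
    key=lambda row: abs(row[1]),
    reverse=True,
)

for name, t_stat, lfc in ranking:
    print(f"{name}: t = {t_stat:.2f}, log2FC = {lfc:.2f}")
```

The top-ranked rows of such a list are the candidate variables that could be passed on to a classifier or examined as potential biomarkers.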

In cancer metabolomics specifically, univariate analysis has been performed in many studies, as Table 1 shows. One example is the use of one-way ANOVA and Tukey’s Honest Significant Difference (HSD) test in studying NMR data from breast cancer cells [15]. Also, in a breast cancer chemotherapy study [33], the authors performed paired/unpaired t-tests over MS data, as well as two-way ANOVA to study the interaction of two variables.

3 Unsupervised Multivariate Analysis

This type of analysis summarizes data and thus detects patterns that can be related to biological or experimental variables.

Principal Component Analysis (PCA) is the most frequently used unsupervised learning method for data analysis, normally used in metabolomics to discover patterns in the data which may reveal how samples group based on their metabolic profiles. It is a dimensionality reduction technique, which produces new variables through linear combinations of the original variables [35], to explain as much of the variance in the original data set as possible.
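The idea of building a new variable as a linear combination of the original ones can be sketched in a few lines. The example below, which assumes a small hypothetical data matrix (samples in rows, metabolites in columns) and uses only the Python standard library, centres the data, forms the covariance matrix and extracts the first principal component by power iteration. Power iteration is chosen here only for brevity; it is not necessarily the procedure used by any of the cited studies or packages.

```python
import math

def center(X):
    """Subtract each column mean (samples in rows, variables in columns)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def covariance(Xc):
    """Sample covariance matrix of already-centred data."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(C, iters=200):
    """Leading eigenvector and eigenvalue of C, found by power iteration."""
    p = len(C)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(C[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue

# Hypothetical data: three samples from each of two groups, three metabolites.
X = [
    [1.0, 2.0, 5.0],
    [1.2, 2.1, 5.1],
    [0.9, 1.9, 4.9],
    [3.0, 4.0, 5.0],
    [3.1, 4.2, 5.1],
    [2.9, 3.9, 4.9],
]

Xc = center(X)
C = covariance(Xc)
pc1, eigenvalue = first_component(C)

# Scores: projection of each centred sample onto the first component.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in Xc]
# Fraction of the total variance explained by the first component.
explained = eigenvalue / sum(C[j][j] for j in range(len(C)))
```

With data like these, the two groups fall on opposite sides of zero along the first component, which is the kind of grouping pattern that a PCA scores plot is used to reveal.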

In recent cancer studies using NMR, PCA has been applied, for instance, to discriminate between four groups of MCF7 breast cancer cell lines with or without tamoxifen resistance and/or CK-α downregulation [18], and to separate gastric cancer samples from control samples [3]. Regarding MS approaches, PCA has been used for the detection of biomarkers related to prostate cancer, in combination with supervised methods [36], and to assess the different metabolic profiles of ovarian cancer stem cells and cancer cells [11].

Hierarchical clustering (HC), in turn, separates observations into groups and establishes a hierarchical ordering of the data points by taking into consideration a measure of dissimilarity between observations. In [15], HC was performed on metabolite concentration data derived from NMR experiments of different breast cancer cell lines to assess the effect of radiation therapy or poly ADP-ribose polymerase inhibition. In another study [13], the authors used HC to evaluate the separation between advanced colorectal cancer samples and controls, based on data from NMR of fecal extracts. They did, however, conclude that PCA performed better at this task than HC. In [11], following an MS approach, HC demonstrated a clear separation between cell types, based on the intracellular profile of ovarian cancer stem cells and ovarian cancer cells, while in another MS cancer study, HC allowed the estimation of clinical metabolic biomarkers from plasma for diagnosis of esophageal squamous-cell carcinoma [37].
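A minimal sketch of agglomerative hierarchical clustering is given below, assuming Euclidean distance as the dissimilarity measure and average linkage between clusters. Both choices, as well as the sample profiles, are illustrative assumptions and are not taken from the cited studies. The recorded sequence of merges is what a dendrogram would display.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, X):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(X[i], X[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(X, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(X))]
    merges = []  # merge history, in order, as pairs of index lists
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]], X),
        )
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical two-metabolite profiles: samples 0-2 and 3-5 form two groups.
profiles = [
    [1.0, 2.0], [1.2, 2.1], [0.9, 1.9],
    [3.0, 4.0], [3.1, 4.2], [2.9, 3.9],
]

clusters, merges = agglomerate(profiles, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```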

K-means is another clustering approach, which partitions observations into a pre-defined number k of groups. The algorithm is initialized by taking k observations as the initial cluster centres; each sample is then assigned to the cluster with the nearest mean, and the cluster means are recalculated after every assignment [38]. As an example of its application to MS data, in [39] the authors used it to identify metabolite signatures of malignant glioma from human cerebrospinal fluid.
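The assignment and update steps can be sketched as follows. Note that this sketch uses the common batch variant, in which the cluster means are recalculated once per pass over all samples rather than after every single assignment, and that it simply takes the first k observations as initial centres. The data are hypothetical and only the Python standard library is assumed.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(X, k, max_iter=100):
    """Batch k-means; the first k observations serve as initial centres."""
    centres = [list(x) for x in X[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample goes to the nearest centre.
        new_labels = [min(range(k), key=lambda c: sq_dist(x, centres[c]))
                      for x in X]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

# Hypothetical two-metabolite profiles; the two groups alternate.
samples = [
    [1.0, 2.0], [3.0, 4.0], [1.2, 2.1],
    [3.1, 4.2], [0.9, 1.9], [2.9, 3.9],
]

labels, centres = kmeans(samples, k=2)
print(labels)
```

In practice the result depends on the initial centres, so implementations usually repeat the procedure from several random initializations and keep the best partition.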

4 Supervised Multivariate Analysis

Supervised multivariate analysis, in contrast, creates models capable of predicting an output from a given data input, trained on data for which the output is known.

Partial least squares (PLS) regression, partial least squares discriminant analysis (PLS-DA), and orthogonal partial least squares discriminant analysis (OPLS-DA) are the most popular supervised learning methods used in metabolomics studies. PLS [40] models the relationship between a matrix of predictor variables and one or more output variables by finding a set of new (latent) variables that maximize the covariance between predictors and outputs. PLS-DA is an adaptation of the partial least squares algorithm for classification, and is used to analyze group separation [41].
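To make the notion of a covariance-maximizing latent variable concrete, the sketch below fits a single-component PLS model with one response (PLS1). The class membership is coded as -1 and +1 so that the sign of the prediction can be read as a class, which is one simple way of using PLS for discrimination. The data, the coding and the function names are illustrative assumptions; full PLS-DA implementations extract several components and are usually validated by cross-validation.

```python
import math

def pls1_one_component(X, y):
    """Fit a one-component PLS1 model on centred X and y."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: unit-norm direction maximizing covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Latent variable (scores) and its regression coefficient on y.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q

def predict(model, x):
    """Predicted response for one new sample."""
    x_mean, y_mean, w, q = model
    score = sum((xj - mj) * wj for xj, mj, wj in zip(x, x_mean, w))
    return y_mean + q * score

# Hypothetical data: three metabolites, two classes coded as -1 and +1.
X = [
    [1.0, 2.0, 5.0], [1.2, 2.1, 5.1], [0.9, 1.9, 4.9],
    [3.0, 4.0, 5.0], [3.1, 4.2, 5.1], [2.9, 3.9, 4.9],
]
y = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]

model = pls1_one_component(X, y)
predictions = [predict(model, x) for x in X]
```

The weight vector also indicates which metabolites drive the separation: in this toy example the third metabolite, which does not differ between classes, receives a weight close to zero.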

In recent metabolomics NMR and MS studies, PLS-DA has been used, for instance, to identify a urinary metabolite signature for renal cell carcinoma [19]. In [33], PLS-DA, using MS data, revealed a trend to separate premenopausal and postmenopausal samples, suggesting that altered serum levels of oleic acid in breast cancer patients are associated with their response to chemotherapy. OPLS-DA is a variant of PLS-DA in which variation in the predictors that is not correlated with the response is removed to facilitate model interpretability [42]. It has been applied, for example, to discriminate between pancreatic adenocarcinoma and healthy tissue [4], and to differentiate between basal cell carcinoma and normal skin samples [21].

To build predictors in cancer metabolomics studies, random forests (RF) represent another model that can be used for classification or regression. RFs are ensembles of decision trees, which are made up of decision rules that are inferred from input data [43]. In one study, an RF was used to determine whether NMR data could distinguish between groups of cancer patients (with cachexia, pre-cachexia or weight stable) and healthy controls [26]. This RF was used as a feature selection step, evaluating the importance of each metabolite and subsequently selecting the fifteen most predictive metabolites. In another study using both NMR spectroscopy and MS [44], experimental data were used to train an RF that could distinguish between hepatocellular carcinoma, liver cirrhosis and control serum samples. The RF was valuable in selecting the most important metabolites that could accurately discriminate the groups and could be considered potential biomarkers. In another study [9], RF models were trained on MS data from lung cancer and control cases. The model revealed that three well-known nicotine metabolites (cotinine, nicotine-N-oxide, and trans-3-hydroxycotinine) were the most important variables for distinguishing between the two groups.

A Support Vector Machine (SVM) [45] is a machine learning method that uses a kernel function to map input features, implicitly, to a new feature space in which a linear decision boundary separating the classes is sought. Regarding NMR studies, [46] used an SVM with a radial basis function kernel to classify cell extracts from normal and hepatocellular carcinoma cell lines as well as the respective culture media. In [14], two supervised methods were combined: PLS was applied as a dimensionality reduction method and the resulting scores were used to train an SVM model to distinguish between patients with metastatic colorectal cancer and healthy individuals. In the same study, a PLS-SVM approach was also used to predict overall survival for the patients with metastatic colorectal cancer. In [47], SVM models were applied to MS data collected for sixteen diagnostic metabolites from lipid and fatty acid metabolism, allowing the identification of early-stage ovarian cancer patients.
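The sketch below illustrates the simplest case, a linear soft-margin SVM (equivalent to using a linear kernel), trained by plain subgradient descent on the regularized hinge loss. This is a deliberately simplified stand-in: the cited studies used kernel SVMs fitted with dedicated solvers, and the data, hyperparameter values and function names here are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the regularized hinge loss."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # sample violates the margin
                for j in range(p):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def classify(w, b, x):
    """Class label (+1 or -1) given by the side of the hyperplane."""
    return 1 if dot(w, x) + b >= 0 else -1

# Hypothetical autoscaled intensities of two metabolites; labels are -1 / +1.
X = [
    [-1.0, -0.9], [-1.2, -1.1], [-0.8, -1.0],
    [1.0, 0.9], [1.1, 1.2], [0.9, 1.0],
]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y)
predicted = [classify(w, b, x) for x in X]
```

Replacing the dot product with a kernel function, such as the radial basis function, is what allows an SVM to produce boundaries that are non-linear in the original variables.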

5 Case Studies

Specmine [48] is an R package for metabolomics data analysis, developed in our group, that allows users to perform the analyses described in the previous sections, and many others. To demonstrate its usefulness in cancer metabolomics studies based on NMR and MS techniques, two studies were reproduced using the specmine package. The fully detailed reports can be accessed at http://darwin.di.uminho.pt/PACBB2018/metabolomics.

The first study [49] used the 1H CPMG NMR technique to analyze the possible association of metabolism with the altered expression of the inositol 1,4,5-trisphosphate receptor (IP3R) in breast cancer, as this receptor is known to regulate metabolism and cellular bioenergetics and is upregulated in a number of cancers. Data for this analysis were obtained from the MetaboLights website [50], under the study MTBLS152. The analysis performed included PCA and PLS-DA. Although there were some differences in the results, possibly because the dataset used is slightly different from the original file used by the authors, the specmine results confirm that PCA and PLS-DA were able to discriminate between samples with high and/or low expression of the gene that encodes inositol 1,4,5-trisphosphate receptor type 3 and healthy control samples.

The second study [11] used the GC-MS technique to analyze the differences between ovarian cancer cells (OCCs) and ovarian cancer stem cells (OCSCs) with regard to their intracellular and extracellular metabolomic profiles. Data for this analysis were also obtained from MetaboLights, under the study MTBLS152. The analysis performed included PCA and t-tests. Overall, the results obtained were very similar to the ones presented in the article. Some of the differences may be due to the study authors not fully explaining how the analysis was conducted, especially regarding how they handled the fact that, in some cases, the same metabolite had different concentration levels for each sample.

6 Conclusions

Although the typical procedure in metabolomics data analysis usually involves PCA and PLS-DA/OPLS-DA analyses, most studies use a variety of data analysis methods that confirm and complement one another. Some recent cancer metabolomics studies have explored other machine learning techniques to build predictors based on NMR and/or MS data. These alternative predictors may be useful to build more robust classifiers and to extract biologically meaningful information from metabolomics data, such as identifying potential metabolic biomarkers. In the future, it would be interesting to see how these and other alternatives perform when compared to established methods.

Furthermore, the reproduction of two studies using the specmine package shows that this R package can be very useful in metabolomics data analysis, not only for univariate analysis, but also for multivariate analysis, such as PCA and machine learning.

References

1. Warburg, O.: On the origin of cancer cells. Science 123(3191), 309–314 (1956)
2. Pavlova, N.N., Thompson, C.B.: The emerging hallmarks of cancer metabolism. Cell Metab. 23(1), 27–47 (2016)
3. Wang, H., et al.: Tissue metabolic profiling of human gastric cancer assessed by 1H NMR. BMC Cancer 16(1), 371 (2016)
4. Battini, S., et al.: Metabolomics approaches in pancreatic adenocarcinoma: tumor metabolism profiling predicts clinical outcome of patients. BMC Med. 15(1), 56 (2017)
5. Hart, C.D., et al.: Serum metabolomic profiles identify ER-positive early breast cancer patients at increased risk of disease recurrence in a multicenter population. Clin. Cancer Res. 23(6), 1422–1431 (2017)
6. Belkaid, A., et al.: Metabolic effect of estrogen receptor agonists on breast cancer cells in the presence or absence of carbonic anhydrase inhibitors. Metabolites 6(2), 16 (2016)
7. Hao, D., et al.: Temporal characterization of serum metabolite signatures in lung cancer patients undergoing treatment. Metabolomics 12(3), 58 (2016)
8. Fahrmann, J.F., et al.: Serum phosphatidylethanolamine levels distinguish benign from malignant solitary pulmonary nodules and represent a potential diagnostic biomarker for lung cancer. Cancer Biomarkers 16(4), 609–617 (2016)
9. Mathé, E.A., et al.: Noninvasive urinary metabolomic profiling identifies diagnostic and prognostic markers in lung cancer. Cancer Res. 74(12), 3259–3270 (2014)
10. Zhu, J., et al.: Colorectal cancer detection using targeted serum metabolic profiling. J. Proteome Res. 13(9), 4120–4130 (2014)
11. Vermeersch, K., et al.: OVCAR-3 spheroid-derived cells display distinct metabolic profiles. PLoS One 10(2), e0118262 (2015)
12. Ranjbar, M., et al.: GC-MS based plasma metabolomics for identification of candidate biomarkers for hepatocellular carcinoma in Egyptian cohort. PLoS One 10(6), e0127299 (2015)
13. Amiot, A., et al.: 1H NMR spectroscopy of fecal extracts enables detection of advanced Colorectal Neoplasia. J. Proteome Res. 14(9), 3871–3881 (2015)
14. Bertini, I., et al.: Metabolomic NMR fingerprinting to identify and predict survival of patients with metastatic colorectal cancer. Cancer Res. 72(1), 356–364 (2012)
15. Bhute, V.J., et al.: The poly (ADP-Ribose) polymerase inhibitor veliparib and radiation cause significant cell line dependent metabolic changes in breast cancer cells. Sci. Rep. 6(1), 36061 (2016)
16. Chan, A.W., et al.: 1H-NMR urinary metabolomic profiling for diagnosis of gastric cancer. Br. J. Cancer 114(1), 59–62 (2016)
17. Fages, A., et al.: Metabolomic profiles of hepatocellular carcinoma in a European prospective cohort. BMC Med. 13(1), 242 (2015)
18. Kim, H.S., et al.: Investigation of discriminant metabolites in tamoxifen-resistant and choline kinase-alpha-downregulated breast cancer cells using 1H-nuclear magnetic resonance spectroscopy. PLoS One 12(6), e0179773 (2017)
19. Monteiro, M.S., et al.: Nuclear magnetic resonance metabolomics reveals an excretory metabolic signature of renal cell carcinoma. Sci. Rep. 6(1), 37275 (2016)
20. Morin, P.J., et al.: NMR metabolomics analysis of the effects of 5-lipoxygenase inhibitors on metabolism in glioblastomas. J. Proteome Res. 12(5), 2165–2176 (2013)
21. Mun, J., et al.: Discrimination of basal cell carcinoma from normal skin tissue using high-resolution magic angle spinning 1H NMR spectroscopy. PLoS One 11(3), e0150328 (2016)
22. Roberts, M.J., et al.: Seminal plasma enables selection and monitoring of active surveillance candidates using nuclear magnetic resonance-based metabolomics: a preliminary investigation. Prostate Int. 5(4), 149–157 (2017)
23. Shao, W., et al.: Malignancy-associated metabolic profiling of human glioma cell lines using 1H NMR spectroscopy. Mol. Cancer 13(1), 197 (2014)
24. Tsai, I., et al.: Metabolomic dynamic analysis of hypoxia in MDA-MB-231 and the comparison with inferred metabolites from transcriptomics data. Cancers 5(2), 491–510 (2013)
25. Uifălean, A., et al.: The impact of Soy Isoflavones on MCF-7 and MDA-MB-231 breast cancer cells using a global metabolomic approach. Int. J. Mol. Sci. 17(9), 1443 (2016)
26. Yang, Q.J., et al.: Serum and urine metabolomics study reveals a distinct diagnostic model for cancer cachexia. J. Cachexia Sarcopenia Muscle 9(1), 1–15 (2017)
27. Miyamoto, S., et al.: Systemic metabolomic changes in blood samples of lung cancer patients identified by gas chromatography time-of-flight mass spectrometry. Metabolites 5(2), 192–210 (2015)
28. Batova, A., et al.: Englerin A induces an acute inflammatory response and reveals lipid metabolism and ER stress as targetable vulnerabilities in renal cell carcinoma. PLoS One 12(3), e0172632 (2017)
29. Xiao, J.F., et al.: LC-MS based serum metabolomics for identification of hepatocellular carcinoma biomarkers in Egyptian cohort. J. Proteome Res. 11(12), 5914–5923 (2012)
30. Dhakshinamoorthy, S., et al.: Metabolomics identifies the intersection of phosphoethanolamine with menaquinone-triggered apoptosis in an in vitro model of leukemia. Mol. BioSyst. 11(9), 2406–2416 (2015)
31. Mackay, E.: Fatty acid synthesis in colorectal cancer: characterization of lipid metabolism in serum, tumour, and normal host tissues. Ph.D. thesis, University of Calgary (2015)
32. Vermeersch, K.A., et al.: Distinct metabolic responses of an ovarian cancer stem cell line. BMC Syst. Biol. 8(1), 134 (2014)
33. Hilvo, M., et al.: Monounsaturated fatty acids in serum triacylglycerols are associated with response to neoadjuvant chemotherapy in breast cancer patients. Int. J. Cancer 134(7), 1725–1733 (2014)
34. Ressom, H.W., et al.: Utilization of metabolomics to identify serum biomarkers for hepatocellular carcinoma in patients with liver cirrhosis. Analytica Chimica Acta 743, 90–100 (2012)
35. Abdi, H., Williams, L.J.: Principal component analysis. Wiley Interdisc. Rev. Comput. Stat. 2(4), 433–459 (2010)
36. Zhang, T., et al.: Application of holistic liquid chromatography-high resolution mass spectrometry based urinary metabolomics for prostate cancer detection and biomarker discovery. PLoS One 8(6), e65880 (2013)
37. Liu, R., et al.: Identification of plasma metabolomic profiling for diagnosis of esophageal squamous-cell carcinoma using an UPLC/TOF/MS platform. Int. J. Mol. Sci. 14(5), 8899–8911 (2013)
38. Macqueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1(233), pp. 281–297 (1967)
39. Locasale, J.W., et al.: Metabolomics of human cerebrospinal fluid identifies signatures of malignant glioma. Mol. Cellular Proteomics 11(6), M111.014688 (2012)
40. Wold, H.: Estimation of principal components and related models by iterative least squares. In: Multivariate Analysis, pp. 1391–1420 (1966)
41. Barker, M., Rayens, W.: Partial least squares for discrimination. J. Chemom. 17(3), 166–173 (2003)
42. Trygg, J., Wold, S.: Orthogonal projections to latent structures (OPLS). J. Chemom. 16(3), 119–128 (2002)
43. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
44. Liu, Y., et al.: NMR and LC/MS-based global metabolomics to identify serum biomarkers differentiating hepatocellular carcinoma from liver cirrhosis. Int. J. Cancer 135(3), 658–668 (2014)
45. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
46. Chen, Y., et al.: Metabolic profiling of normal hepatocyte and hepatocellular carcinoma cells via 1H nuclear magnetic resonance spectroscopy. Cell Biol. Int. 9999, 1–10 (2017)
47. Gaul, D.A., et al.: Highly-accurate metabolomic detection of early-stage ovarian cancer. Sci. Rep. 5, 16351 (2015)
48. Costa, C., et al.: An R package for the integrated analysis of metabolomics and spectral data. Comput. Meth. Prog. Biomed. 129, 117–124 (2016)
49. Singh, A., et al.: 1H NMR metabolomics reveals association of high expression of inositol 1,4,5 trisphosphate receptor and metabolites in breast cancer patients. PLoS One 12(1), 1–20 (2017)
50. Haug, K., et al.: MetaboLights - an open-access general-purpose repository for metabolomics studies and associated meta-data. Nucleic Acids Res. 41(D1), D781–D786 (2013)

Acknowledgments

This work is co-funded by the North Portugal Regional Operational Programme, under the “Portugal 2020”, through the European Regional Development Fund (ERDF), within project SISBI - Refª NORTE-01-0247-FEDER-003381.

This study was also supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684) and BioTecNorte operation (NORTE-01-0145-FEDER-000004) funded by European Regional Development Fund under the scope of Norte2020 - Programa Operacional Regional do Norte.

Author information

Authors and Affiliations

CEB - Centre of Biological Engineering, University of Minho, Campus of Gualtar, Braga, Portugal
Sara Cardoso, Delora Baptista, Rebeca Santos & Miguel Rocha

Corresponding author

Correspondence to Sara Cardoso.

Editor information

Editors and Affiliations

Escuela Superior de Ingeniería Informática, Universidad de Vigo, Ourense, Spain
Florentino Fdez-Riverola
Faculty of Computing, Department of Software Engineering, Universiti Teknologi Malaysia, Johor, Malaysia
Mohd Saberi Mohamad
Departamento de Informática, Universidade do Minho, Braga, Portugal
Miguel Rocha
Departamento de Informática y Automática, Facultad de Ciencias, Universidad de Salamanca, Salamanca, Spain
Juan F. De Paz
Departamento de Sistemas Informáticos, Universidad de Castilla-La Mancha, Albacete, Albacete, Spain
Pascual González

Cite this paper

Cardoso, S., Baptista, D., Santos, R., Rocha, M. (2019). A Review on Metabolomics Data Analysis for Cancer Applications. In: Fdez-Riverola, F., Mohamad, M., Rocha, M., De Paz, J., González, P. (eds) Practical Applications of Computational Biology and Bioinformatics, 12th International Conference. PACBB2018 2018. Advances in Intelligent Systems and Computing, vol 803. Springer, Cham. https://doi.org/10.1007/978-3-319-98702-6_19

Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98701-9
Online ISBN: 978-3-319-98702-6
eBook Packages: Intelligent Technologies and Robotics (R0)
