Abstract
Cancer cells undergo metabolic changes that contribute to tumorigenesis. These changes can be characterized using metabolomics data produced by techniques such as nuclear magnetic resonance spectroscopy and mass spectrometry, and analyzed through statistical and machine learning methods. Since these data closely reflect the metabolic phenotype of these cells, they are highly relevant in cancer research, both to better understand tumour cell metabolism and to support biomarker and drug target discovery. This mini-review focuses on data analysis methods that are commonly used to extract knowledge from cancer metabolomics data, such as univariate analysis and supervised and unsupervised multivariate data analysis, including clustering and machine learning.
1 Introduction
Cancer, a label applied to a variety of diseases featuring excessive cell proliferation, is driven by changes at the genomic level, which define a distinct metabolic profile that supports the tumorigenic process. A common alteration, usually referred to as the Warburg effect [1], is the observation that cancer cells resort to glycolysis with subsequent lactate fermentation to produce energy, even under aerobic conditions. Many other metabolic changes have since been documented, and a recent review has identified six cancer metabolism hallmarks [2].
These changes in intracellular, extracellular, and circulating metabolites can be assessed by applying one of two approaches. Targeted studies focus on a selected subset of known metabolites, while untargeted studies attempt to profile the metabolome in a non-predefined manner. The metabolomics data can be obtained using techniques such as Nuclear Magnetic Resonance (NMR) spectroscopy and Mass Spectrometry (MS), the latter normally coupled to Gas or Liquid Chromatography (GC-MS/LC-MS).
NMR has been extensively used for several purposes in cancer studies, such as the distinction between tumor and normal samples [3], prediction of patient survival [4] and tumor recurrence [5], and monitoring tumor drug response [6]. Applications of MS in cancer research include the characterization of metabolite signatures in lung cancer patients undergoing treatment [7], and several cases of metabolic profiling to find diagnostic/prognostic biomarkers of lung, colorectal, ovarian and hepatocellular tumors [8,9,10,11,12].
Univariate and multivariate statistical methods can be applied either to NMR and MS peak data or to the metabolites identified from these data and their respective concentrations. Table 1 shows a selection of the most relevant studies in cancer metabolomics using NMR and MS. The data analysis strategies will be presented in the following sections.
2 Univariate Analysis
Univariate analysis examines one data variable at a time, crossing its values with those of metadata variables, and is easy to perform and interpret. Common methods include t-tests (TT), one-way and multifactor analysis of variance (ANOVA), Mann-Whitney (MW), Kruskal-Wallis (KW) and Kolmogorov-Smirnov (KS) tests, fold change (FC), regression and correlation analysis (CA). These methods can provide sets of (ranked) variables that are candidates for discriminating a clinical variable. Thus, they are quite useful for biomarker prediction, as well as a first step before classification or regression with machine learning.
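The univariate screening described above can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library, not the procedure of any cited study: it computes Welch's t statistic and the log2 fold change for each metabolite and ranks the metabolites by the absolute t statistic. The metabolite names, the intensity values and the function names are all hypothetical, and p-values are deliberately omitted because they require the t distribution from a statistics library.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def log2_fold_change(a, b):
    """log2 ratio of group means; assumes strictly positive intensities."""
    return math.log2(mean(a) / mean(b))


def rank_metabolites(tumor, control):
    """Rank metabolites by absolute Welch t statistic, largest first."""
    rows = []
    for name in tumor:
        t = welch_t(tumor[name], control[name])
        fc = log2_fold_change(tumor[name], control[name])
        rows.append((name, t, fc))
    return sorted(rows, key=lambda r: abs(r[1]), reverse=True)


# Hypothetical intensities (arbitrary units), four samples per group.
tumor = {
    "lactate": [8.0, 9.0, 10.0, 9.0],
    "glucose": [4.0, 5.0, 4.0, 5.0],
    "alanine": [3.5, 4.0, 3.0, 3.5],
}
control = {
    "lactate": [4.0, 5.0, 4.0, 5.0],
    "glucose": [5.0, 4.0, 5.0, 4.0],
    "alanine": [3.0, 2.5, 3.5, 3.0],
}

ranking = rank_metabolites(tumor, control)
for name, t, fc in ranking:
    print(f"{name}: t = {t:.2f}, log2FC = {fc:.2f}")
```

In a real study, each test would normally be accompanied by a p-value and a correction for multiple testing, since hundreds of variables are screened at once.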
In cancer metabolomics specifically, univariate analysis has been performed in many studies, as is clear from Table 1. One example is the use of one-way ANOVA and Tukey’s Honest Significant Difference (HSD) test in studying NMR data from breast cancer cells [15]. Also, in a chemotherapy breast cancer study [33], the authors performed paired/unpaired t-tests over MS data, as well as two-way ANOVA to study the interaction of two variables.
3 Unsupervised Multivariate Analysis
This type of analysis summarizes data and thus detects patterns that can be related to biological or experimental variables.
Principal Component Analysis (PCA) is the most frequently used unsupervised learning method for data analysis, normally used in metabolomics to discover patterns in the data which may reveal how samples group based on their metabolic profiles. It is a dimensionality reduction technique, which produces new variables through linear combinations of the original variables [35], to explain as much of the variance in the original data set as possible.
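To make the idea of "linear combinations that explain as much variance as possible" concrete, the following sketch extracts the first principal component by power iteration on the sample covariance matrix. It is a simplified illustration in pure Python under stated assumptions: the data matrix is hypothetical (four samples, three variables), only the first component is computed, and production analyses would typically rely on an SVD routine from a numerical library instead.

```python
import math


def center(X):
    """Subtract the column mean from every variable."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def covariance(Xc):
    """Sample covariance matrix of already-centered data."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(C, iters=500):
    """Leading eigenvector and eigenvalue of C by power iteration."""
    p = len(C)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Cv = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(vi * cvi for vi, cvi in zip(v, Cv))
    return v, eigenvalue


# Hypothetical data: rows are samples, columns are metabolite intensities.
X = [[2.0, 1.9, 0.5],
     [4.0, 4.1, 0.4],
     [6.0, 5.9, 0.6],
     [8.0, 8.1, 0.5]]

Xc = center(X)
C = covariance(Xc)
loadings, eigenvalue = first_component(C)

# Fraction of total variance captured by the first component.
explained = eigenvalue / sum(C[i][i] for i in range(len(C)))

# Scores: coordinates of each sample along the first component.
scores = [sum(r[j] * loadings[j] for j in range(len(loadings))) for r in Xc]
print(f"explained variance ratio: {explained:.3f}")
```

Here the first two variables vary together, so they dominate the loadings, while the nearly constant third variable contributes almost nothing.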
In recent cancer studies using NMR, PCA has been applied, for instance, to discriminate between four groups of MCF7 breast cancer cell lines with or without tamoxifen resistance and/or CK-α downregulation [18], and to separate gastric cancer samples from control samples [3]. Regarding MS approaches, PCA has also been used for the detection of biomarkers related to prostate cancer, by combining it with supervised methods [36], and to assess the different metabolic profiles of ovarian cancer stem cells and cancer cells [11].
Hierarchical clustering (HC), in turn, separates observations into groups and establishes a hierarchical ordering of the data points by taking into consideration a measure of dissimilarity between observations. In [15], HC was performed on metabolite concentration data derived from NMR experiments of different breast cancer cell lines to assess the effect of radiation therapy or poly ADP-ribose polymerase inhibition. In another study [13], the authors used HC to evaluate the separation between advanced colorectal cancer samples and controls, based on NMR data from fecal extracts. They did, however, conclude that PCA performed better at this task than HC. In [11], following an MS approach, HC demonstrated a clear separation between cell types, based on the intracellular profile of ovarian cancer stem cells and ovarian cancer cells, while in another MS cancer study, HC allowed the estimation of clinical metabolic biomarkers from plasma for diagnosis of esophageal squamous-cell carcinoma [37].
K-means is another clustering approach. It partitions observations into a pre-defined number k of groups. The algorithm is initialized by taking k observations as the initial cluster means; each remaining sample is then assigned to the cluster with the nearest mean, and that mean is recalculated after every assignment [38]. As an example of its application over MS data, in [39] the authors used it to identify metabolite signatures of malignant glioma from human cerebrospinal fluid.
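A small agglomerative example may help to illustrate how a hierarchy is built from a dissimilarity measure. The sketch below assumes Euclidean distance and average linkage, which are common choices but are not necessarily those used in the cited studies; the sample coordinates and function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, X):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(X[i], X[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(X, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters (lists of sample indices) and the merge
    history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [[i] for i in range(len(X))]
    history = []
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]], X),
        )
        d = average_linkage(clusters[a], clusters[b], X)
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, history


# Hypothetical two-metabolite profiles: samples 0-2 and 3-5 form two groups.
X = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
     [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]

clusters, history = agglomerate(X, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

The merge history holds the information that a dendrogram displays: which groups were joined and at what distance.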
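The sequential procedure described above, in which the first k observations seed the clusters and each mean is updated after every assignment, can be sketched as follows. This is an illustrative pure-Python version with hypothetical data and function names; many software packages instead use a batch variant that reassigns all samples before recomputing the means.

```python
import math


def nearest(point, means):
    """Index of the mean closest to point (Euclidean distance)."""
    return min(range(len(means)), key=lambda c: math.dist(point, means[c]))


def sequential_kmeans(X, k):
    """Sequential k-means: the first k observations seed the clusters."""
    means = [list(x) for x in X[:k]]
    counts = [1] * k
    for x in X[k:]:
        c = nearest(x, means)
        counts[c] += 1
        # Incremental update: the mean is recalculated after each assignment.
        means[c] = [m + (xi - m) / counts[c] for m, xi in zip(means[c], x)]
    # Final pass: label every sample by its nearest cluster mean.
    labels = [nearest(x, means) for x in X]
    return means, labels


# Hypothetical two-metabolite profiles drawn from two well-separated groups.
X = [[1.0, 1.0], [5.0, 5.0], [1.2, 0.8],
     [4.8, 5.2], [0.8, 1.2], [5.2, 4.8]]

means, labels = sequential_kmeans(X, k=2)
print(labels)
```

Because the seeds are simply the first k observations, the result depends on the order of the samples and on the choice of k, so several runs with different orderings are advisable.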
4 Supervised Multivariate Analysis
In contrast to the unsupervised methods above, supervised multivariate analysis creates models capable of predicting an output from a given data input, trained on data with known output.
Partial least squares (PLS) regression, partial least squares discriminant analysis (PLS-DA), and orthogonal partial least squares discriminant analysis (OPLS-DA) are the most popular supervised learning methods used in metabolomics studies. PLS [40] models the relationship between a matrix of predictor variables and one or more output variables by finding a set of new variables that maximize the explained covariance. PLS-DA is an adaptation of the partial least squares algorithm for classification, and is used to analyze group separation [41].
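As a rough illustration of how PLS finds a new variable with maximal covariance with the output, the sketch below computes a single PLS component for one response variable and uses it in a PLS-DA fashion, coding the two classes as +1 and -1 and classifying by the sign of the prediction. It is a minimal sketch under these assumptions, with hypothetical data and function names; it uses one component only and no cross-validation, both of which a real analysis would need to address.

```python
import math


def center_columns(X):
    """Return column-centered data together with the column means."""
    n, p = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in X], means


def pls1_one_component(X, y):
    """First PLS component for a single response (data centered inside)."""
    Xc, x_means = center_columns(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    p = len(Xc[0])
    # Weight vector: direction in X with maximal covariance with y.
    w = [sum(r[j] * yv for r, yv in zip(Xc, yc)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: projection of each sample onto the weight vector.
    t = [sum(r[j] * w[j] for j in range(p)) for r in Xc]
    # Regression coefficient of y on the scores.
    q = sum(tv * yv for tv, yv in zip(t, yc)) / sum(tv * tv for tv in t)
    return {"w": w, "q": q, "x_means": x_means, "y_mean": y_mean, "scores": t}


def predict(model, x):
    """Predicted response for a new sample x."""
    t = sum((xj - mj) * wj
            for xj, mj, wj in zip(x, model["x_means"], model["w"]))
    return model["y_mean"] + t * model["q"]


# Hypothetical intensities for three metabolites; +1 = tumor, -1 = control.
X = [[9.0, 4.0, 3.0], [8.5, 4.5, 3.2], [9.5, 4.2, 2.9],
     [4.5, 5.0, 3.1], [5.0, 4.8, 3.0], [4.0, 5.2, 2.8]]
y = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

model = pls1_one_component(X, y)
predicted = [1.0 if predict(model, x) > 0 else -1.0 for x in X]
print(predicted)
```

The weights in `model["w"]` indicate which variables drive the separation; here the first metabolite, which differs most between the groups, receives the largest absolute weight.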
In recent metabolomics NMR and MS studies, PLS-DA has been used, for instance, to identify a urinary metabolite signature for renal cell carcinoma [19]. In [33], PLS-DA, using MS data, revealed a trend to separate premenopausal and postmenopausal samples, suggesting that altered serum levels of oleic acid in breast cancer patients are associated with their response to chemotherapy. OPLS-DA is a variant of PLS-DA in which non-correlated variation is removed to facilitate model interpretability [42]. It has been applied, for example, to discriminate between pancreatic adenocarcinoma and healthy tissue [4], and to differentiate between basal cell carcinoma and normal skin samples [21].
To build predictors in cancer metabolomics studies, random forests (RF) are another model that can be used for classification or regression. RFs are ensembles of decision trees, which are made up of decision rules inferred from input data [43]. In one study, RF was used to determine whether NMR data could distinguish between groups of cancer patients (with cachexia, pre-cachexia or stable weight) and healthy controls [26]. This RF was used as a feature selection step, evaluating the importance of each metabolite and subsequently selecting the fifteen most predictive metabolites. In another study using both NMR spectroscopy and MS [44], experimental data were used to train an RF that could distinguish between hepatocellular carcinoma, liver cirrhosis and control serum samples. The RF was valuable in selecting the most important metabolites that could accurately discriminate the groups and could be considered potential biomarkers. In another study [9], RF models were trained on MS data from lung cancer and control cases. The model revealed that three well-known nicotine metabolites (cotinine, nicotine-N-oxide, and trans-3-hydroxycotinine) were the most important variables for distinguishing between the two groups.
A Support Vector Machine (SVM) [45] is a machine learning method that uses a kernel function to map input features to a new, typically higher-dimensional feature space, in which a linear decision boundary is sought. Regarding NMR studies, [46] used an SVM with a radial basis function kernel to classify cell extracts from normal and hepatocellular carcinoma cell lines as well as the respective culture media. In [14], two supervised methods were combined: PLS was applied as a dimensionality reduction method and the resulting scores were used to train an SVM model to distinguish between patients with metastatic colorectal cancer and healthy individuals. In the same study, a PLS-SVM approach was also used to predict overall survival for the patients with metastatic colorectal cancer. In [47], SVM models were applied to MS data collected for sixteen diagnostic metabolites from lipid and fatty acid metabolism, allowing the identification of early-stage ovarian cancer patients.
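The role of the kernel function can be illustrated without training a full SVM. The sketch below only computes the radial basis function (RBF) kernel, K(x, z) = exp(-gamma * ||x - z||^2), and the resulting kernel (Gram) matrix, which is what an SVM solver works with instead of the raw features. The samples, the value of gamma and the function names are assumptions chosen for illustration.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def gram_matrix(X, gamma=0.5):
    """Pairwise kernel values between all samples."""
    return [[rbf_kernel(xi, xj, gamma) for xj in X] for xi in X]


# Hypothetical two-metabolite profiles: two samples from each of two groups.
samples = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]]

K = gram_matrix(samples)
for row in K:
    print(" ".join(f"{v:.3f}" for v in row))
```

Samples from the same group have kernel values close to 1, while samples from different groups have values close to 0; the parameter gamma controls how quickly this similarity decays with distance and is usually tuned by cross-validation.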
5 Case Studies
Specmine [48] is an R package, developed in our group, for metabolomics data analysis that allows users to perform the analyses described in the previous sections, and many others. To demonstrate its usefulness in cancer metabolomics studies based on NMR and MS techniques, two studies were reproduced using the specmine package. The fully detailed reports can be accessed at http://darwin.di.uminho.pt/PACBB2018/metabolomics.
The first study [49] used the 1H CPMG NMR technique to analyze the possible association of metabolism with the altered expression of the inositol 1,4,5-trisphosphate receptor (IP3R) in breast cancer, as this receptor is known to regulate metabolism and cellular bioenergetics and is upregulated in a number of cancers. Data for this analysis was obtained from the Metabolights website [50], under the study MTBLS152. The analysis performed included PCA and PLS-DA. Although there were some differences in terms of results, possibly due to the use of a dataset that is slightly different from the original file used by the authors, the specmine results confirm that PCA and PLS-DA were able to discriminate between samples with high and/or low expression of the gene that encodes inositol 1,4,5-trisphosphate receptor type 3 and healthy control samples.
The second study [11] used the GC-MS technique to analyze the differences between the intracellular and extracellular metabolomic profiles of ovarian cancer cells (OCCs) and ovarian cancer stem cells (OCSCs). Data for this analysis was also obtained from Metabolights, under the study MTBLS152. The analysis performed included PCA and t-tests. Overall, the obtained results were very similar to those reported in the article. Some of the differences may be due to the study authors not fully explaining how the analysis was conducted, especially regarding how they handled the fact that, in some cases, the same metabolite had different concentration levels for each sample.
6 Conclusions
Although the typical procedure in metabolomics data analysis usually involves PCA and PLS-DA/OPLS-DA analyses, most studies use a variety of data analysis methods that confirm and complement one another. Some recent cancer metabolomics studies have explored other machine learning techniques to build predictors based on NMR and/or MS data. These alternative predictors may be useful to build more robust classifiers and to extract biologically meaningful information from metabolomics data, such as identifying potential metabolic biomarkers. In the future, it would be interesting to see how these and other alternatives perform when compared to established methods.
Furthermore, the reproduction of two studies using the specmine package shows that this R package can be very useful in metabolomics data analysis, not only for univariate analysis, but also for multivariate analysis, such as PCA and machine learning.
References
Warburg, O.: On the origin of cancer cells. Science 123(3191), 309–314 (1956)
Pavlova, N.N., Thompson, C.B.: The emerging hallmarks of cancer metabolism. Cell Metab. 23(1), 27–47 (2016)
Wang, H., et al.: Tissue metabolic profiling of human gastric cancer assessed by 1H NMR. BMC Cancer 16(1), 371 (2016)
Battini, S., et al.: Metabolomics approaches in pancreatic adenocarcinoma: tumor metabolism profiling predicts clinical outcome of patients. BMC Med. 15(1), 56 (2017)
Hart, C.D., et al.: Serum metabolomic profiles identify ER-positive early breast cancer patients at increased risk of disease recurrence in a multicenter population. Clin. Cancer Res. 23(6), 1422–1431 (2017)
Belkaid, A., et al.: Metabolic effect of estrogen receptor agonists on breast cancer cells in the presence or absence of carbonic anhydrase inhibitors. Metabolites 6(2), 16 (2016)
Hao, D., et al.: Temporal characterization of serum metabolite signatures in lung cancer patients undergoing treatment. Metabolomics 12(3), 58 (2016)
Fahrmann, J.F., et al.: Serum phosphatidylethanolamine levels distinguish benign from malignant solitary pulmonary nodules and represent a potential diagnostic biomarker for lung cancer. Cancer Biomarkers 16(4), 609–617 (2016)
Mathé, E.A., et al.: Noninvasive urinary metabolomic profiling identifies diagnostic and prognostic markers in lung cancer. Cancer Res. 74(12), 3259–3270 (2014)
Zhu, J., et al.: Colorectal cancer detection using targeted serum metabolic profiling. J. Proteome Res. 13(9), 4120–4130 (2014)
Vermeersch, K., et al.: OVCAR-3 spheroid-derived cells display distinct metabolic profiles. PLoS One 10(2), e0118262 (2015)
Ranjbar, M., et al.: GC-MS based plasma metabolomics for identification of candidate biomarkers for hepatocellular carcinoma in Egyptian cohort. PLoS One 10(6), e0127299 (2015)
Amiot, A., et al.: 1H NMR spectroscopy of fecal extracts enables detection of advanced colorectal neoplasia. J. Proteome Res. 14(9), 3871–3881 (2015)
Bertini, I., et al.: Metabolomic NMR fingerprinting to identify and predict survival of patients with metastatic colorectal cancer. Cancer Res. 72(1), 356–364 (2012)
Bhute, V.J., et al.: The poly (ADP-Ribose) polymerase inhibitor veliparib and radiation cause significant cell line dependent metabolic changes in breast cancer cells. Sci. Rep. 6(1), 36061 (2016)
Chan, A.W., et al.: 1H-NMR urinary metabolomic profiling for diagnosis of gastric cancer. Br. J. Cancer 114(1), 59–62 (2016)
Fages, A., et al.: Metabolomic profiles of hepatocellular carcinoma in a European prospective cohort. BMC Med. 13(1), 242 (2015)
Kim, H.S., et al.: Investigation of discriminant metabolites in tamoxifen-resistant and choline kinase-alpha-downregulated breast cancer cells using 1H-nuclear magnetic resonance spectroscopy. PLoS One 12(6), e0179773 (2017)
Monteiro, M.S., et al.: Nuclear magnetic resonance metabolomics reveals an excretory metabolic signature of renal cell carcinoma. Sci. Rep. 6(1), 37275 (2016)
Morin, P.J., et al.: NMR metabolomics analysis of the effects of 5-lipoxygenase inhibitors on metabolism in glioblastomas. J. Proteome Res. 12(5), 2165–2176 (2013)
Mun, J., et al.: Discrimination of basal cell carcinoma from normal skin tissue using high-resolution magic angle spinning 1H NMR spectroscopy. PLoS One 11(3), e0150328 (2016)
Roberts, M.J., et al.: Seminal plasma enables selection and monitoring of active surveillance candidates using nuclear magnetic resonance-based metabolomics: a preliminary investigation. Prostate Int. 5(4), 149–157 (2017)
Shao, W., et al.: Malignancy-associated metabolic profiling of human glioma cell lines using 1H NMR spectroscopy. Mol. Cancer 13(1), 197 (2014)
Tsai, I., et al.: Metabolomic dynamic analysis of hypoxia in MDA-MB-231 and the comparison with inferred metabolites from transcriptomics data. Cancers 5(2), 491–510 (2013)
Uifăalean, A., et al.: The impact of soy isoflavones on MCF-7 and MDA-MB-231 breast cancer cells using a global metabolomic approach. Int. J. Mol. Sci. 17(9), 1443 (2016)
Yang, Q.J., et al.: Serum and urine metabolomics study reveals a distinct diagnostic model for cancer cachexia. J. Cachexia Sarcopenia Muscle 9(1), 1–15 (2017)
Miyamoto, S., et al.: Systemic metabolomic changes in blood samples of lung cancer patients identified by gas chromatography time-of-flight mass spectrometry. Metabolites 5(2), 192–210 (2015)
Batova, A., et al.: Englerin A induces an acute inflammatory response and reveals lipid metabolism and ER stress as targetable vulnerabilities in renal cell carcinoma. PLoS One 12(3), e0172632 (2017)
Xiao, J.F., et al.: LC-MS based serum metabolomics for identification of hepatocellular carcinoma biomarkers in Egyptian cohort. J. Proteome Res. 11(12), 5914–5923 (2012)
Dhakshinamoorthy, S., et al.: Metabolomics identifies the intersection of phosphoethanolamine with menaquinone-triggered apoptosis in an in vitro model of leukemia. Mol. BioSyst. 11(9), 2406–2416 (2015)
Mackay, E.: Fatty acid synthesis in colorectal cancer: characterization of lipid metabolism in serum, tumour, and normal host tissues. Ph.D. thesis, University of Calgary (2015)
Vermeersch, K.A., et al.: Distinct metabolic responses of an ovarian cancer stem cell line. BMC Syst. Biol. 8(1), 134 (2014)
Hilvo, M., et al.: Monounsaturated fatty acids in serum triacylglycerols are associated with response to neoadjuvant chemotherapy in breast cancer patients. Int. J. Cancer 134(7), 1725–1733 (2014)
Ressom, H.W., et al.: Utilization of metabolomics to identify serum biomarkers for hepatocellular carcinoma in patients with liver cirrhosis. Analytica Chimica Acta 743, 90–100 (2012)
Abdi, H., Williams, L.J.: Principal component analysis. Wiley Interdisc. Rev. Comput. Stat. 2(4), 433–459 (2010)
Zhang, T., et al.: Application of holistic liquid chromatography-high resolution mass spectrometry based urinary metabolomics for prostate cancer detection and biomarker discovery. PLoS One 8(6), e65880 (2013)
Liu, R., et al.: Identification of plasma metabolomic profiling for diagnosis of esophageal squamous-cell carcinoma using an UPLC/TOF/MS platform. Int. J. Mol. Sci. 14(5), 8899–8911 (2013)
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1(233), pp. 281–297 (1967)
Locasale, J.W., et al.: Metabolomics of human cerebrospinal fluid identifies signatures of malignant glioma. Mol. Cellular Proteomics 11(6), M111.014688 (2012)
Wold, H.: Estimation of principal components and related models by iterative least squares. In: Multivariate Analysis, pp. 1391–1420 (1966)
Barker, M., Rayens, W.: Partial least squares for discrimination. J. Chemom. 17(3), 166–173 (2003)
Trygg, J., Wold, S.: Orthogonal projections to latent structures (OPLS). J. Chemom. 16(3), 119–128 (2002)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Liu, Y., et al.: NMR and LC/MS-based global metabolomics to identify serum biomarkers differentiating hepatocellular carcinoma from liver cirrhosis. Int. J. Cancer 135(3), 658–668 (2014)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Chen, Y., et al.: Metabolic profiling of normal hepatocyte and hepatocellular carcinoma cells via 1H nuclear magnetic resonance spectroscopy. Cell Biol. Int. 9999, 1–10 (2017)
Gaul, D.A., et al.: Highly-accurate metabolomic detection of early-stage ovarian cancer. Sci. Rep. 5, 16351 (2015)
Costa, C., et al.: An R package for the integrated analysis of metabolomics and spectral data. Comput. Meth. Prog. Biomed. 129, 117–124 (2016)
Singh, A., et al.: 1H NMR metabolomics reveals association of high expression of inositol 1,4,5 trisphosphate receptor and metabolites in breast cancer patients. PLoS One 12(1), 1–20 (2017)
Haug, K., et al.: MetaboLights - an open-access general-purpose repository for metabolomics studies and associated meta-data. Nucleic Acids Res. 41(D1), D781–D786 (2013)
Acknowledgments
This work is co-funded by the North Portugal Regional Operational Programme, under the “Portugal 2020”, through the European Regional Development Fund (ERDF), within project SISBI - Refª NORTE-01-0247-FEDER-003381.
This study was also supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684) and BioTecNorte operation (NORTE-01-0145-FEDER-000004) funded by European Regional Development Fund under the scope of Norte2020 - Programa Operacional Regional do Norte.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Cardoso, S., Baptista, D., Santos, R., Rocha, M. (2019). A Review on Metabolomics Data Analysis for Cancer Applications. In: Fdez-Riverola, F., Mohamad, M., Rocha, M., De Paz, J., González, P. (eds) Practical Applications of Computational Biology and Bioinformatics, 12th International Conference. PACBB2018 2018. Advances in Intelligent Systems and Computing, vol 803. Springer, Cham. https://doi.org/10.1007/978-3-319-98702-6_19
DOI: https://doi.org/10.1007/978-3-319-98702-6_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98701-9
Online ISBN: 978-3-319-98702-6
eBook Packages: Intelligent Technologies and Robotics (R0)