Comparative evaluation of multiomics integration tools for the study of prediabetes: insights into the earliest stages of type 2 diabetes mellitus

Emam, Mohamed; Tarek, Ahmed; Soudy, Mohamed; Antunes, Agostinho; Hadidi, Mohamed El; Hamed, Mohamed

doi:10.1007/s13721-024-00442-9

Original Article
Published: 14 March 2024

Volume 13, article number 8, (2024)

Network Modeling Analysis in Health Informatics and Bioinformatics

Mohamed Emam^1,2,
Ahmed Tarek^3,4,
Mohamed Soudy^5,
Agostinho Antunes^1,2 (ORCID: orcid.org/0000-0002-1328-1732),
Mohamed El Hadidi^4 &
…
Mohamed Hamed^6,7

Abstract

Type 2 diabetes mellitus (T2D) remains a critical health concern, particularly in its early stages such as prediabetes. Understanding these early stages is paramount for improving patient outcomes. Multiomics data integration tools offer promise in unraveling the underlying mechanisms of T2D. The advent of high-throughput technology and the increasing availability of multiomics data have led to the development of several statistical and network-based integration methods. However, the performance of such methods varies, so their output must be evaluated in an unbiased manner. Here, we conducted a comparative analysis of three representative unsupervised multiomics integration tools, MOFA+, GFA, and iCluster, alongside an in-house supervised model, EMFR, using two complementary benchmarks. First, we assessed how well the features selected by each tool could discriminate between patient and control samples using both linear and nonlinear classification models. Second, we quantified how much the features selected from each omics data type contributed to the total variance. Through these detailed comparisons among the unsupervised tools, we observed that the features selected by MOFA+ and GFA gave the best F1 score (0.7) in the nonlinear classification model, clearly discriminating between patient and control classes. Hence, we recommend these two unsupervised integration tools for feature selection purposes. Our comparative analyses were conducted on a real biological dataset to further study prediabetes patients. These multiomics data enabled the detection of prediabetes subtypes and provided several clinical insights that will open a path toward personalized medicine for diabetes.
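The two benchmarks described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: the function names `f1_score` and `variance_share`, the toy labels, and the omics layer names are all hypothetical, and the variance benchmark is simplified here to the share of summed per-feature variance in each layer, which is only one plausible reading of "contribution to the total variance".

```python
from statistics import pvariance


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive (patient) class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def variance_share(selected):
    """Fraction of total variance carried by each omics layer.

    `selected` maps a layer name to a list of feature columns, where each
    column is the list of sample values for one selected feature.
    Assumption: contribution = sum of per-feature population variances.
    """
    per_layer = {
        layer: sum(pvariance(col) for col in cols)
        for layer, cols in selected.items()
    }
    total = sum(per_layer.values())
    return {layer: (v / total if total else 0.0) for layer, v in per_layer.items()}


if __name__ == "__main__":
    # Hypothetical labels: 1 = prediabetes patient, 0 = control.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
    print(f1_score(y_true, y_pred))  # 0.75 for this toy example

    # Hypothetical selected features from two omics layers, two samples each.
    toy = {"transcriptome": [[0, 2], [1, 1]], "metabolome": [[0, 4]]}
    print(variance_share(toy))
```

In practice the predictions would come from a linear and a nonlinear classifier trained on the features each integration tool selects, and the same F1 computation would be applied to both so that the tools are compared on equal terms.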

Data availability

The dataset supporting the findings of this study is available in the GitHub repository at (https://github.com/ahmedtariq/MultiOmic-Ensembled-Feature-Reduction) and was retrieved from the integrative human microbiome project 'iHMP' (https://portal.hmpdacc.org; T2D iHMP Google Cloud platform).

Acknowledgements

We would like to thank the reviewers for the time and effort they took to review the manuscript. We appreciate their valuable comments and suggestions, which helped us improve its quality.

Funding

The main author ME acknowledges funding by the “la Caixa” Foundation (ID 100010434), within the Doctoral INPhINIT Program LCF/BQ/D122/11940015. AA was partially supported by the Strategic Funding U-IDB/04423/2020 and UIDP/04423/2020 through national funds provided by the Fundação para a Ciência e a Tecnologia (FCT) and the European Regional Development Fund (ERDF) in the framework of the program PT2020, by the European Structural and Investment Funds (ESIF) through the Competitiveness and Internationalization Operational Program—COMPETE 2020 and by National Funds through the FCT under the projects PTDC/CTA-AMB/31774/2017 (POCI-01–0145-FEDER/031774/2017). ME, AT, MH and MS were funded via two DAAD grants (1) GED-PerMED Z57546888: German-Egyptian Dialog on Tackling Precision Medicine using Artificial Intelligence, and (2) Eg-CompBio Z57587968: Empowering computational biology and bioinformatics research in Egypt, both funded by the DAAD (German Academic Exchange Service) in Germany.

Author information

Authors and Affiliations

CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, S/N, 4450-208, Porto, Portugal
Mohamed Emam & Agostinho Antunes
Department of Biology, Faculty of Sciences, University of Porto, Rua Do Campo Alegre, 4169-007, Porto, Portugal
Mohamed Emam & Agostinho Antunes
Center for Bioinformatics, Saarland University, Saarbrücken, Germany
Ahmed Tarek
Bioinformatics Group, Center for Informatics Sciences (CIS), Nile University, Giza, Egypt
Ahmed Tarek & Mohamed El Hadidi
Information Systems, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
Mohamed Soudy
Faculty of Media Engineering and Technology, German University in Cairo, Cairo, Egypt
Mohamed Hamed
Institute for Biostatistics and Informatics in Medicine and Aging Research (IBIMA), Rostock University Medical Center, Rostock, Germany
Mohamed Hamed

Contributions

Mohamed Emam: conceptualization, methodology, formal analysis, investigation, writing – original draft, data curation. Ahmed Tarek: formal analysis, writing – review and editing, methodology. Mohamed Soudy: formal analysis, writing – review and editing, methodology. Agostinho Antunes: conceptualization, supervision, funding acquisition, software, visualization, writing – review and editing. Mohamed El Hadidi: project administration, funding acquisition, supervision, methodology, writing – review and editing. Mohamed Hamed: conceptualization, project administration, funding acquisition, supervision, methodology, writing – review and editing.

Corresponding author

Correspondence to Agostinho Antunes.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Ethical statement

The study is based on multiomics integration analysis, and the data were retrieved from publicly available databases.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 838 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Emam, M., Tarek, A., Soudy, M. et al. Comparative evaluation of multiomics integration tools for the study of prediabetes: insights into the earliest stages of type 2 diabetes mellitus. Netw Model Anal Health Inform Bioinforma 13, 8 (2024). https://doi.org/10.1007/s13721-024-00442-9

Received: 01 September 2023
Revised: 24 January 2024
Accepted: 26 January 2024
Published: 14 March 2024
DOI: https://doi.org/10.1007/s13721-024-00442-9
