Harnessing the Power of Statistics and Machine Learning in the Era of Biobank-Scale Whole-Genome Sequencing and Multi-Omics Studies

Published: 01 February 2024

Abstract

Researchers are developing new statistical and machine learning methods to integrate biobank-scale whole-genome sequencing, multi-omics, and electronic health record data effectively, with the goal of better understanding the molecular basis of complex human diseases.


Published in

XRDS: Crossroads, The ACM Magazine for Students, Volume 30, Issue 2
Winter 2023
42 pages
ISSN: 1528-4972
EISSN: 1528-4980
DOI: 10.1145/3644034

Copyright © 2024 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States
