Abstract
Researchers are developing new statistical and machine learning methods to effectively integrate biobank-scale whole-genome sequencing, multi-omics, and electronic health record data in order to better understand the molecular basis of complex human diseases.