Research article, DOI: 10.1145/2975167.2975195

Scalable Algorithms at Genomic Resolution to fit LD Distributions

Published: 02 October 2016

Abstract

Reconstructing a population that matches a given LD (linkage disequilibrium) distribution is not straightforward, and the problem is further compounded if the population must also match a MAF (minor allele frequency) distribution. Here we address the task of co-fitting these multiple distributions at genomic resolution.
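
As a concrete illustration of the two summary statistics being fitted, the following minimal sketch computes per-marker MAF and pairwise LD for a toy population. It assumes biallelic markers coded 0/1 in a haplotype matrix and uses the squared correlation r^2 as the LD measure; the abstract does not state which LD measure the paper uses, and the function names and data are hypothetical.

```python
from itertools import combinations


def minor_allele_frequency(column):
    """MAF of one biallelic marker coded 0/1: frequency of the rarer allele."""
    p = sum(column) / len(column)
    return min(p, 1.0 - p)


def ld_r2(col_a, col_b):
    """Squared correlation r^2 between two 0/1 marker columns.

    D = P(A=1, B=1) - P(A=1) * P(B=1) is the basic LD coefficient;
    r^2 normalizes D^2 by the product of the allele-frequency variances.
    """
    n = len(col_a)
    p_a = sum(col_a) / n
    p_b = sum(col_b) / n
    p_ab = sum(a & b for a, b in zip(col_a, col_b)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # A monomorphic marker has no defined r^2; report 0.0 by convention here.
    return 0.0 if denom == 0 else d * d / denom


# Toy population (made-up data): 8 haplotypes (rows) x 3 markers (columns).
haplotypes = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
    [0, 1, 1],
]

columns = list(zip(*haplotypes))  # one tuple per marker
mafs = [minor_allele_frequency(c) for c in columns]
ld = {(i, j): ld_r2(columns[i], columns[j])
      for i, j in combinations(range(len(columns)), 2)}
```

Collecting `mafs` and the values of `ld` over all markers and marker pairs gives empirical MAF and LD distributions of the kind a simulated population would be compared against.
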
The solution is based on incrementally scaling a fast, i.e., linear-time, non-generative algorithm (SimBA). Non-generative means that the algorithm does not generate the population through evolution simulation. Instead, it directly builds the genomes in terms of polymorphic alleles that mimic the structure of the desired population. We present an incremental framework that scales up the algorithm while remaining both accurate and efficient. We demonstrate the efficacy of the algorithm on a variety of data sets, covering both human and plant data.
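
To make the non-generative idea concrete, here is a toy sketch of directly building one marker column so that it has a target allele frequency and a target LD coefficient D with an existing column, in a single linear pass and with no evolutionary simulation. This is an illustrative assumption, not SimBA's actual procedure; the function names, the use of D as the target, and the deterministic choice of rows are all hypothetical.

```python
def build_marker(col_a, p_b, d):
    """Build a 0/1 column with allele-1 frequency ~p_b and LD coefficient ~d to col_a.

    Uses D = P(A=1, B=1) - p_a * p_b to derive how many haplotypes must
    carry allele 1 at both markers, then places the 1s accordingly.
    """
    n = len(col_a)
    p_a = sum(col_a) / n
    ones_a = [i for i, a in enumerate(col_a) if a == 1]
    zeros_a = [i for i, a in enumerate(col_a) if a == 0]

    n_b = round(p_b * n)                 # total 1s wanted in the new column
    n_ab = round((p_a * p_b + d) * n)    # 1s wanted where col_a is also 1

    # Clamp to the feasible range so both groups have enough rows.
    lo = max(0, n_b - len(zeros_a))
    hi = min(len(ones_a), n_b)
    n_ab = min(max(n_ab, lo), hi)

    col_b = [0] * n
    # Deterministic row choice for reproducibility; a real simulator would
    # more plausibly pick rows at random to avoid artificial structure.
    for i in ones_a[:n_ab]:
        col_b[i] = 1
    for i in zeros_a[:n_b - n_ab]:
        col_b[i] = 1
    return col_b


def ld_d(col_a, col_b):
    """LD coefficient D between two 0/1 columns, used here to check the result."""
    n = len(col_a)
    p_ab = sum(a & b for a, b in zip(col_a, col_b)) / n
    return p_ab - (sum(col_a) / n) * (sum(col_b) / n)


col_a = [1, 1, 1, 1, 0, 0, 0, 0]
col_b = build_marker(col_a, p_b=0.5, d=0.125)
achieved_d = ld_d(col_a, col_b)
```

In this example the targets are hit exactly because they correspond to whole haplotype counts; in general rounding and the feasibility clamp mean the targets are only approximated.
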
Such simulation of populations that match summary distributions plays a critical role in in-silico hypothesis testing and optimization. For instance, in-silico breeding optimization in plants can model years or decades of experimentation and predict breeding outcomes within days, if not hours or minutes.

Cited By

  • (2019) Linear Time Algorithms to Construct Populations Fitting Multiple Constraint Distributions at Genomic Scales. IEEE/ACM Transactions on Computational Biology and Bioinformatics 16:4 (1132-1142). DOI: 10.1109/TCBB.2017.2760879. Online publication date: 1-Jul-2019

Published In

BCB '16: Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
October 2016
675 pages
ISBN: 9781450342254
DOI: 10.1145/2975167

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Linkage disequilibrium
  2. Population genetics
  3. Simulation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

BCB '16

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%
