ABSTRACT
It is widely hoped that variation in the human genome will provide a means of predicting risk of a variety of complex, chronic diseases. A major stumbling block to the successful identification of association between human single nucleotide polymorphisms (SNPs) and variability in risk of complex diseases is the enormous number of SNPs in the human genome (4,9). The large number of SNPs makes exhaustive genotyping unacceptably costly, so there is a broad effort to find ways of selecting SNPs that maximize the informativeness of a subset.

In this paper we contrast two methods for reducing the complexity of SNP variation: haplotype tagging, i.e., typing a subset of SNPs to identify segments of the genome that appear to be nearly unrecombined (haplotype blocks), and a new block-free model that we develop in this report. We present a statistic for comparing haplotype blocks and show that, while the concept of haplotype blocks is reasonably robust, there is substantial variability among block partitions. We develop a measure for selecting an informative subset of SNPs in a block-free model. We show that the general version of this problem is NP-hard and give efficient algorithms for two important special cases of this problem.
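As a rough illustration of what selecting a subset of SNPs to tag haplotypes can look like, the sketch below greedily picks SNP positions until every pair of distinct haplotypes differs at some chosen position. It is not the measure or the algorithms developed in the paper; it is a generic greedy heuristic for the test cover formulation that several of the references study, and the function name `greedy_tag_snps` and the 0/1 string encoding of haplotypes are assumptions made for the example.

```python
from itertools import combinations


def greedy_tag_snps(haplotypes):
    """Return sorted SNP indices that tell all distinct haplotypes apart.

    Hypothetical helper: haplotypes are equal-length strings over {'0', '1'}.
    Greedy heuristic only; the result is not guaranteed to be minimum.
    """
    distinct = sorted(set(haplotypes))
    n_snps = len(distinct[0]) if distinct else 0
    # Each pair of distinct haplotypes must differ at some chosen SNP.
    uncovered = set(combinations(range(len(distinct)), 2))
    chosen = []
    while uncovered:
        # Count how many still-confusable pairs each SNP would separate.
        def gain(j):
            return sum(1 for a, b in uncovered
                       if distinct[a][j] != distinct[b][j])

        best = max(range(n_snps), key=gain)
        chosen.append(best)
        # Keep only the pairs the new SNP fails to separate.
        uncovered = {(a, b) for a, b in uncovered
                     if distinct[a][best] == distinct[b][best]}
    return sorted(chosen)


if __name__ == "__main__":
    sample = ["0000", "0011", "1100", "1111"]
    print(greedy_tag_snps(sample))
```

On the four sample haplotypes, two of the four SNPs are enough to separate every pair, which is the kind of reduction in genotyping effort that tagging aims for.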
- Goncalo R. Abecasis, Stacey S. Cherny, William O. Cookson, and Lon R. Cardon. Merlin - rapid analysis of dense genetic maps using sparse gene flow trees. Nature Genetics, 30:97--101, 2002.
- Hadar I. Avi-Itzhak, Xiaoping Su, and Francisco M. De La Vega. Selection of minimum subsets of single nucleotide polymorphism to capture haplotype block diversity. In Proceedings of Pacific Symposium on Biocomputing, pages 466--477, 2003.
- V. Bafna, D. Gusfield, G. Lancia, and S. Yooseph. Haplotyping as a perfect phylogeny. A direct approach. Journal of Computational Biology, 2003. To appear.
- K.M.J. De Bontridder, B.V. Halldorsson, M.M. Halldorsson, C.A.J. Hurkens, J.K. Lenstra, R. Ravi, and L. Stougie. Approximation algorithms for the minimum test cover problem. Mathematical Programming-B, 2003. To appear.
- K.M.J. De Bontridder, B.J. Lageweg, J.K. Lenstra, J.B. Orlin, and L. Stougie. Branch and bound algorithms for the test cover problem. In Proceedings of the 10th Annual European Symposium on Algorithms (ESA), pages 223--233, 2002.
- D. Clayton. Choosing a set of haplotype tagging SNPs from a larger set of diallelic loci. www.nature.com/ng/journal/v29/n2/extref/ng1001-233-S10.pdf, 2001.
- D. Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997.
- M.J. Daly, J.D. Rioux, S.F. Schaffner, T.J. Hudson, and E.S. Lander. High-resolution haplotype structure in the human genome. Nature Genetics, 29:229--232, 2001.
- B. Devlin and N. Risch. A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics, 29:311--322, 1995.
- D.E. Reich et al. Linkage disequilibrium in the human genome. Nature, 2001.
- S.B. Gabriel, S.F. Schaffner, H. Nguyen, J.M. Moore, J. Roy, B. Blumenstiel, J. Higgins, M. DeFelice, A. Lochner, M. Faggart, S.N. Liu-Cordero, C. Rotimi, A. Adeyemo, R. Cooper, R. Ward, E.S. Lander, M.J. Daly, and D. Altschuler. The structure of haplotype blocks in the human genome. Science, 296:2225--2229, 2002.
- B.V. Halldorsson, M.M. Halldorsson, and R. Ravi. On the approximability of the test collection problem. In Proceedings of the 9th Annual European Symposium on Algorithms (ESA), pages 158--169, 2001.
- D.S. Hirschberg. A linear space algorithm for computing maximal common subsequence. Communications of the ACM, 18:341--343, 1975.
- R.R. Hudson and N.L. Kaplan. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics, 111:147--164, 1985.
- A.J. Jeffreys, L. Kauppi, and R. Neumann. Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nature Genetics, 29:217--222, 2001.
- R. Judson, B. Salisbury, J. Schneider, A. Windemuth, and J.C. Stephens. How many SNPs does a genome-wide haplotype map require? Pharmacogenomics, 3:379--391, 2002.
- L. Kruglyak. Prospects for whole-genome linkage mapping of common disease genes. Nature Genetics, 22:139--144, 1999.
- G. Lancia, V. Bafna, S. Istrail, R. Lippert, and R. Schwartz. SNPs problems, complexity and algorithms. In Proceedings of the 9th Annual European Symposium on Algorithms (ESA), pages 182--193, 2001.
- R. Lippert, R. Schwartz, G. Lancia, and S. Istrail. Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem. Briefings in Bioinformatics, 3(1):23--31, 2002.
- D.A. Nickerson, S.L. Taylor, S.M. Fullerton, K.M. Weiss, A.G. Clark, J.H. Stengaard, V. Salomaa, E. Boerwinkle, and C.F. Sing. Sequence diversity and large-scale typing of SNPs in the human apolipoprotein E gene. Genome Research, 10:1532--1545, 2000.
- N. Patil et al. Blocks of limited haplotype diversity revealed by high resolution scanning of human chromosome 21. Science, 294:1719--1722, 2001.
- R. Rizzi, V. Bafna, S. Istrail, and G. Lancia. Practical algorithms for the single individual SNP haplotyping problem. In Workshop on Algorithms in Bioinformatics, pages 29--43, 2002.
- F.M. De La Vega, X. Su, H. Avi-Itzhak, B.V. Halldorsson, D. Gordon, A. Collins, R.A. Lippert, R. Schwartz, C. Scafe, Y. Wang, M. Laig-Webster, R.T. Koehler, J. Ziegle, L. Wogan, J.F. Stevens, K.M. Leinen, S.J. Olson, K.J. Guegler, X. You, L. Xu, H.G. Hemken, F. Kalush, A.G. Clark, S. Istrail, M.W. Hunkapiller, E.G. Spier, and D.A. Gilbert. The profile of linkage disequilibrium across human chromosomes 6, 21, and 22 in African-American and Caucasian populations. In preparation, 2003.
- K. Weiss and A. Clark. Linkage disequilibrium and the mapping of complex human traits. Trends in Genetics, 18(1):19--24, 2002.
- K. Zhang, M. Deng, T. Chen, M.S. Waterman, and F. Sun. A dynamic programming algorithm for haplotype block partitioning. Proceedings of the National Academy of Sciences, 99(11):7335--7339, 2002.
Index Terms
- Haplotypes and informative SNP selection algorithms: don't block out information