Efficient Genome Wide Tagging by Reduction to SAT

Choi, Arthur; Zaitlen, Noah; Han, Buhm; Pipatsrisawat, Knot; Darwiche, Adnan; Eskin, Eleazar

doi:10.1007/978-3-540-87361-7_12

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5251)

Abstract

Whole-genome association studies have recently had remarkable success in identifying loci involved in disease. Designing such a study involves selecting a subset of known single nucleotide polymorphisms (SNPs), called tag SNPs, to be genotyped. Choosing tag SNPs is an active area of research. The problem is usually formulated as selecting the smallest set of tag SNPs that “covers” the remaining SNPs, where “cover” is defined by some statistical criterion. Because this standard formulation of the tag SNP selection problem is NP-hard, most algorithms for selecting tag SNPs are either heuristics, which do not guarantee a minimal set of tag SNPs, or exhaustive algorithms, which are computationally impractical. In this paper, we present a set of methods that are guaranteed to find a minimal set of tag SNPs, yet in practice are much faster than traditional exhaustive algorithms. We demonstrate that our methods can be applied to discover minimal tag sets for the entire human genome. Our approach converts an instance of the tag SNP selection problem into an instance of the satisfiability problem, encoded in conjunctive normal form (CNF). We take advantage of the local structure inherent in human variation, as well as progress in knowledge compilation, to convert the CNF encoding into a representation known as DNNF, from which solutions to the original problem can be easily enumerated. We demonstrate our methods by constructing the optimal tag set for the whole genome and show that we significantly outperform previous exhaustive search-based methods. We also present optimal solutions for the problem of selecting multi-marker tags, in which some SNPs are “covered” by a pair of tag SNPs. Multi-marker tags can significantly decrease the number of tags that must be selected; however, discovering the minimal number of multi-marker tags is much more difficult. We evaluate our methods and benchmark them against other methods by choosing tag sets using the HapMap data.
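
The covering formulation in the abstract can be illustrated with a small Python sketch. It is not the authors' encoding, which the abstract does not spell out. It assumes a hypothetical input `covers` that maps each SNP to the set of SNPs able to cover it (for example, those in high linkage disequilibrium with it, including itself), and all SNP names and function names are invented for illustration. Each SNP contributes one CNF clause stating that at least one of its covering SNPs is selected as a tag. In place of DNNF compilation, the sketch simply enumerates candidate tag sets in order of size, which is only practical for toy inputs.

```python
from itertools import combinations


def build_cnf(covers):
    """Return one clause per SNP (hypothetical encoding).

    Each clause is a list of SNP names; the clause is satisfied when at
    least one of those SNPs is chosen as a tag.
    """
    return [sorted(covering) for _, covering in sorted(covers.items())]


def satisfies(clauses, tags):
    """Check that every clause contains at least one selected tag."""
    return all(any(snp in tags for snp in clause) for clause in clauses)


def minimal_tag_sets(covers):
    """Enumerate all tag sets of minimum size by brute force.

    This stands in for the knowledge-compilation step; it tries subsets
    of increasing size and stops at the first size that yields a cover.
    """
    snps = sorted(covers)
    clauses = build_cnf(covers)
    for size in range(len(snps) + 1):
        found = [set(c) for c in combinations(snps, size)
                 if satisfies(clauses, set(c))]
        if found:
            return found
    return []


# Toy instance with made-up SNP names: s2 covers s1, s2 and s3,
# while s4 can only be covered by itself.
covers = {
    "s1": {"s1", "s2"},
    "s2": {"s1", "s2", "s3"},
    "s3": {"s2", "s3"},
    "s4": {"s4"},
}
result = minimal_tag_sets(covers)
print(result)
```

On this toy instance the only minimum cover has two tags, s2 and s4. The exhaustive loop grows exponentially with the number of SNPs, which is the cost that a compiled representation is meant to avoid.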

Author information

Authors and Affiliations

Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095
Arthur Choi, Knot Pipatsrisawat, Adnan Darwiche & Eleazar Eskin
Bioinformatics Program, University of California, San Diego, La Jolla, CA, 92093
Noah Zaitlen
Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, 92093
Buhm Han
Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, 90095
Eleazar Eskin

Editor information

Keith A. Crandall, Jens Lagergren

About this paper

Cite this paper

Choi, A., Zaitlen, N., Han, B., Pipatsrisawat, K., Darwiche, A., Eskin, E. (2008). Efficient Genome Wide Tagging by Reduction to SAT. In: Crandall, K.A., Lagergren, J. (eds) Algorithms in Bioinformatics. WABI 2008. Lecture Notes in Computer Science, vol 5251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87361-7_12

DOI: https://doi.org/10.1007/978-3-540-87361-7_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87360-0
Online ISBN: 978-3-540-87361-7
eBook Packages: Computer Science, Computer Science (R0)