Efficient Genome Wide Tagging by Reduction to SAT

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5251)

Abstract

Whole genome association has recently demonstrated some remarkable successes in identifying loci involved in disease. Designing these studies involves selecting a subset of known single nucleotide polymorphisms (SNPs), or tag SNPs, to be genotyped. The problem of choosing tag SNPs is an active area of research and is usually formulated such that the goal is to select the smallest set of tag SNPs that “cover” the remaining SNPs, where “cover” is defined by some statistical criterion. Since the standard formulation of the tag SNP selection problem is NP-hard, most algorithms for selecting tag SNPs are either heuristics, which do not guarantee selection of a minimal set of tag SNPs, or exhaustive algorithms, which are computationally impractical. In this paper, we present a set of methods that guarantee discovering a minimal set of tag SNPs, yet in practice are much faster than traditional exhaustive algorithms. We demonstrate that our methods can be applied to discover minimal tag sets for the entire human genome. Our method converts an instance of the tag SNP selection problem into an instance of the satisfiability problem, encoded in conjunctive normal form (CNF). We take advantage of the local structure inherent in human variation, as well as progress in knowledge compilation, and convert our CNF encoding into a representation known as DNNF, from which solutions to our original problem can be easily enumerated. We demonstrate our methods by constructing the optimal tag set for the whole genome and show that we significantly outperform previous exhaustive search-based methods. We also present optimal solutions for the problem of selecting multi-marker tags, in which some SNPs are “covered” by a pair of tag SNPs. Multi-marker tags can significantly decrease the number of tags we need to select; however, discovering the minimal number of multi-marker tags is much more difficult. We evaluate our methods and perform benchmark comparisons to other methods by choosing tag sets using the HapMap data.
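
The abstract does not spell out the CNF encoding, so the following is only a minimal sketch of one plausible single-marker formulation, not the authors' actual method. It assumes that SNP j "covers" SNP i when their pairwise r² meets a threshold (0.8 is an assumed value), introduces one Boolean variable per SNP (true when the SNP is chosen as a tag), and emits one clause per SNP requiring that at least one of its covering SNPs is selected. The brute-force enumeration stands in for the DNNF compilation step and is only feasible for toy inputs. The names `build_cnf`, `satisfies`, `minimal_tag_sets`, and the matrix `R2` are hypothetical.

```python
from itertools import combinations


def build_cnf(r2, threshold=0.8):
    """Return one positive-literal clause per SNP.

    r2 is a symmetric matrix (list of lists) of pairwise r^2 values.
    Variable j is true when SNP j is chosen as a tag; clause i lists the
    SNPs that cover SNP i (every SNP covers itself).
    The threshold of 0.8 is an assumption, not taken from the abstract.
    """
    n = len(r2)
    return [[j for j in range(n) if j == i or r2[i][j] >= threshold]
            for i in range(n)]


def satisfies(tags, cnf):
    """A clause is satisfied when it contains at least one chosen tag."""
    return all(any(v in tags for v in clause) for clause in cnf)


def minimal_tag_sets(cnf, n):
    """Enumerate every minimum-cardinality tag set by brute force.

    This replaces knowledge compilation with exhaustive search, so it is
    only usable on very small examples.
    """
    for k in range(1, n + 1):
        found = [set(c) for c in combinations(range(n), k)
                 if satisfies(set(c), cnf)]
        if found:
            return found
    return []


# Hypothetical r^2 values for five SNPs: SNPs 0-2 form one correlated
# group, SNPs 3-4 form another.
R2 = [
    [1.00, 0.90, 0.85, 0.10, 0.00],
    [0.90, 1.00, 0.30, 0.20, 0.10],
    [0.85, 0.30, 1.00, 0.10, 0.20],
    [0.10, 0.20, 0.10, 1.00, 0.95],
    [0.00, 0.10, 0.20, 0.95, 1.00],
]

cnf = build_cnf(R2)
solutions = minimal_tag_sets(cnf, len(R2))
print(cnf)        # [[0, 1, 2], [0, 1], [0, 2], [3, 4], [3, 4]]
print(solutions)  # [{0, 3}, {0, 4}]
```

In this toy instance SNP 0 is the only SNP that covers both SNP 1 and SNP 2, so every minimal tag set contains it, paired with either SNP 3 or SNP 4.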

Editor information

Keith A. Crandall Jens Lagergren

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Choi, A., Zaitlen, N., Han, B., Pipatsrisawat, K., Darwiche, A., Eskin, E. (2008). Efficient Genome Wide Tagging by Reduction to SAT. In: Crandall, K.A., Lagergren, J. (eds) Algorithms in Bioinformatics. WABI 2008. Lecture Notes in Computer Science, vol 5251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87361-7_12

  • DOI: https://doi.org/10.1007/978-3-540-87361-7_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87360-0

  • Online ISBN: 978-3-540-87361-7

  • eBook Packages: Computer Science, Computer Science (R0)
