A Tree Based Method for the Rapid Screening of Chemical Fingerprints

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5724)

Abstract

The fingerprint of a molecule is a bitstring based on its structure, constructed such that structurally similar molecules will have similar fingerprints. Molecular fingerprints can be used in an initial phase of identifying novel drug candidates, by screening large databases for molecules whose fingerprints are similar to a query fingerprint. In this paper, we present a method which efficiently finds all fingerprints in a database whose Tanimoto coefficient to the query fingerprint is above a user-defined threshold. The method is based on two novel data structures for rapid screening of large databases: the kD grid and the Multibit tree. The kD grid is based on splitting the fingerprints into k shorter bitstrings and utilising these to compute bounds on the similarity of the complete bitstrings. The Multibit tree uses hierarchical clustering and similarity within each cluster to compute bounds of the same kind. We have implemented our method and tested it on a large data set from industry. Our experiments show that our method yields a three-fold speed-up over previous methods.
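To make the screening task concrete, the following is a minimal Python sketch of a baseline linear scan. It is not the kD grid or the Multibit tree described in the abstract. It assumes fingerprints are stored as Python integers used as bitstrings, and it prunes with the well-known bit-count bound T(A, B) ≤ min(|A|, |B|) / max(|A|, |B|), which holds because the intersection can contain at most min(|A|, |B|) bits and the union at least max(|A|, |B|). The function names, the treatment of two empty fingerprints as identical, and the choice to report scores at or above the threshold are all assumptions made for illustration.

```python
from typing import List, Tuple


def popcount(bits: int) -> int:
    """Number of set bits in a fingerprint stored as a non-negative int."""
    return bin(bits).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |A and B| / |A or B| of two fingerprints."""
    union = popcount(a | b)
    if union == 0:
        # Assumption: two all-zero fingerprints are treated as identical.
        return 1.0
    return popcount(a & b) / union


def upper_bound(count_a: int, count_b: int) -> float:
    """Bit-count bound: the Tanimoto coefficient cannot exceed min/max."""
    larger = max(count_a, count_b)
    if larger == 0:
        return 1.0
    return min(count_a, count_b) / larger


def screen(query: int, database: List[int], threshold: float) -> List[Tuple[int, float]]:
    """Return (index, score) for every fingerprint scoring >= threshold."""
    query_count = popcount(query)
    hits = []
    for index, fingerprint in enumerate(database):
        # Skip the exact computation when the bound already rules it out.
        if upper_bound(query_count, popcount(fingerprint)) < threshold:
            continue
        score = tanimoto(query, fingerprint)
        if score >= threshold:
            hits.append((index, score))
    return hits


if __name__ == "__main__":
    toy_database = [0b1111, 0b0111, 0b0001, 0b11110000]
    print(screen(0b1111, toy_database, 0.7))
```

The data structures named in the abstract aim to discard many fingerprints at once using tighter bounds. This sketch only illustrates the general idea of skipping exact comparisons that a cheap bound shows cannot reach the threshold.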

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kristensen, T.G., Nielsen, J., Pedersen, C.N.S. (2009). A Tree Based Method for the Rapid Screening of Chemical Fingerprints. In: Salzberg, S.L., Warnow, T. (eds) Algorithms in Bioinformatics. WABI 2009. Lecture Notes in Computer Science, vol 5724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04241-6_17

  • DOI: https://doi.org/10.1007/978-3-642-04241-6_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04240-9

  • Online ISBN: 978-3-642-04241-6

  • eBook Packages: Computer Science, Computer Science (R0)
