Abstract
The fingerprint of a molecule is a bitstring derived from its structure, constructed such that structurally similar molecules have similar fingerprints. Molecular fingerprints can be used in an initial phase of identifying novel drug candidates, by screening large databases for molecules whose fingerprints are similar to a query fingerprint. In this paper, we present a method which efficiently finds all fingerprints in a database whose Tanimoto coefficient to the query fingerprint is above a user-defined threshold. The method is based on two novel data structures for rapid screening of large databases: the kD grid and the Multibit tree. The kD grid splits each fingerprint into k shorter bitstrings and uses these to compute bounds on the similarity of the complete bitstrings. The Multibit tree uses hierarchical clustering, and the similarity within each cluster, to compute comparable bounds. We have implemented our method and tested it on a large industrial data set. Our experiments show that our method yields a three-fold speed-up over previous methods.
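To make the idea of bounding similarity from shorter bitstrings concrete, the following is a minimal sketch, not the paper's implementation. It assumes fingerprints are stored as Python integers and that a hit is a fingerprint whose Tanimoto coefficient is at or above the threshold. The function names (`tanimoto`, `split_counts`, `upper_bound`, `screen`) are hypothetical. The bound used here relies only on the number of 1-bits in each of the k chunks: the intersection can be no larger than the sum of the chunk-wise minima, and the union no smaller than the sum of the chunk-wise maxima. The actual kD grid and Multibit tree organise the database so that whole groups of fingerprints are pruned at once; this sketch applies the bound in a plain linear scan.

```python
from typing import List, Sequence, Tuple


def popcount(x: int) -> int:
    """Number of 1-bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |a AND b| / |a OR b| of two bitstrings."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # assumed convention for two all-zero fingerprints
    return popcount(a & b) / union


def split_counts(fp: int, n_bits: int, k: int) -> List[int]:
    """1-bit counts of k contiguous chunks of an n_bits-long fingerprint."""
    chunk = -(-n_bits // k)  # ceiling division
    mask = (1 << chunk) - 1
    return [popcount((fp >> (i * chunk)) & mask) for i in range(k)]


def upper_bound(ca: Sequence[int], cb: Sequence[int]) -> float:
    """Upper bound on the Tanimoto coefficient from chunk-wise bit counts."""
    lo = sum(min(x, y) for x, y in zip(ca, cb))  # bounds the intersection from above
    hi = sum(max(x, y) for x, y in zip(ca, cb))  # bounds the union from below
    return 1.0 if hi == 0 else lo / hi


def screen(query: int, database: Sequence[int], threshold: float,
           n_bits: int, k: int = 2) -> List[Tuple[int, float]]:
    """Return (index, similarity) for every fingerprint at or above threshold."""
    qc = split_counts(query, n_bits, k)
    hits = []
    for idx, fp in enumerate(database):
        # Skip the exact computation when the bound already rules the entry out.
        if upper_bound(qc, split_counts(fp, n_bits, k)) < threshold:
            continue
        s = tanimoto(query, fp)
        if s >= threshold:
            hits.append((idx, s))
    return hits


if __name__ == "__main__":
    db = [0b11110000, 0b11100000, 0b00001111, 0b11111111]
    print(screen(0b11110000, db, threshold=0.7, n_bits=8, k=2))
```

In a real setting the chunk counts of the database fingerprints would be precomputed once rather than recomputed per query; they are computed inline here only to keep the sketch short. Because the bound never underestimates the true coefficient, the pruning step cannot discard a fingerprint that would have been a hit.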
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kristensen, T.G., Nielsen, J., Pedersen, C.N.S. (2009). A Tree Based Method for the Rapid Screening of Chemical Fingerprints. In: Salzberg, S.L., Warnow, T. (eds) Algorithms in Bioinformatics. WABI 2009. Lecture Notes in Computer Science, vol 5724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04241-6_17
DOI: https://doi.org/10.1007/978-3-642-04241-6_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04240-9
Online ISBN: 978-3-642-04241-6
eBook Packages: Computer Science (R0)