Statistical Distribution of Chemical Fingerprints

  • Conference paper
Fuzzy Logic and Applications (WILF 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3849)

Abstract

Binary fingerprints are binary vectors used to represent chemical molecules by recording the presence or absence of particular substructures, such as labeled paths in the 2D graph of bonds. Complete fingerprints are often reduced to a compressed format, typically of dimension n = 512 or n = 1024, by using a simple congruence operation. The statistical properties of complete and compressed fingerprint representations are important because fingerprints are used to rapidly search large databases and to develop statistical machine learning methods in chemoinformatics. Here we present an empirical and mathematical analysis of the distribution of complete and compressed fingerprints. In particular, we derive formulas that provide good approximations for the expected number of bits set to one in a compressed fingerprint, given its uncompressed version, and vice versa.
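
The compression step and the two directions of the bit-count estimate can be illustrated with a minimal Python sketch. It is not the paper's derivation. It assumes that the congruence operation maps bit position i to i mod n, and that the set bits of the complete fingerprint fall uniformly and independently into the n compressed positions (a standard occupancy approximation). The function names fold, expected_folded_bits, and estimated_unfolded_bits are hypothetical.

```python
import math


def fold(on_bits, n):
    """Compress a fingerprint given as a set of 'on' positions.

    Assumption: the congruence operation is position modulo n.
    Positions that collide modulo n merge into a single set bit.
    """
    return {pos % n for pos in on_bits}


def expected_folded_bits(a, n):
    """Approximate expected number of set bits after folding.

    Assumption: the a set bits land uniformly and independently in
    n positions, so a given position stays empty with probability
    (1 - 1/n) ** a.
    """
    return n * (1.0 - (1.0 - 1.0 / n) ** a)


def estimated_unfolded_bits(b, n):
    """Invert the approximation above: estimate a from b set bits."""
    if not 0 <= b < n:
        raise ValueError("b must satisfy 0 <= b < n")
    return math.log(1.0 - b / n) / math.log(1.0 - 1.0 / n)


if __name__ == "__main__":
    # 515 and 1027 both collide with 3 when n = 512.
    print(fold({3, 7, 515, 1027}, 512))
    print(expected_folded_bits(200, 512))
    print(estimated_unfolded_bits(expected_folded_bits(200, 512), 512))
```

Under these assumptions the compressed bit count is always at most the complete bit count, and the gap widens as the complete fingerprint becomes denser relative to n.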

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Swamidass, S.J., Baldi, P. (2006). Statistical Distribution of Chemical Fingerprints. In: Bloch, I., Petrosino, A., Tettamanzi, A.G.B. (eds) Fuzzy Logic and Applications. WILF 2005. Lecture Notes in Computer Science, vol 3849. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11676935_2

  • DOI: https://doi.org/10.1007/11676935_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32529-1

  • Online ISBN: 978-3-540-32530-7

  • eBook Packages: Computer Science, Computer Science (R0)
