
Using Non-Zero Dimensions and Lengths of Vectors for the Tanimoto Similarity Search among Real Valued Vectors

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8397)

Abstract

The Tanimoto similarity measure finds numerous applications, e.g., in chemical informatics, bioinformatics, information retrieval, and text and web mining. Recently, two efficient methods have been offered for reducing the number of candidates for Tanimoto similar real valued vectors: one uses the lengths of vectors and the other uses their non-zero dimensions. In this paper, we offer new theoretical results on the combined use of lengths of real valued vectors and their non-zero dimensions for more efficient reduction of candidates for Tanimoto similar vectors. In particular, we derive more restrictive bounds on the lengths of such candidate vectors.
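
To make the two kinds of candidate pruning named in the abstract concrete, the following is a minimal sketch and not the paper's combined bounds, which are not reproduced on this page. It assumes the usual definition T(u, v) = u·v / (|u|² + |v|² − u·v), sparse vectors stored as dictionaries that omit zero entries, and a threshold eps in (0, 1]. The length filter is the elementary bound that follows from the Cauchy-Schwarz inequality: if T(u, v) ≥ eps, then |v| lies in [|u|/α, α|u|] with α = ½((1 + 1/eps) + √((1 + 1/eps)² − 4)). The function names are hypothetical.

```python
import math

def tanimoto(u, v):
    """Tanimoto similarity of two sparse vectors given as {dimension: value} dicts."""
    dot = sum(val * v[dim] for dim, val in u.items() if dim in v)
    denom = sum(x * x for x in u.values()) + sum(x * x for x in v.values()) - dot
    # The denominator is zero only when both vectors are zero vectors.
    return dot / denom if denom else 0.0

def length(u):
    """Euclidean length of a sparse vector."""
    return math.sqrt(sum(x * x for x in u.values()))

def length_bounds(u_len, eps):
    """Interval of lengths |v| compatible with T(u, v) >= eps (Cauchy-Schwarz bound)."""
    s = 1.0 + 1.0 / eps
    alpha = 0.5 * (s + math.sqrt(s * s - 4.0))
    return u_len / alpha, u_len * alpha

def candidates(query, vectors, eps):
    """Keep vectors that pass both filters; the survivors still need an exact check."""
    lo, hi = length_bounds(length(query), eps)
    return [
        v for v in vectors
        # Filter 1: length within the bound. Filter 2: at least one shared
        # non-zero dimension (otherwise the dot product is 0 and T = 0 < eps).
        if lo <= length(v) <= hi and query.keys() & v.keys()
    ]

query = {0: 1.0, 1: 1.0}
data = [{0: 1.0, 1: 1.0}, {2: 1.0}, {0: 10.0}, {0: 1.0}]
survivors = candidates(query, data, eps=0.5)
similar = [v for v in survivors if tanimoto(query, v) >= 0.5]
```

Both filters only discard vectors that cannot reach the threshold, so the exact similarity is computed for the survivors alone. In this toy data set the vector `{2: 1.0}` is dropped because it shares no non-zero dimension with the query, and `{0: 10.0}` is dropped because it is too long.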

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kryszkiewicz, M. (2014). Using Non-Zero Dimensions and Lengths of Vectors for the Tanimoto Similarity Search among Real Valued Vectors. In: Nguyen, N.T., Attachoo, B., Trawiński, B., Somboonviwat, K. (eds) Intelligent Information and Database Systems. ACIIDS 2014. Lecture Notes in Computer Science, vol 8397. Springer, Cham. https://doi.org/10.1007/978-3-319-05476-6_18

  • DOI: https://doi.org/10.1007/978-3-319-05476-6_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05475-9

  • Online ISBN: 978-3-319-05476-6

  • eBook Packages: Computer Science, Computer Science (R0)
