Abstract
The Tanimoto similarity measure has numerous applications, for example in chemical informatics, bioinformatics, information retrieval, and text and web mining. Recently, two efficient methods have been proposed for reducing the number of candidate real valued vectors that may be Tanimoto similar to a given vector: one uses the lengths of vectors and the other uses their non-zero dimensions. In this paper, we offer new theoretical results on the combined use of the lengths of real valued vectors and their non-zero dimensions for a more efficient reduction of candidates for Tanimoto similar vectors. In particular, we derive more restrictive bounds on the lengths of such candidate vectors.
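To make the idea of length-based candidate reduction concrete, the following is a minimal sketch, not the method derived in the paper. It assumes the usual definition of the Tanimoto similarity for real valued vectors, T(u, v) = u·v / (|u|² + |v|² − u·v), and uses only the elementary length bound that follows from the Cauchy-Schwarz inequality: if T(u, v) ≥ ε, then |v| lies in [|u|/α, α|u|] with α = ½((1 + 1/ε) + √((1 + 1/ε)² − 4)). The function names `tanimoto`, `length_bounds`, and `filter_by_length` are illustrative and do not come from the paper; the tighter bounds that also exploit non-zero dimensions are not reproduced here.

```python
import math


def tanimoto(u, v):
    """Tanimoto similarity of two real valued vectors of equal dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    denom = sum(a * a for a in u) + sum(b * b for b in v) - dot
    if denom == 0:
        # The denominator vanishes only when both vectors are zero vectors.
        raise ValueError("Tanimoto similarity is undefined for two zero vectors")
    return dot / denom


def length_bounds(u_len, eps):
    """Interval that must contain |v| whenever T(u, v) >= eps, for 0 < eps <= 1.

    Elementary Cauchy-Schwarz bound (an assumption of this sketch),
    not the more restrictive bounds derived in the paper.
    """
    s = 1.0 + 1.0 / eps
    alpha = 0.5 * (s + math.sqrt(s * s - 4.0))
    return u_len / alpha, u_len * alpha


def filter_by_length(u, vectors, eps):
    """Keep only vectors whose length makes T(u, v) >= eps possible."""
    u_len = math.sqrt(sum(a * a for a in u))
    lo, hi = length_bounds(u_len, eps)
    return [v for v in vectors
            if lo <= math.sqrt(sum(b * b for b in v)) <= hi]


if __name__ == "__main__":
    u = [3.0, 4.0]
    data = [[3.0, 4.0], [0.1, 0.0], [30.0, 40.0], [4.0, 3.0]]
    eps = 0.5
    candidates = filter_by_length(u, data, eps)
    # Only the surviving candidates need the exact similarity computation.
    similar = [v for v in candidates if tanimoto(u, v) >= eps]
    print(candidates, similar)
```

The filter is safe in the sense that it never discards a vector that is actually ε-similar to u; it only removes vectors whose length alone rules them out, so the exact similarity has to be computed for fewer vectors.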
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kryszkiewicz, M. (2014). Using Non-Zero Dimensions and Lengths of Vectors for the Tanimoto Similarity Search among Real Valued Vectors. In: Nguyen, N.T., Attachoo, B., Trawiński, B., Somboonviwat, K. (eds) Intelligent Information and Database Systems. ACIIDS 2014. Lecture Notes in Computer Science, vol 8397. Springer, Cham. https://doi.org/10.1007/978-3-319-05476-6_18
DOI: https://doi.org/10.1007/978-3-319-05476-6_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05475-9
Online ISBN: 978-3-319-05476-6
eBook Packages: Computer Science (R0)