Selecting Sketches for Similarity Search

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11019)

Abstract

Hamming embedding techniques, which produce bit-string sketches, have recently been applied successfully to speed up similarity search. Sketches are usually compared by the Hamming distance and used to filter out non-relevant objects during query evaluation. Because several sketching techniques exist and each can produce sketches of different lengths, it is hard to select a proper configuration for a particular dataset. We assume that the (dis)similarity of objects is expressed by an arbitrary metric function, and we propose a way to efficiently estimate the quality of sketches using just a small sample of the data. Our approach is based on a probabilistic analysis of sketches that describes how well separated objects are after projection to the Hamming space.
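
As a rough illustration of the filtering idea described in the abstract, the following minimal sketch compares bit-string sketches by the Hamming distance and discards objects whose sketches lie too far from the query sketch. It is not the authors' implementation: the function names, the representation of sketches as Python integers, the toy 8-bit sketches, and the threshold `max_distance` are all assumptions made for the example.

```python
from typing import List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two sketches differ."""
    return bin(a ^ b).count("1")


def filter_by_sketch(query_sketch: int,
                     sketches: List[Tuple[str, int]],
                     max_distance: int) -> List[str]:
    """Keep the IDs of objects whose sketch differs from the
    query sketch in at most max_distance bits (hypothetical threshold)."""
    return [obj_id for obj_id, sketch in sketches
            if hamming_distance(query_sketch, sketch) <= max_distance]


# Toy 8-bit sketches (hypothetical data, chosen only for illustration).
database = [
    ("a", 0b10110010),
    ("b", 0b10110011),
    ("c", 0b01001101),
]
query = 0b10110000

candidates = filter_by_sketch(query, database, max_distance=2)
print(candidates)
```

In a filtering setup of this kind, the surviving candidates would presumably then be checked with the original metric function, so the choice of sketching technique, sketch length, and threshold determines how many expensive distance computations are saved.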

Notes

  1. Reasoning is provided at https://www.fi.muni.cz/~xmic/sketches/Symmetry.pdf.

  2. Note that the sign of \(\textit{sep}_\textit{sk}(x_1, x_2)\) is given by the sign of the function \(f_\textit{sk}(x_1, x_2)\), which is negative iff \(p_i(x_2, 1) < p_i(x_1, 1)\). We have assumed \(x_1 \le x_2\), and these two inequalities are equivalent to swapping the distances \(x_1\) and \(x_2\).

  3. http://disa.fi.muni.cz/profiset/.

  4. http://corpus-texmex.irisa.fr/.

  5. https://www.fi.muni.cz/~xmic/sketches/AlgSelectLowCorBits.pdf.

Acknowledgements

This paper was supported by the Czech Science Foundation project GBP103/12/G084.

Author information

Corresponding author

Correspondence to David Novak.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Mic, V., Novak, D., Vadicamo, L., Zezula, P. (2018). Selecting Sketches for Similarity Search. In: Benczúr, A., Thalheim, B., Horváth, T. (eds) Advances in Databases and Information Systems. ADBIS 2018. Lecture Notes in Computer Science, vol 11019. Springer, Cham. https://doi.org/10.1007/978-3-319-98398-1_9

  • DOI: https://doi.org/10.1007/978-3-319-98398-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98397-4

  • Online ISBN: 978-3-319-98398-1

  • eBook Packages: Computer Science (R0)
