Abstract
Hamming embedding techniques, which produce bit-string sketches, have recently been applied successfully to speed up similarity search. Sketches are usually compared by the Hamming distance and used to filter out non-relevant objects during query evaluation. Because several sketching techniques exist and each can produce sketches of different lengths, it is hard to select a proper configuration for a particular dataset. We assume that the (dis)similarity of objects is expressed by an arbitrary metric function, and we propose a way to efficiently estimate the quality of sketches using just a small sample of the data. Our approach is based on a probabilistic analysis of sketches that describes how well objects are separated after projection into the Hamming space.
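To illustrate the filtering step mentioned above, here is a minimal sketch of comparing bit-string sketches by the Hamming distance and discarding objects whose sketches lie too far from the query sketch. It assumes sketches are stored as Python integers. The names `hamming` and `filter_candidates`, the threshold `max_hamming`, and the toy sketch values are hypothetical and are not taken from the paper.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two sketches stored as integers."""
    return bin(a ^ b).count("1")


def filter_candidates(query_sketch, sketches, max_hamming):
    """Return ids of objects whose sketch is within max_hamming of the query sketch."""
    return [obj_id for obj_id, sk in sketches.items()
            if hamming(query_sketch, sk) <= max_hamming]


# Toy 8-bit sketches: made-up values, not produced by any real sketching technique.
sketches = {"a": 0b10110010, "b": 0b10110110, "c": 0b01001101}

# Only objects that pass the filter would be compared with the original metric.
candidates = filter_candidates(0b10110011, sketches, max_hamming=2)
```

In a real system the surviving candidates would then be refined using the original metric function. How many true neighbours survive the filter depends on the sketching technique and the sketch length, which is the configuration choice the paper addresses.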
Notes
- 1.
Reasoning is provided at https://www.fi.muni.cz/~xmic/sketches/Symmetry.pdf.
- 2.
Note that the sign of \(\textit{sep}_\textit{sk}(x_1, x_2)\) is given by the sign of the function \(f_\textit{sk}(x_1, x_2)\), which is negative iff \(p_i(x_2, 1) < p_i(x_1, 1)\). We have assumed \(x_1 \le x_2\), and these two inequalities together are equivalent to swapping the distances \(x_1\) and \(x_2\).
- 3.
- 4.
- 5.
Acknowledgements
This paper was supported by the Czech Science Foundation project GBP103/12/G084.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Mic, V., Novak, D., Vadicamo, L., Zezula, P. (2018). Selecting Sketches for Similarity Search. In: Benczúr, A., Thalheim, B., Horváth, T. (eds) Advances in Databases and Information Systems. ADBIS 2018. Lecture Notes in Computer Science, vol 11019. Springer, Cham. https://doi.org/10.1007/978-3-319-98398-1_9
DOI: https://doi.org/10.1007/978-3-319-98398-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98397-4
Online ISBN: 978-3-319-98398-1
eBook Packages: Computer Science (R0)