Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries

Nalepa, Filip; Batko, Michal; Zezula, Pavel

doi:10.1007/978-3-662-58384-5_3

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 11250)

Abstract

The current era of digital data explosion calls for content-based similarity search techniques, since traditional searchable metadata such as annotations are not always available. In our work, we focus on a scenario where similarity search is used in the context of stream processing, which is one of the suitable approaches to dealing with huge amounts of data. Our goal is to maximize the throughput of processed queries when a slight delay is acceptable. We propose a technique that dynamically reorders the queries coming from the stream in order to use our caching mechanism in huge data spaces more effectively. We achieved significantly higher throughput than the baseline, in which neither reordering nor caching was used. Moreover, our proposal does not incur any additional precision loss of the similarity search, as opposed to some other caching techniques. In addition to throughput maximization, we also study the potential of trading off throughput for low delays (waiting times). The proposed technique can be parameterized by the amount of throughput that may be sacrificed.
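
The general idea of reordering buffered queries to improve cache reuse can be illustrated with a small sketch. This is not the chapter's algorithm, which is not described in the abstract; it is a toy model under stated assumptions: each query is reduced to the set of data partitions it must read, the cache is a plain LRU over partition ids, and the next query is chosen greedily as the buffered one with the fewest uncached partitions. The names `LRUCache`, `process`, `cache_size`, and `buffer_size` are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache of partition ids (a stand-in for cached index data)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def __contains__(self, key):
        return key in self.items

    def access(self, key):
        """Touch a partition; return True on a hit, False on a miss."""
        hit = key in self.items
        if hit:
            self.items.move_to_end(key)          # mark as most recently used
        else:
            self.items[key] = True
            if len(self.items) > self.capacity:
                self.items.popitem(last=False)   # evict least recently used
        return hit


def process(queries, cache_size, buffer_size=1):
    """Process queries (each a set of partition ids); return (order, misses).

    Assumption: cost is modelled only as the number of cache misses.
    buffer_size=1 degenerates to plain arrival (FIFO) order.
    """
    cache = LRUCache(cache_size)
    pending = list(enumerate(queries))           # (arrival index, partitions)
    buffer, order, misses = [], [], 0
    while pending or buffer:
        # Refill the buffer from the stream.
        while pending and len(buffer) < buffer_size:
            buffer.append(pending.pop(0))
        # Greedy choice: fewest uncached partitions; ties go to the oldest query.
        best = min(buffer, key=lambda q: (sum(p not in cache for p in q[1]), q[0]))
        buffer.remove(best)
        order.append(best[0])
        for p in sorted(best[1]):
            if not cache.access(p):
                misses += 1
    return order, misses
```

For six queries that alternate between the partition sets `{1, 2}` and `{3, 4}` with a cache of two partitions, arrival order (`buffer_size=1`) incurs 12 misses, while `buffer_size=6` processes them in the order 0, 2, 4, 1, 3, 5 with 4 misses. The sketch does not bound waiting time beyond the tie-break, so a query can be postponed repeatedly; a delay-aware variant would need an explicit limit.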

Acknowledgement

This work was supported by the Czech national research project GA16-18889S.

Author information

Authors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Filip Nalepa, Michal Batko & Pavel Zezula

Corresponding author

Correspondence to Filip Nalepa.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Roland Wagner
Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Victoria University of Wellington, Wellington, New Zealand
Hui Ma

About this chapter

Cite this chapter

Nalepa, F., Batko, M., Zezula, P. (2018). Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries. In: Hameurlain, A., Wagner, R., Hartmann, S., Ma, H. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII. Lecture Notes in Computer Science, vol 11250. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58384-5_3

DOI: https://doi.org/10.1007/978-3-662-58384-5_3
Published: 22 November 2018
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-58383-8
Online ISBN: 978-3-662-58384-5
eBook Packages: Computer Science, Computer Science (R0)
