Abstract
This article describes the staged vector stream similarity search (SVS) methods, which are designed to index and search vector streams by similarity over a time interval. SVS methods continuously adapt to the vector stream as vectors are received and do not depend on costly updates to an index structure. The article presents experiments that investigate the performance of two SVS implementations, one based on product quantization and another based on Hierarchical Navigable Small World (HNSW) graphs. Finally, the article describes a proof-of-concept implementation of a classified ad retrieval tool that uses staged HNSW on real data collected from an online classified ads company.
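The abstract does not spell out how a staged index works internally, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that "staged" means the stream is cut into consecutive fixed-size stages, that timestamps arrive in non-decreasing order, and that a query over a time interval only visits the stages overlapping that interval. An exhaustive scan stands in for the per-stage product quantization or HNSW index, and the names `Stage`, `StagedIndex`, and `stage_size` are hypothetical.

```python
import heapq
import math


class Stage:
    """One fixed-capacity batch of (timestamp, vector) pairs (hypothetical)."""

    def __init__(self):
        self.items = []  # filled in arrival order

    def overlaps(self, t_start, t_end):
        # Assumes non-decreasing timestamps, so the first and last items
        # bound the time range covered by this stage.
        return (bool(self.items)
                and self.items[0][0] <= t_end
                and self.items[-1][0] >= t_start)


class StagedIndex:
    """Append-only index: new vectors open new stages, old stages never change."""

    def __init__(self, stage_size=1000):
        self.stage_size = stage_size
        self.stages = [Stage()]

    def add(self, timestamp, vector):
        # Start a fresh stage when the current one is full, instead of
        # updating an existing index structure.
        if len(self.stages[-1].items) >= self.stage_size:
            self.stages.append(Stage())
        self.stages[-1].items.append((timestamp, tuple(vector)))

    def search(self, query, k, t_start, t_end):
        candidates = []
        for stage in self.stages:
            if not stage.overlaps(t_start, t_end):
                continue  # skip stages entirely outside the interval
            # Assumption: brute-force scan replaces the per-stage PQ/HNSW search.
            for ts, vec in stage.items:
                if t_start <= ts <= t_end:
                    candidates.append((math.dist(query, vec), ts, vec))
        # Merge per-stage candidates into the k nearest overall.
        return heapq.nsmallest(k, candidates)
```

In this reading, ingestion cost stays constant per vector because sealed stages are never modified, and the time interval of a query prunes whole stages before any distance is computed.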
Acknowledgements
This work was partly funded by FAPERJ under grant E-26/200.834/2021, by CAPES under grants 88881.134081/2016-01 and 88882.164913/2010-01, and by CNPq under grant 305.587/2021-8.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pinheiro, J.P.V., Borges, L.R., da Silva, B.F.M., Leme, L.A.P.P., Casanova, M.A. (2024). Staged Vector Stream Similarity Search Methods. In: Filipe, J., Śmiałek, M., Brodsky, A., Hammoudi, S. (eds) Enterprise Information Systems. ICEIS 2023. Lecture Notes in Business Information Processing, vol 518. Springer, Cham. https://doi.org/10.1007/978-3-031-64748-2_2
DOI: https://doi.org/10.1007/978-3-031-64748-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-64747-5
Online ISBN: 978-3-031-64748-2
eBook Packages: Computer Science, Computer Science (R0)