Staged Vector Stream Similarity Search Methods

  • Conference paper
Enterprise Information Systems (ICEIS 2023)

Abstract

This article describes staged vector stream similarity search methods (SVS for short), designed to index a stream of vectors and search it by similarity over a time interval. SVS continuously adapts to the stream as vectors are received and does not depend on costly updates to an index structure. The article presents experiments that investigate the performance of two SVS implementations, one based on product quantization (PQ) and another based on Hierarchical Navigable Small World (HNSW) graphs. Finally, the article describes a proof-of-concept classified ad retrieval tool that uses staged HNSW on real data collected from an online classified ads company.
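
The abstract does not spell out how stages work, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that the stream is cut into consecutive time-bounded stages, each holding its own small index; that an expired stage is dropped whole instead of deleting vectors from a live index; and that a query over an interval [t0, t1] searches only the overlapping stages and merges their top-k lists. The names `StagedIndex`, `Stage`, `stage_width` and `max_stages` are hypothetical, a brute-force scan stands in for the PQ or HNSW index each stage would use, and timestamps are assumed to arrive in non-decreasing order.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance between two vectors of equal length
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Stage:
    # Hypothetical stage: holds vectors whose timestamps fall in [start, start + width)
    start: float
    width: float
    items: List[Tuple[float, int, List[float]]] = field(default_factory=list)

    def overlaps(self, t0: float, t1: float) -> bool:
        # True if the closed query interval [t0, t1] intersects this stage's span
        return self.start <= t1 and t0 < self.start + self.width

    def search(self, q: Sequence[float], k: int, t0: float, t1: float) -> List[Tuple[float, int]]:
        # Exact linear scan, standing in (as an assumption) for a per-stage PQ or HNSW index
        hits = [(l2_distance(q, v), vid) for ts, vid, v in self.items if t0 <= ts <= t1]
        return heapq.nsmallest(k, hits)


class StagedIndex:
    """Hypothetical staged index: one small index per time stage."""

    def __init__(self, stage_width: float, max_stages: int):
        self.stage_width = stage_width
        self.max_stages = max_stages
        self.stages: List[Stage] = []

    def add(self, ts: float, vid: int, vec: Sequence[float]) -> None:
        # Assumes timestamps arrive in non-decreasing order
        if not self.stages or ts >= self.stages[-1].start + self.stage_width:
            start = ts - (ts % self.stage_width)  # align the new stage to a stage boundary
            self.stages.append(Stage(start, self.stage_width))
            if len(self.stages) > self.max_stages:
                # Expire the oldest stage as a whole; no per-vector deletions
                self.stages.pop(0)
        self.stages[-1].items.append((ts, vid, list(vec)))

    def search(self, q: Sequence[float], k: int, t0: float, t1: float) -> List[Tuple[float, int]]:
        # Query only the stages that overlap [t0, t1], then merge their top-k lists
        candidates: List[Tuple[float, int]] = []
        for stage in self.stages:
            if stage.overlaps(t0, t1):
                candidates.extend(stage.search(q, k, t0, t1))
        return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    idx = StagedIndex(stage_width=10.0, max_stages=2)
    idx.add(0.0, 1, [0.0, 0.0])
    idx.add(5.0, 2, [1.0, 0.0])
    idx.add(12.0, 3, [0.0, 1.0])
    idx.add(25.0, 4, [0.1, 0.0])  # opens a third stage, so the stage starting at 0.0 expires
    print(idx.search([0.0, 0.0], k=2, t0=0.0, t1=30.0))
```

In this sketch, expiring a stage is a single list operation, which is one way a staged design can sidestep costly in-place index updates; the trade-off is that a query spanning several stages must search each one and merge the partial results.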

Acknowledgements

This work was partly funded by FAPERJ under grant E-26/200.834/2021, by CAPES under grants 88881.134081/2016-01 and 88882.164913/2010-01, and by CNPq under grant 305.587/2021-8.

Author information

Corresponding author

Correspondence to Marco A. Casanova.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Pinheiro, J.P.V., Borges, L.R., da Silva, B.F.M., Leme, L.A.P.P., Casanova, M.A. (2024). Staged Vector Stream Similarity Search Methods. In: Filipe, J., Śmiałek, M., Brodsky, A., Hammoudi, S. (eds) Enterprise Information Systems. ICEIS 2023. Lecture Notes in Business Information Processing, vol 518. Springer, Cham. https://doi.org/10.1007/978-3-031-64748-2_2

  • DOI: https://doi.org/10.1007/978-3-031-64748-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-64747-5

  • Online ISBN: 978-3-031-64748-2

  • eBook Packages: Computer Science, Computer Science (R0)