Multi-modal Similarity Retrieval with a Shared Distributed Data Store

  • Conference paper
  • Scalable Information Systems (INFOSCALE 2014)

Abstract

We propose a generic system architecture for large-scale similarity search in various types of digital data. The architecture combines contemporary highly scalable distributed data stores with recent efficient similarity indexes and with other types of search indexes. The system is designed to support several types of queries: distance-based similarity queries, term-based queries, attribute queries, and advanced queries that combine several search aspects (modalities). The first part of this work is devoted to the generic architecture and to a description of PPP-Codes, a similarity index that is suitable for our system. In the second part, we describe a specific instance of this architecture that manages a collection of 106 million images and provides content-based visual search, keyword search, attribute-based access, and their combinations.
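To make the idea of a combined (multi-modal) query more concrete, the following is a minimal sketch, not the system described in the paper. It assumes an in-memory dictionary standing in for the shared distributed data store, a toy inverted index for keywords, and plain Euclidean distance in place of PPP-Codes and the paper's visual descriptors. All names (`ImageRecord`, `SharedStore`, `KeywordIndex`, `combined_query`) are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """One object kept in the shared store (hypothetical layout)."""
    id: str
    descriptor: tuple   # stand-in for a visual feature vector
    keywords: set       # terms for keyword search
    attributes: dict    # arbitrary metadata for attribute queries


def euclidean(a, b):
    """Distance function; the real system may use a different metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class SharedStore:
    """In-memory stand-in for a distributed key-value store."""

    def __init__(self):
        self._data = {}

    def put(self, record):
        self._data[record.id] = record

    def get_many(self, ids):
        return [self._data[i] for i in ids if i in self._data]


class KeywordIndex:
    """Toy inverted index that stores only object IDs, not the objects."""

    def __init__(self):
        self._postings = {}

    def add(self, record):
        for term in record.keywords:
            self._postings.setdefault(term, set()).add(record.id)

    def search(self, term):
        return set(self._postings.get(term, ()))


def combined_query(store, keyword_index, query_descriptor, term, k,
                   attr_filter=None):
    """Keyword candidates -> fetch from shared store -> attribute filter
    -> rank by distance to the query descriptor -> top k IDs."""
    candidate_ids = keyword_index.search(term)
    records = store.get_many(candidate_ids)
    if attr_filter is not None:
        records = [r for r in records if attr_filter(r.attributes)]
    # Sort by (distance, id) so that ties are broken deterministically.
    records.sort(key=lambda r: (euclidean(r.descriptor, query_descriptor), r.id))
    return [r.id for r in records[:k]]
```

In this sketch the indexes hold only identifiers, and every modality resolves them against the same store, which is one plausible reading of a "shared" data store. A production system would replace each stand-in with a real component.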

Notes

  1. http://basho.com/riak/

  2. http://www.project-voldemort.com

  3. http://www.jboss.org/infinispan/

  4. http://www.mongodb.org

  5. http://www.flickr.com

  6. http://www.jboss.org/infinispan/

  7. http://lucene.apache.org/

Acknowledgments

This work was supported by Czech Research Foundation project P103/12/G084.

Author information

Corresponding author

Correspondence to David Novak.

Copyright information

© 2015 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Novak, D. (2015). Multi-modal Similarity Retrieval with a Shared Distributed Data Store. In: Jung, J., Badica, C., Kiss, A. (eds) Scalable Information Systems. INFOSCALE 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 139. Springer, Cham. https://doi.org/10.1007/978-3-319-16868-5_3

  • DOI: https://doi.org/10.1007/978-3-319-16868-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16867-8

  • Online ISBN: 978-3-319-16868-5

  • eBook Packages: Computer Science, Computer Science (R0)
