Quantized Ranking for Permutation-Based Indexing

  • Conference paper
Similarity Search and Applications (SISAP 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8199)

Abstract

Similarity search, which translates into the nearest neighbor search problem, has many applications in information retrieval and visualization, machine learning, and data mining. The large volume of data that typical applications must handle makes approximate solutions to the similarity search problem necessary. Permutation-based indexing is one of the most recent techniques for approximate similarity search. Objects are represented by lists that order a set of selected reference objects by their distance to the object, following the idea that two neighboring objects see similar surroundings. In this paper, we propose a quantized representation of the permutation lists, together with a related data structure for effective retrieval. Our novel permutation-based indexing strategy is built to be fast, memory efficient, and scalable without excessively sacrificing search precision. This is demonstrated experimentally in comparison with existing proposals using several large-scale datasets of millions of documents and different dimensions.
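
To make the representation described in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It builds the permutation of reference objects for a data object and then coarsens the ranks into a few buckets. The equal-width bucketing in `quantize`, the Euclidean distance, the Spearman footrule comparison, and all function names and sample points are assumptions made for illustration; the abstract does not specify the paper's quantization scheme or data structure.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def permutation(obj, refs, dist=euclidean):
    """Indices of the reference objects, ordered by increasing distance to obj.

    Ties are broken by reference index so the result is deterministic.
    """
    return sorted(range(len(refs)), key=lambda i: (dist(obj, refs[i]), i))


def ranks(perm):
    """Invert a permutation: result[i] is the position of reference i."""
    result = [0] * len(perm)
    for position, ref in enumerate(perm):
        result[ref] = position
    return result


def quantize(rank_list, levels):
    """Map each rank in [0, n) to one of `levels` equal-width buckets.

    Hypothetical quantizer for illustration only.
    """
    n = len(rank_list)
    return [rank * levels // n for rank in rank_list]


def footrule(a, b):
    """Spearman footrule: sum of absolute differences between two rank lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Four reference objects at the corners of a square (made-up data).
refs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
o1, o2, o3 = (1.0, 2.0), (2.0, 1.0), (9.0, 8.0)

q1 = quantize(ranks(permutation(o1, refs)), levels=2)
q2 = quantize(ranks(permutation(o2, refs)), levels=2)
q3 = quantize(ranks(permutation(o3, refs)), levels=2)

# o1 and o2 are close to each other, o3 lies in the opposite corner.
print(footrule(q1, q2), footrule(q1, q3))
```

With two levels, each rank needs one bit instead of the two bits a full rank over four references would take, while the nearby pair `o1`, `o2` still compares as closer than the distant pair `o1`, `o3`. This is the kind of memory-versus-precision trade-off the abstract refers to.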



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mohamed, H., Marchand-Maillet, S. (2013). Quantized Ranking for Permutation-Based Indexing. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds) Similarity Search and Applications. SISAP 2013. Lecture Notes in Computer Science, vol 8199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41062-8_11

  • DOI: https://doi.org/10.1007/978-3-642-41062-8_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41061-1

  • Online ISBN: 978-3-642-41062-8

  • eBook Packages: Computer Science, Computer Science (R0)
