DOI: 10.1145/3183713.3183750

A General and Efficient Querying Method for Learning to Hash

Published: 27 May 2018

ABSTRACT

As an effective solution to the approximate nearest neighbors (ANN) search problem, learning to hash (L2H) is able to learn similarity-preserving hash functions tailored for a given dataset. However, existing L2H research mainly focuses on improving query performance by learning good hash functions, while Hamming ranking (HR) is used as the default querying method. We show by analysis and experiments that Hamming distance, the similarity indicator used in HR, is too coarse-grained and thus limits the performance of query processing. We propose a new fine-grained similarity indicator, quantization distance (QD), which provides more information about the similarity between a query and the items in a bucket. We then develop two efficient querying methods based on QD, which achieve significantly better query performance than HR. Our methods are general and can work with various L2H algorithms. Our experiments demonstrate that a simple and elegant querying method can produce performance gain equivalent to advanced and complicated learning algorithms.
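To make the contrast between the two similarity indicators concrete, here is a minimal Python sketch. The abstract does not give a formula for QD, so the definition used here is an assumption: for every bit where a bucket's code differs from the query's code, add the absolute value of the query's real-valued projection on that bit. The sign-based quantization and the names `binary_code`, `hamming_distance`, and `quantization_distance` are also illustrative and do not come from the text above.

```python
from typing import List, Sequence


def binary_code(projections: Sequence[float]) -> List[int]:
    """Quantize real-valued projections to bits by sign (assumed scheme)."""
    return [1 if p >= 0 else 0 for p in projections]


def hamming_distance(code_a: Sequence[int], code_b: Sequence[int]) -> int:
    """Count the bit positions where the two codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def quantization_distance(projections: Sequence[float],
                          bucket: Sequence[int]) -> float:
    """Assumed QD: sum of |projection| over bits where the bucket differs
    from the query's own code. Flipping a bit whose projection lies close
    to the threshold costs little; flipping a confident bit costs a lot."""
    query_code = binary_code(projections)
    return sum(abs(p)
               for p, q_bit, b_bit in zip(projections, query_code, bucket)
               if q_bit != b_bit)


# Hypothetical query: three projection values before quantization.
query_projections = [0.1, -2.0, 1.5]
query_code = binary_code(query_projections)

# Three buckets, each differing from the query code in exactly one bit.
buckets = [[0, 0, 1], [1, 1, 1], [1, 0, 0]]

hamming = [hamming_distance(query_code, b) for b in buckets]
qd = [quantization_distance(query_projections, b) for b in buckets]

# Probe buckets in ascending order of the assumed QD.
probe_order = sorted(buckets,
                     key=lambda b: quantization_distance(query_projections, b))

for bucket, h, d in zip(buckets, hamming, qd):
    print(bucket, "Hamming:", h, "QD:", d)
```

In this toy example all three buckets sit at Hamming distance 1 from the query, so Hamming ranking cannot order them. Under the assumed QD, the bucket that flips the near-threshold bit (projection 0.1) is probed first, which illustrates the finer granularity the abstract describes.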

Published in

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN: 9781450347037
DOI: 10.1145/3183713

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

research-article

Acceptance Rates

SIGMOD '18 Paper Acceptance Rate: 90 of 461 submissions, 20%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%
