ABSTRACT
As an effective solution to the approximate nearest neighbors (ANN) search problem, learning to hash (L2H) is able to learn similarity-preserving hash functions tailored for a given dataset. However, existing L2H research mainly focuses on improving query performance by learning good hash functions, while Hamming ranking (HR) is used as the default querying method. We show by analysis and experiments that Hamming distance, the similarity indicator used in HR, is too coarse-grained and thus limits the performance of query processing. We propose a new fine-grained similarity indicator, quantization distance (QD), which provides more information about the similarity between a query and the items in a bucket. We then develop two efficient querying methods based on QD, which achieve significantly better query performance than HR. Our methods are general and can work with various L2H algorithms. Our experiments demonstrate that a simple and elegant querying method can yield performance gains equivalent to those obtained from advanced and complicated learning algorithms.
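The abstract does not define QD, so the sketch below assumes one plausible definition purely for illustration: the query's real-valued projection p(q) is sign-quantized into a binary code c(q), and the QD from the query to a bucket b is the sum of |p_i(q)| over the bits i where c_i(q) and b_i disagree, i.e. roughly how far the projection would have to move to fall into b. The function names (`hash_code`, `quantization_distance`, `rank_buckets`) and the example projection are hypothetical. The point is only to show why Hamming distance is coarse: many buckets tie at the same Hamming distance, while QD separates them. Enumerating all 2^m buckets is feasible only for tiny code lengths and is not meant to represent the efficient querying methods the abstract refers to.

```python
from itertools import product


def hash_code(projection):
    # Sign quantization (assumed): bit 1 if the projected value is non-negative.
    return tuple(1 if v >= 0 else 0 for v in projection)


def hamming_distance(a, b):
    # Number of bit positions where two codes differ.
    return sum(x != y for x, y in zip(a, b))


def hamming_key(projection, bucket):
    # Hamming ranking only looks at the query's binary code.
    return hamming_distance(hash_code(projection), bucket)


def quantization_distance(projection, bucket):
    # Assumed QD definition: total |projected value| over the bits where the
    # query's code disagrees with the bucket's code.
    code = hash_code(projection)
    return sum(abs(v) for v, c, b in zip(projection, code, bucket) if c != b)


def rank_buckets(projection, distance):
    # Exhaustively enumerate all 2^m buckets and sort them by the given
    # distance. sorted() is stable, so ties keep enumeration order.
    m = len(projection)
    return sorted(product((0, 1), repeat=m), key=lambda b: distance(projection, b))
```

For a hypothetical 4-bit projection `p = [0.9, -0.05, 0.4, -0.7]`, the query's own bucket is `(1, 0, 1, 0)`, and four buckets tie at Hamming distance 1. Under Hamming ranking, the tie is broken arbitrarily (here by enumeration order), so `rank_buckets(p, hamming_key)[:3]` is `[(1, 0, 1, 0), (0, 0, 1, 0), (1, 0, 0, 0)]`, and its second entry flips the bit with the most confident projection (0.9). Under QD, `rank_buckets(p, quantization_distance)[:3]` is `[(1, 0, 1, 0), (1, 1, 1, 0), (1, 0, 0, 0)]`, which first flips the bit whose projection (-0.05) lies closest to the quantization boundary.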
Index Terms
- A General and Efficient Querying Method for Learning to Hash