
Fast approximate matching of binary codes with distinctive bits

  • Research Article
  • Frontiers of Computer Science

Abstract

Although the distance between binary codes can be computed quickly in Hamming space, linear search is impractical for large-scale datasets. Attention has therefore turned to efficient approximate nearest neighbor search, for which hierarchical clustering trees (HCT) are widely used. However, HCT select cluster centers randomly and build indexes with the entire binary code, which degrades search performance. In this paper, we first propose a new clustering algorithm that chooses cluster centers on the basis of relative distances and builds the hierarchical clustering trees from a more homogeneous partition of the dataset than HCT produce. We then present an algorithm that compresses binary codes by extracting distinctive bits according to the standard deviation of each bit. Finally, we propose a new index that uses the compressed binary codes and is based on hierarchical decomposition of binary spaces. Experiments conducted on reference datasets and a dataset of one billion binary codes demonstrate the effectiveness and efficiency of our method.
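To make the compression step concrete, the following is a minimal sketch of selecting bits by per-bit standard deviation. It is not the authors' implementation: the abstract states only that distinctive bits are extracted according to the standard deviation of each bit, so keeping the k positions with the highest standard deviation is an assumption, and the function names (`hamming`, `bit_std`, `select_distinctive_bits`, `compress`) are hypothetical. Codes are stored as Python integers, with bit 0 as the least significant bit.

```python
from statistics import pstdev


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def bit_std(codes, n_bits):
    """Population standard deviation of every bit position over the dataset."""
    return [pstdev([(c >> i) & 1 for c in codes]) for i in range(n_bits)]


def select_distinctive_bits(codes, n_bits, k):
    """Assumed rule: keep the k positions with the largest standard deviation.

    Ties are broken by the lower bit index so the result is deterministic.
    """
    stds = bit_std(codes, n_bits)
    ranked = sorted(range(n_bits), key=lambda i: (-stds[i], i))
    return sorted(ranked[:k])


def compress(code, positions):
    """Pack the bits of `code` found at `positions` into a shorter code."""
    out = 0
    for j, p in enumerate(positions):
        out |= ((code >> p) & 1) << j
    return out


# Toy dataset of 4-bit codes: bits 2 and 3 are constant, bits 0 and 1 vary.
codes = [0b1010, 0b1000, 0b1011, 0b1001]
positions = select_distinctive_bits(codes, n_bits=4, k=2)
compressed = [compress(c, positions) for c in codes]
```

In this toy dataset the two constant bits have zero standard deviation and are dropped, so the compressed codes preserve every pairwise Hamming distance while using half the bits. On real data the discarded bits usually carry some information, so distances computed on compressed codes are only approximations.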



Author information

Corresponding author

Correspondence to Hongtao Xie.

Additional information

Chenggang Clarence Yan received his BS in computer science from Shandong University, China in 2008 and his PhD also in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, China in 2013. He is a post-doctoral research fellow with the Department of Automation, Tsinghua University, China. His research interests include image and video compression, multimedia analysis, parallel computing, and computational photography.

Hongtao Xie received his PhD in Computer Application Technology from the Institute of Computing Technology, Chinese Academy of Sciences, China in 2012. He is an associate professor in the Institute of Information Engineering, Chinese Academy of Sciences, China. His research interests include multimedia content analysis and retrieval, similarity search and parallel computing.

Bing Zhang is a PhD candidate in theory of computation in the School of Physics, Beijing Institute of Technology, China. His research interests include multimedia content analysis, and quantum information and quantum computing.

Yanping Ma is a PhD candidate in computer application technology in the College of Information Science and Engineering, Ocean University of China. She is a lecturer in the College of Information and Electrical Engineering at Ludong University, China. Her research interests include multimedia content analysis and retrieval, wavelet image processing, pattern recognition, and machine learning.

Qiong Dai is an associate professor in the Institute of Information Engineering, Chinese Academy of Sciences, China. Her research interests include parallel algorithms and data flow analysis and processing.

Yizhi Liu received his PhD in computer application technology from the Institute of Computing Technology, Chinese Academy of Sciences, China in 2011. He is an associate professor in the School of Computer Science and Engineering, Hunan University of Science and Technology, China. His research interests include multimedia content analysis and retrieval, and spatio-temporal data mining.


Cite this article

Yan, C.C., Xie, H., Zhang, B. et al. Fast approximate matching of binary codes with distinctive bits. Front. Comput. Sci. 9, 741–750 (2015). https://doi.org/10.1007/s11704-015-4192-0


  • DOI: https://doi.org/10.1007/s11704-015-4192-0
