Product Quantization Network for Fast Visual Search

International Journal of Computer Vision

Abstract

Product quantization has been widely used in fast image retrieval due to its effectiveness in coding high-dimensional visual features. By constructing an approximation function, we extend hard-assignment quantization to soft-assignment quantization. Thanks to the differentiability of soft-assignment quantization, the product quantization operation can be integrated as a layer in a convolutional neural network, yielding the proposed product quantization network (PQN). Meanwhile, by extending the triplet loss to an asymmetric triplet loss, we directly optimize the retrieval accuracy of the learned representation based on asymmetric similarity measurement. Utilizing PQN, we can learn a discriminative and compact image representation in an end-to-end manner, which in turn enables fast and accurate image retrieval. By revisiting residual quantization, we further extend the proposed PQN to a residual product quantization network (RPQN). Benefiting from the residual learning triggered by residual quantization, RPQN achieves higher accuracy than PQN at the same computation cost. Moreover, we extend PQN to a temporal product quantization network (TPQN) by exploiting temporal consistency in videos to speed up video retrieval. TPQN integrates frame-wise feature learning, frame-wise feature aggregation, and video-level feature quantization in a single neural network. Comprehensive experiments on multiple public benchmark datasets demonstrate the state-of-the-art performance of the proposed PQN, RPQN and TPQN in fast image and video retrieval.
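To make the core mechanism concrete, the following is a minimal PyTorch sketch of a soft-assignment product quantization layer and an asymmetric triplet loss in the spirit the abstract describes. The subspace and codeword counts, the temperature `alpha`, the inner-product similarity, and all names are illustrative assumptions for this sketch, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftPQLayer(nn.Module):
    """Soft-assignment product quantization as a differentiable layer.

    The hard nearest-codeword assignment of standard PQ is replaced by a
    softmax over scaled negative squared distances, so gradients flow
    through the quantization step to the codebooks and the backbone.
    (Hyperparameters here are assumptions, not the paper's settings.)
    """

    def __init__(self, dim, num_subspaces=4, num_codewords=256, alpha=20.0):
        super().__init__()
        assert dim % num_subspaces == 0
        self.M = num_subspaces                 # number of PQ subspaces
        self.d = dim // num_subspaces          # sub-vector dimension
        self.alpha = alpha                     # softmax temperature (assumed)
        # One codebook of K codewords per subspace: (M, K, d).
        self.codebooks = nn.Parameter(torch.randn(self.M, num_codewords, self.d))

    def forward(self, x):
        # Split each feature into M sub-vectors: (batch, M, d).
        xs = x.view(-1, self.M, self.d)
        # Squared distances to every codeword: (batch, M, K).
        dists = ((xs.unsqueeze(2) - self.codebooks.unsqueeze(0)) ** 2).sum(-1)
        # Soft assignment; alpha -> infinity recovers hard assignment.
        w = F.softmax(-self.alpha * dists, dim=-1)
        # Soft-quantized sub-vectors: convex combinations of codewords.
        q = torch.einsum('bmk,mkd->bmd', w, self.codebooks)
        return q.reshape(x.shape[0], -1)


def asymmetric_triplet_loss(anchor, pos_q, neg_q, margin=0.5):
    """Triplet loss under asymmetric similarity: the anchor (query) stays
    unquantized while positive/negative (database) features are quantized,
    mirroring the asymmetric distance computation used at search time."""
    sim_pos = (anchor * pos_q).sum(dim=1)
    sim_neg = (anchor * neg_q).sum(dim=1)
    return F.relu(margin - sim_pos + sim_neg).mean()


# Toy usage: 128-D features, a batch of 8 triplets.
pq = SoftPQLayer(dim=128)
a, p, n = (torch.randn(8, 128) for _ in range(3))
loss = asymmetric_triplet_loss(a, pq(p), pq(n))
loss.backward()  # gradients reach the codebooks via the soft assignment
```

Because the soft assignment is just a softmax, the layer trains end-to-end with any backbone; at indexing time the assignment can be hardened to the arg-max codeword to produce compact PQ codes.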



Acknowledgements

This work is supported in part by a gift grant from Adobe and startup funds from University at Buffalo.

Author information

Corresponding author

Correspondence to Tan Yu.

Additional information

Communicated by Li Liu, Matti Pietikäinen, Jie Qin, Jie Chen, Wanli Ouyang, Luc Van Gool.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yu, T., Meng, J., Fang, C. et al. Product Quantization Network for Fast Visual Search. Int J Comput Vis 128, 2325–2343 (2020). https://doi.org/10.1007/s11263-020-01326-x

