Product Quantization Network for Fast Visual Search

International Journal of Computer Vision

Abstract

Product quantization has been widely used in fast image retrieval due to its effectiveness in coding high-dimensional visual features. By constructing an approximation function, we extend hard-assignment quantization to soft-assignment quantization. Thanks to the differentiability of soft-assignment quantization, the product quantization operation can be integrated as a layer in a convolutional neural network, yielding the proposed product quantization network (PQN). Meanwhile, by extending the triplet loss to an asymmetric triplet loss, we directly optimize the retrieval accuracy of the learned representation based on asymmetric similarity measurement. Utilizing PQN, we can learn a discriminative and compact image representation in an end-to-end manner, which in turn enables fast and accurate image retrieval. By revisiting residual quantization, we further extend the proposed PQN to a residual product quantization network (RPQN). Benefiting from the residual learning triggered by residual quantization, RPQN achieves higher accuracy than PQN at the same computation cost. Moreover, we extend PQN to a temporal product quantization network (TPQN) by exploiting temporal consistency in videos to speed up video retrieval. TPQN integrates frame-wise feature learning, frame-wise feature aggregation, and video-level feature quantization in a single neural network. Comprehensive experiments on multiple public benchmark datasets demonstrate the state-of-the-art performance of the proposed PQN, RPQN and TPQN in fast image and video retrieval.
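To make the core mechanism concrete, the following is a minimal PyTorch sketch of a soft-assignment product quantization layer and an asymmetric triplet loss in the spirit the abstract describes. The subspace and codeword counts, the temperature `alpha`, the inner-product similarity, and all names are illustrative assumptions for this sketch, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftPQLayer(nn.Module):
    """Soft-assignment product quantization as a differentiable layer.

    The hard nearest-codeword assignment of standard PQ is replaced by a
    softmax over scaled negative squared distances, so gradients flow
    through the quantization step to the codebooks and the backbone.
    (Hyperparameters here are assumptions, not the paper's settings.)
    """

    def __init__(self, dim, num_subspaces=4, num_codewords=256, alpha=20.0):
        super().__init__()
        assert dim % num_subspaces == 0
        self.M = num_subspaces                 # number of PQ subspaces
        self.d = dim // num_subspaces          # sub-vector dimension
        self.alpha = alpha                     # softmax temperature (assumed)
        # One codebook of K codewords per subspace: (M, K, d).
        self.codebooks = nn.Parameter(torch.randn(self.M, num_codewords, self.d))

    def forward(self, x):
        # Split each feature into M sub-vectors: (batch, M, d).
        xs = x.view(-1, self.M, self.d)
        # Squared distances to every codeword: (batch, M, K).
        dists = ((xs.unsqueeze(2) - self.codebooks.unsqueeze(0)) ** 2).sum(-1)
        # Soft assignment; alpha -> infinity recovers hard assignment.
        w = F.softmax(-self.alpha * dists, dim=-1)
        # Soft-quantized sub-vectors: convex combinations of codewords.
        q = torch.einsum('bmk,mkd->bmd', w, self.codebooks)
        return q.reshape(x.shape[0], -1)


def asymmetric_triplet_loss(anchor, pos_q, neg_q, margin=0.5):
    """Triplet loss under asymmetric similarity: the anchor (query) stays
    unquantized while positive/negative (database) features are quantized,
    mirroring the asymmetric distance computation used at search time."""
    sim_pos = (anchor * pos_q).sum(dim=1)
    sim_neg = (anchor * neg_q).sum(dim=1)
    return F.relu(margin - sim_pos + sim_neg).mean()


# Toy usage: 128-D features, a batch of 8 triplets.
pq = SoftPQLayer(dim=128)
a, p, n = (torch.randn(8, 128) for _ in range(3))
loss = asymmetric_triplet_loss(a, pq(p), pq(n))
loss.backward()  # gradients reach the codebooks via the soft assignment
```

Because the soft assignment is just a softmax, the layer trains end-to-end with any backbone; at indexing time the assignment can be hardened to the arg-max codeword to produce compact PQ codes.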



Acknowledgements

This work is supported in part by a gift grant from Adobe and startup funds from University at Buffalo.

Author information

Corresponding author

Correspondence to Tan Yu.

Additional information

Communicated by Li Liu, Matti Pietikäinen, Jie Qin, Jie Chen, Wanli Ouyang, Luc Van Gool.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yu, T., Meng, J., Fang, C. et al. Product Quantization Network for Fast Visual Search. Int J Comput Vis 128, 2325–2343 (2020). https://doi.org/10.1007/s11263-020-01326-x

