Unsupervised semantic-based convolutional features aggregation for image retrieval

Multimedia Tools and Applications

Abstract

Deep features extracted from the convolutional layers of pre-trained CNNs have been widely used in image retrieval. These features, however, are too numerous to be used directly for efficient similarity evaluation. It is therefore important to study how to aggregate deep features into a global yet distinctive image vector. This paper first introduces a simple but effective method for selecting informative features based on the semantic content of feature maps. We then propose a channel weighting method (CW) for the selected features, derived by analyzing the relations between discriminative activations and the distribution parameters of feature maps, including standard variance, non-zero responses and sum value. Furthermore, we provide a way to pick semantic detectors that is independent of gallery images. Based on these three strategies, we derive a method for generating a global image vector and demonstrate its state-of-the-art performance on benchmark datasets.
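
As a rough illustration of the aggregation idea described in the abstract, the following minimal sketch computes the three per-channel distribution parameters named above (spread, non-zero responses, sum) and pools weighted channels into an L2-normalized global vector. It is not the paper's CW formula, which the abstract does not give: `placeholder_weight` is a hypothetical sparsity-based weight chosen only for demonstration, and all function names, the nested-list data layout, and the use of the population standard deviation are assumptions.

```python
import math


def channel_stats(fmap):
    """Return (std, nonzero_fraction, total) for one non-empty H x W feature map."""
    values = [v for row in fmap for v in row]
    n = len(values)
    total = sum(values)
    mean = total / n
    # Population standard deviation of the activations in this channel.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # Fraction of spatial positions with a positive response.
    nonzero_fraction = sum(1 for v in values if v > 0) / n
    return std, nonzero_fraction, total


def placeholder_weight(std, nonzero_fraction, total, eps=1e-6):
    """Hypothetical weight (NOT the paper's CW): sparser channels weigh more."""
    return math.log(1.0 + 1.0 / (nonzero_fraction + eps))


def aggregate(feature_maps, weight_fn=placeholder_weight):
    """Sum-pool each channel, scale it by its weight, then L2-normalize."""
    pooled = []
    for fmap in feature_maps:
        std, nonzero_fraction, total = channel_stats(fmap)
        pooled.append(weight_fn(std, nonzero_fraction, total) * total)
    norm = math.sqrt(sum(p * p for p in pooled))
    return [p / norm for p in pooled] if norm > 0 else pooled


def cosine_similarity(a, b):
    """Dot product; equals cosine similarity for L2-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


# Two toy 2 x 2 channels: one sparse and peaked, one dense and flat.
toy_maps = [
    [[0.0, 0.0], [0.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
toy_vector = aggregate(toy_maps)
```

Under this placeholder weighting, the sparse channel contributes more to `toy_vector` than the dense one even though both have the same sum. Any other function of the three statistics can be passed as `weight_fn`, and two images can be compared by calling `cosine_similarity` on their aggregated vectors.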

Notes

  1. https://github.com/ShawnWXS/filckr_building

Acknowledgements

This research was funded by the National Natural Science Foundation of China (Grant 61603289), the China Postdoctoral Science Foundation (Grant 2016M602823), and the Fundamental Research Funds for the Central Universities (xjj2017118).

Author information

Corresponding author

Correspondence to Jihua Zhu.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, X., Pang, S., Zhu, J. et al. Unsupervised semantic-based convolutional features aggregation for image retrieval. Multimed Tools Appl 79, 14465–14489 (2020). https://doi.org/10.1007/s11042-018-6915-3

  • DOI: https://doi.org/10.1007/s11042-018-6915-3
