Abstract
Feature representation is the key to bridging the semantic gap and enabling computers to understand images in image retrieval tasks. In image recognition, semantic regions and salient targets are indispensable to image cognition. Segmenting semantic regions makes it possible to analyze the relationships between image targets, while distinctive targets highlight the important semantics. However, previous retrieval methods have often ignored semantic regions and salient targets. Considering both aspects, this paper proposes an unsupervised image retrieval method based on multi-semantic region weighting and multi-scale flatness weighting. First, we segment the semantic regions with a Fully Convolutional Network and compute the multi-semantic weight map (S-mask) to obtain the global features. Second, we introduce a flatness-weighted strategy to weight the feature maps and aggregate the multi-scale features to obtain the local features. Finally, we concatenate the global and local features to construct the final image representation. This paper makes two main contributions. First, the S-mask assigns different weights to the semantic regions: it distinguishes the importance of the semantic regions and balances the weights within each semantic region. Second, the flatness-weighted strategy suppresses the background and highlights the target region. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on the Paris and Oxford datasets.
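The three steps of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define how the S-mask or the flatness weights are computed, so both are passed in here as ready-made spatial weight maps. The function names (`weighted_sum_pool`, `build_descriptor`), the use of weighted sum-pooling, summation across scales, L2 normalization, and the reading of "cascade" as concatenation are all assumptions made for illustration.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def weighted_sum_pool(feature_map, weight_map):
    """Aggregate a C x H x W feature map into a C-dimensional vector.

    Each spatial activation is multiplied by the weight at the same
    position, then summed per channel. The weight map is a placeholder
    for an S-mask or a flatness map, whose definitions are not given here.
    """
    pooled = []
    for channel in feature_map:
        same_shape = len(channel) == len(weight_map) and all(
            len(row) == len(w_row) for row, w_row in zip(channel, weight_map)
        )
        if not same_shape:
            raise ValueError("weight map must match the feature map's spatial size")
        pooled.append(
            sum(a * w
                for row, w_row in zip(channel, weight_map)
                for a, w in zip(row, w_row))
        )
    return l2_normalize(pooled)


def build_descriptor(global_map, semantic_weights, scale_maps, scale_weights):
    """Concatenate a semantically weighted global vector with a multi-scale local one."""
    # Global branch: one feature map weighted by the (assumed given) semantic mask.
    global_feat = weighted_sum_pool(global_map, semantic_weights)
    # Local branch: pool each scale with its own weight map, then sum over scales.
    per_scale = [weighted_sum_pool(f, w) for f, w in zip(scale_maps, scale_weights)]
    local_feat = l2_normalize([sum(vals) for vals in zip(*per_scale)])
    # Final representation: concatenation of both branches, renormalized.
    return l2_normalize(global_feat + local_feat)
```

With C channels per feature map, the resulting descriptor has 2C dimensions and unit length, so two images can be compared with a plain dot product. Feature maps at different scales may have different spatial sizes, provided each comes with a weight map of matching size.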








Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments. This work was partly supported by the National Natural Science Foundation of China (No. 62072394), the Natural Science Foundation of Hebei Province (F2017203169), and the Key Scientific Research Projects of Colleges and Universities in Hebei Province (ZD2017080).
Ethics declarations
Conflict of interest
We declare that we have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Gu, G., Li, Z., Feng, L. et al. Multi-semantic region weighting and multi-scale flatness weighting based image retrieval. Soft Comput 25, 5699–5708 (2021). https://doi.org/10.1007/s00500-020-05565-5
DOI: https://doi.org/10.1007/s00500-020-05565-5