Abstract
Recent work has demonstrated that convolutional descriptor aggregation can provide state-of-the-art performance for image retrieval. In this paper, we propose a multi-center convolutional descriptor aggregation (MCDA) method that produces a global image representation for image retrieval. We first present a feature map center selection method to eliminate background information in the feature maps. We then propose channel weighting and spatial weighting schemes based on the centers to boost the contribution of features that lie on the object. Finally, the weighted convolutional descriptors are aggregated to represent the image. Experiments demonstrate that MCDA achieves state-of-the-art retrieval performance and that the generated activation map is also effective for object localization.
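The pipeline outlined in the abstract (center selection, center-based channel and spatial weighting, aggregation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the selection rule or the weighting formulas, so every concrete choice below is an assumption made for illustration only. Centers are taken as the top-activation positions, spatial weights as a Gaussian falloff from the nearest center, and channel weights as the mean channel response at the centers. The function name `aggregate_descriptors` and the parameters `num_centers` and `sigma` are hypothetical.

```python
import math


def aggregate_descriptors(fmap, num_centers=2, sigma=1.0):
    """Toy multi-center aggregation of a C x H x W feature map.

    fmap is a list of C channels, each an H x W list of non-negative floats.
    All selection and weighting rules here are illustrative assumptions,
    not the formulas used in the paper.
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])

    # Activation map: sum of all channel responses at each spatial position.
    act = [[sum(fmap[c][y][x] for c in range(C)) for x in range(W)]
           for y in range(H)]

    # Assumed center selection: positions with the largest total activation.
    positions = sorted(((y, x) for y in range(H) for x in range(W)),
                       key=lambda p: act[p[0]][p[1]], reverse=True)
    centers = positions[:num_centers]

    # Assumed spatial weighting: Gaussian falloff from the nearest center,
    # so positions far from every center (likely background) count less.
    def spatial_weight(y, x):
        d2 = min((y - cy) ** 2 + (x - cx) ** 2 for cy, cx in centers)
        return math.exp(-d2 / (2.0 * sigma ** 2))

    # Assumed channel weighting: mean response of each channel at the centers.
    channel_w = [sum(fmap[c][cy][cx] for cy, cx in centers) / len(centers)
                 for c in range(C)]

    # Weighted sum-pooling over spatial positions, one value per channel.
    desc = [channel_w[c] * sum(spatial_weight(y, x) * fmap[c][y][x]
                               for y in range(H) for x in range(W))
            for c in range(C)]

    # L2-normalize so descriptors can be compared with a dot product.
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc
```

Calling `aggregate_descriptors(fmap)` on the last convolutional layer's output, converted to nested lists, would return one C-dimensional vector per image; retrieval could then rank database images by the dot product of their vectors with the query vector.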
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grants 61772344 and 61732011), in part by the Natural Science Foundation of SZU (Grants 827-000140, 827-000230, and 2017060), the National Social Science Foundation of China (Grant 17BTQ068), the Youth Foundation of Education Bureau of Hebei Province (Grant QN2015099), the China Postdoctoral Science Foundation (Grant 2017M621078), and the funding project of midwest colleges and universities for promoting the comprehensive strength of Hebei University.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhu, J., Wu, S., Zhu, H. et al. Multi-center convolutional descriptor aggregation for image retrieval. Int. J. Mach. Learn. & Cyber. 10, 1863–1873 (2019). https://doi.org/10.1007/s13042-018-0898-2
DOI: https://doi.org/10.1007/s13042-018-0898-2