Abstract
Effective feature representation plays an important role in image analysis tasks. In recent years, deep features have replaced hand-crafted features as the mainstream representation in image analysis. However, existing deep learning methods typically extract feature representations directly from the whole image. Such strategies concentrate on global features; they tend to miss local geometric invariance and to introduce noise from regions that are not of interest. In this paper, we propose a novel region-wise deep feature extraction framework that promotes local geometric invariance and reduces noise. In our algorithm, an object proposal method generates a set of foreground object bounding boxes, from which a pre-trained convolutional neural network extracts region-wise deep features. An improved vector of locally aggregated descriptors (VLAD) strategy with weighted multi-neighbor assignment is then proposed to encode the local region-wise feature representations. The final feature representation is not restricted to the classification task, and can also be quantized to hash codes for large-scale image retrieval. Extensive experiments on publicly available datasets demonstrate the promising performance of our work against state-of-the-art methods in both image retrieval and classification tasks.
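The encoding step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the weighting scheme or the quantizer, so the Gaussian-kernel weights, the parameters `k` and `sigma`, the sign-threshold hashing, and the function names `weighted_vlad` and `sign_hash` are all assumptions made for illustration. Region-wise CNN features and a learned codebook are taken as given inputs.

```python
import math


def weighted_vlad(features, centroids, k=2, sigma=1.0):
    """VLAD encoding with soft assignment to the k nearest centroids.

    features  : list of region-wise descriptors (lists of floats)
    centroids : codebook of the same dimensionality (assumed to be given)
    k, sigma  : hypothetical parameters; the weighting is an assumed
                Gaussian kernel, not the scheme from the paper
    """
    dim = len(centroids[0])
    vlad = [[0.0] * dim for _ in centroids]
    for x in features:
        # Squared Euclidean distance from x to every centroid.
        d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
        nearest = sorted(range(len(centroids)), key=d2.__getitem__)[:k]
        # Kernel weights over the k neighbors, normalized to sum to 1.
        # Subtracting the smallest distance avoids underflow.
        d_min = d2[nearest[0]]
        raw = [math.exp(-(d2[j] - d_min) / (2 * sigma ** 2)) for j in nearest]
        total = sum(raw)
        for j, w in zip(nearest, raw):
            for t in range(dim):
                # Accumulate the weighted residual x - c_j.
                vlad[j][t] += (w / total) * (x[t] - centroids[j][t])
    # Flatten and L2-normalize the concatenated residuals.
    flat = [v for row in vlad for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


def sign_hash(vec):
    """Assumed quantizer: one bit per dimension by sign thresholding."""
    return [1 if v > 0 else 0 for v in vec]


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    codebook = [[1.0, 0.0], [0.0, 1.0]]
    encoded = weighted_vlad(regions, codebook, k=2)
    print(encoded, sign_hash(encoded))
```

With `k=1` the sketch reduces to standard hard-assignment VLAD; larger `k` spreads each region's residual over several nearby codewords.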



Acknowledgements
This work was supported by the National Key R&D Program of China (2017YFB1401000) and the National Natural Science Foundation of China (61501457, 61602517). The corresponding authors are Peng Li and Xiao-Yu Zhang, who contributed equally to this paper.
Cite this article
Zhu, X., Wang, Q., Li, P. et al. Learning region-wise deep feature representation for image analysis. J Ambient Intell Human Comput 14, 14775–14784 (2023). https://doi.org/10.1007/s12652-018-0894-0
DOI: https://doi.org/10.1007/s12652-018-0894-0