Abstract
It is challenging to segment fine-grained objects because of appearance variations and cluttered backgrounds. Most existing segmentation methods fail to separate small parts of an instance from its background with sufficient accuracy. However, such small parts usually carry important semantic information that is crucial for fine-grained categorization. Observing that fine-grained objects share nearly the same configuration of parts, we present a novel part-aware segmentation method that explicitly detects semantic parts and preserves them during segmentation. We first design a hybrid part localization method that generates accurate part proposals at moderate computational cost. We then iteratively update the segmentation outputs and the part proposals, which yields better foreground segmentation results. Experiments demonstrate the superiority of the proposed method over state-of-the-art segmentation approaches for fine-grained categorization.
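The alternation between segmentation and part proposals described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the abstract does not specify the segmentation model or the hybrid part localizer, so the sketch assumes a thresholded per-pixel foreground score as the segmenter and fixed rectangular boxes as part proposals. The names `part_aware_refine`, `box_mask`, `threshold`, and `min_support` are hypothetical.

```python
import numpy as np


def box_mask(shape, box):
    """Boolean mask that is True inside box = (row0, col0, row1, col1), end-exclusive."""
    m = np.zeros(shape, dtype=bool)
    r0, c0, r1, c1 = box
    m[r0:r1, c0:c1] = True
    return m


def part_aware_refine(fg_score, part_boxes, threshold=0.5, min_support=0.25, max_iters=5):
    """Toy alternation between a foreground mask and a set of part proposals.

    Assumption: a plain threshold stands in for the segmentation model, and
    non-empty rectangular boxes stand in for part proposals.
    """
    base = fg_score > threshold          # segmentation from the score alone
    mask = base.copy()
    kept = []
    for _ in range(max_iters):
        # Proposal update: keep boxes whose area overlaps the current foreground enough.
        kept = [b for b in part_boxes
                if mask[box_mask(mask.shape, b)].mean() >= min_support]
        # Segmentation update: preserve every kept part in the foreground.
        new_mask = base.copy()
        for b in kept:
            new_mask |= box_mask(mask.shape, b)
        if np.array_equal(new_mask, mask):
            break                        # nothing changed, so the loop has converged
        mask = new_mask
    return mask, kept
```

In this sketch a thin part that the score alone would miss (a low-scoring box attached to the body) is pulled into the foreground, while a box with no foreground support is discarded. The real method would replace both stand-ins with learned components.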
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Project No. 61472103, No. 61772158 and No. 61702136.
Cite this article
Pang, C., Yao, H., Sun, X. et al. Exploring part-aware segmentation for fine-grained visual categorization. Multimed Tools Appl 77, 30291–30310 (2018). https://doi.org/10.1007/s11042-018-5957-x
DOI: https://doi.org/10.1007/s11042-018-5957-x