A novel image annotation model based on content representation with multi-layer segmentation

Zhang, Jing; Zhao, Yaxin; Li, Da; Chen, Zhihua; Yuan, Yubo

doi:10.1007/s00521-014-1815-6

Original Article
Published: 04 January 2015

Volume 26, pages 1407–1422 (2015)

Neural Computing and Applications

Jing Zhang¹,², Yaxin Zhao¹, Da Li¹, Zhihua Chen¹ & Yubo Yuan¹

Abstract

Automatic image annotation is an important problem in semantic-based image retrieval, and it remains challenging because of the semantic gap. In this paper, a novel model with three parts is proposed. The first part is multi-layer image segmentation: in the first layer, saliency analysis and normalized cut are combined to segment images into semantic regions, and in the second layer, these semantic regions are further segmented into grids. The second part is image content representation by a region-based bag-of-words (RBoW) model, a variant of the BoW model. To account for correlations among labels, we adopt second-order conditional random fields (CRFs) as the third part of the model to improve the accuracy of automatic image annotation. Experimental results show that our multi-layer segmentation-based image annotation model achieves promising performance for multi-labeling and outperforms both the model based on single-layer segmentation and previous algorithms on the Corel 5K and Pascal VOC 2007 datasets.
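
As a rough illustration of the region-based bag-of-words idea, the sketch below builds one visual-word histogram per semantic region from the descriptors of its grid cells. The abstract does not specify the descriptors, the codebook construction, or the normalization, so everything here is an assumption made for illustration: the names `nearest_word` and `region_bow` are hypothetical, the descriptors are toy 2-D vectors, words are assigned by Euclidean nearest neighbor, and each histogram is normalized to sum to one.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def nearest_word(descriptor, codebook):
    """Return the index of the codebook centroid closest to the descriptor."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))


def region_bow(regions, codebook):
    """Map each region (a list of grid-cell descriptors) to a normalized
    histogram over visual words. Assumed scheme, not the paper's exact one."""
    histograms = []
    for cells in regions:
        counts = [0] * len(codebook)
        for descriptor in cells:
            counts[nearest_word(descriptor, codebook)] += 1
        total = sum(counts)
        # An empty region yields an all-zero histogram instead of dividing by zero.
        histograms.append([c / total if total else 0.0 for c in counts])
    return histograms


# Toy data: two visual words and two regions with 2-D grid-cell descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0)]
regions = [
    [(0.1, 0.0), (0.9, 1.0), (1.0, 0.8), (0.2, 0.1)],  # first-layer region 1
    [(1.1, 0.9)],                                       # first-layer region 2
]
histograms = region_bow(regions, codebook)
print(histograms)
```

Under these assumptions, each region gets its own histogram rather than sharing a single image-level one, which is the property that separates a region-based representation from plain BoW. Such per-region histograms could then serve as the observations fed to a label model such as a CRF.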

Acknowledgments

The authors would like to offer sincere thanks to the reviewers, whose comments and suggestions were very important in improving the presentation and technical soundness of the paper. The authors would also like to thank Fangli Ying, Lei Jiang, Yun Liu, Yongwei Gao and Guanghui Dai at the East China University of Science and Technology for reading this paper carefully and for their useful suggestions on image retrieval. This research has been supported by the National Natural Science Foundation of China (Grants 61370174, 61402174) and partly supported by the Natural Science Foundation of Shanghai, China (11ZR1409600).

Author information

Authors and Affiliations

Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, 200237, People’s Republic of China
Jing Zhang, Yaxin Zhao, Da Li, Zhihua Chen & Yubo Yuan
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, People’s Republic of China
Jing Zhang

Corresponding author

Correspondence to Yubo Yuan.

About this article

Cite this article

Zhang, J., Zhao, Y., Li, D. et al. A novel image annotation model based on content representation with multi-layer segmentation. Neural Comput & Applic 26, 1407–1422 (2015). https://doi.org/10.1007/s00521-014-1815-6

Received: 10 September 2014
Accepted: 19 December 2014
Published: 04 January 2015
Issue Date: August 2015
DOI: https://doi.org/10.1007/s00521-014-1815-6
