With the rapid growth of media data, managing and retrieving multimedia data effectively has become an urgent problem. Because of the semantic gap between low-level visual features and high-level concepts, image semantic annotation remains difficult. In this paper, a hybrid approach called CNN-ECC is proposed to learn the semantic concepts of images automatically. It is divided into two processes: generative feature learning and discriminative semantic learning. In the feature learning phase, a redesigned convolutional neural network (CNN) is used in place of traditional feature learning methods. The redesigned CNN can also learn multi-instance features, which strengthens the feature representation of images that contain multiple instances. In the semantic learning phase, ensembles of classifier chains (ECC) are trained on the obtained visual features. The ensembles of classifier chains can learn the semantic associations between different labels, which helps avoid generating redundant labels in multi-label classification. Experimental results confirm that the proposed approach annotates images more effectively and accurately than state-of-the-art methods.
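The semantic learning phase described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's `ClassifierChain` with a logistic regression base learner, uses random vectors as stand-ins for CNN features, and averages the per-label probabilities of several randomly ordered chains. The helper names `train_ecc` and `predict_ecc`, the number of chains, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain


def train_ecc(features, labels, n_chains=5, seed=0):
    """Fit several classifier chains, each with its own random label order."""
    chains = []
    for i in range(n_chains):
        # Each chain feeds earlier label predictions to later classifiers,
        # which is how label associations enter the model.
        chain = ClassifierChain(
            LogisticRegression(max_iter=1000),
            order="random",
            random_state=seed + i,
        )
        chain.fit(features, labels)
        chains.append(chain)
    return chains


def predict_ecc(chains, features, threshold=0.5):
    """Average per-label probabilities over the chains, then threshold."""
    probs = np.mean([c.predict_proba(features) for c in chains], axis=0)
    return (probs >= threshold).astype(int)


# Synthetic stand-in data (assumption): 16-dim "visual features", 4 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
W = rng.normal(size=(16, 4))
Y = (X @ W > 0).astype(int)

chains = train_ecc(X[:150], Y[:150])
pred = predict_ecc(chains, X[150:])
```

In a real pipeline, `X` would hold the features produced by the CNN for each image and `Y` the binary annotation matrix; averaging over chains with different label orders reduces the sensitivity of a single chain to the order in which labels are predicted.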

This work is supported by the National Natural Science Foundation of China (Nos. 61663004, 61363035, 61365009), the Guangxi Natural Science Foundation (Nos. 2016GXNSFAA380146, 2014GXNSFAA118368), the Director Fund of Guangxi Key Lab of Multi-source Information Mining and Security (16-A-03-02), the Guangxi "Bagui Scholar" Teams for Innovation and Research Project, and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.
Zheng, Y., Li, Z. & Zhang, C. A hybrid architecture based on CNN for cross-modal semantic instance annotation. Multimed Tools Appl 77, 8695–8710 (2018). https://doi.org/10.1007/s11042-017-4764-0