Multi-view based multi-label propagation for image annotation
Introduction
In recent years, with the explosive growth in the number of digital cameras, people are overwhelmed by a huge number of accessible images, most of which are unlabeled. To effectively manage, access and retrieve this multimedia data, a widely adopted solution is to associate textual annotations with the semantic content of images. With these annotations, an image retrieval problem can be converted into a text retrieval problem, which enjoys both efficient computation and high retrieval accuracy [1]. Since manual annotation is usually time-consuming and tedious, semi-supervised multi-label propagation lends itself as an effective technique, through which users only need to label a small number of images, and the remaining unlabeled images can work together with the labeled ones for learning and inference [2].
In general, one important step of an automated image annotation task is to extract visual features for image representation [3]. In practice, we can obtain heterogeneous features (multiple views) from images. Different kinds of features describe different aspects of an image's visual characteristics and have different discriminative power for image understanding [4]. Numerous studies have been devoted to the multi-view based image annotation problem, but they disregard the consistencies among different views. Although some sparsity-based approaches have studied the selection of heterogeneous image features, they only combine several types of features into a single "big" view [5], [4].
The next step of an automated image annotation task is to associate each unlabeled image with a number of the given labels. However, most existing work in the line of multi-label propagation suffers (or partially suffers) from the disadvantage of considering each label independently when handling the multi-label propagation problem [2], [6].
To the best of our knowledge, no existing annotation method fully explores both the view heterogeneity and the label heterogeneity simultaneously. Therefore, we present a novel multi-view based framework for multi-label propagation (MMP) to bridge multi-view learning and multi-label propagation. The central idea is that: (1) the label propagation from one view should agree with the propagation from another view; (2) the propagations of related labels should be similar. Specifically, for each view, MMP models the relationships between the images and the features by constructing a bipartite graph [7], [8], and models the manifold structure among images by using the graph Laplacian [9], [10]. Thus, both the image-feature relationships and the geometrical structure are captured by minimizing the fitting error. Given a label, different views should generate the same annotation result, so that the consistencies among different views are handled. We further believe that the information of related labels can help to improve the annotation performance, so we impose similarity constraints between related labels to capture the correlations among different labels. Combining the overall consistency among views and the similarity of related labels, MMP solves the complex problem with heterogeneities at both the view level and the label level. Furthermore, we introduce an iterative algorithm to solve the resulting optimization problem.
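As a rough illustration (not the authors' implementation), the two requirements above can be sketched as an alternating update in which each view's score matrix F_j is pulled toward the label matrix Y, toward smoothness on that view's graph, and toward the other views' scores. The graph construction (Gaussian affinities), the update rule, and all parameter names below are our own assumptions:

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    # Pairwise Gaussian-kernel affinities between images, zeroed diagonal.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def laplacian(W):
    # Unnormalized graph Laplacian L = D - W.
    return np.diag(W.sum(1)) - W

def mmp_sketch(views, Y, mu=1.0, alpha=0.5, n_iter=50):
    """views: list of (n, d_j) feature matrices, one per view.
    Y: (n, z) binary label matrix. Returns per-view score matrices F_j
    that (1) fit Y, (2) vary smoothly on each view's graph, and
    (3) are encouraged to agree across views."""
    v = len(views)
    n, _ = Y.shape
    Ls = [laplacian(gaussian_affinity(X)) for X in views]
    Fs = [Y.astype(float).copy() for _ in range(v)]
    for _ in range(n_iter):
        for j in range(v):
            # Closed-form minimizer of tr(F'L_jF) + mu||F-Y||^2
            #   + alpha * sum_{k != j} ||F - F_k||^2, given the other views.
            others = sum(Fs[k] for k in range(v) if k != j)
            A = Ls[j] + (mu + alpha * (v - 1)) * np.eye(n)
            Fs[j] = np.linalg.solve(A, mu * Y + alpha * others)
    return Fs
```

The label-correlation term is omitted here for brevity; it would add a smoothness penalty over a label-similarity graph in the same fashion.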
It is worthwhile to highlight the following contributions of our proposed MMP algorithm in this paper.
- •
We propose a novel image annotation method which fully explores both the view heterogeneity and the label heterogeneity simultaneously. The proposed algorithm handles the two types of heterogeneities by requiring that: (1) the label propagation from one view should agree with the propagation from another view; (2) the propagations of related labels should be similar.
- •
Although this study is mainly motivated by the previous work in [11], we focus on the real application problem of image annotation rather than the mathematical framework itself. In addition, we introduce image-feature, inter-image and inter-label relationships to improve the performance of the proposed Multi-view Based Multi-label Propagation algorithm.
- •
We introduce an iterative algorithm to solve the optimization problem and analyze its computational cost. We conduct extensive experiments on a real image data set and discuss the tuning process of all parameters.
The rest of this paper is organized as follows: Section 2 briefly introduces the related work about existing image annotation methods. The proposed method is described in Section 3 including the theoretical formula and the optimization algorithm. We set up the experiments and discuss the performance evaluations in Section 4. Finally, we conclude this paper in Section 5.
Related work
In general, image annotation methods can be categorized into three types: free-text annotation, keyword annotation and annotation based on ontologies [12]. In this study, we focus on the keyword annotation approach, which allows users to annotate images with a chosen set of keywords ("labels") from a controlled or uncontrolled vocabulary [13]. Keyword-based image annotation has attracted a lot of attention from researchers in the last decade [14]. It views labels as the central components
The proposed framework
Suppose we have z labels and v views. Each view denotes a type of feature (e.g., color histogram or SIFT). For the jth view, there are d_j features, namely, the feature dimension is d_j. Suppose we have n images. We use x_s^(j) to denote the sth image in the jth view. Then, for each view, we construct a non-negative matrix X^(j) of size n × d_j whose rows are images and columns are features.
To be specific, for the ith label, we define y_is = 1 to indicate that the sth image is positive with the ith label and vice versa
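To make the notation concrete, the following toy sketch (our own illustration; the dimensions and random features are hypothetical) builds one non-negative image-feature matrix per view and a binary label indicator matrix:

```python
import numpy as np

# Hypothetical toy setup: n images, two views, z labels.
n, z = 5, 3
dims = [4, 6]                      # d_j: feature dimension of each view
rng = np.random.default_rng(42)

# X[j] is the non-negative (n, d_j) image-feature matrix of the j-th view;
# row s represents the s-th image in view j.
X = [np.abs(rng.normal(size=(n, d))) for d in dims]

# Y is the (n, z) label indicator: Y[s, i] = 1 iff image s is positive
# with label i, and 0 otherwise (unlabeled or negative).
Y = np.zeros((n, z))
Y[0, 0] = Y[1, 2] = 1
```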
Experiments
To demonstrate the effectiveness of the proposed algorithm, we conduct image annotation experiments on the NUS-WIDE [34] data set.
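Annotation quality on such multi-label data sets is commonly reported with per-label precision/recall-style metrics. The following is a minimal macro-averaged F1 sketch of our own (the paper's exact evaluation protocol may differ):

```python
import numpy as np

def per_label_f1(Y_true, Y_pred):
    # Macro-averaged F1 over labels: compute F1 per label, then average.
    # Y_true, Y_pred: (n, z) binary matrices of true / predicted labels.
    f1s = []
    for i in range(Y_true.shape[1]):
        t, p = Y_true[:, i], Y_pred[:, i]
        tp = np.sum((t == 1) & (p == 1))
        prec = tp / max(p.sum(), 1)
        rec = tp / max(t.sum(), 1)
        f1s.append(0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec))
    return float(np.mean(f1s))
```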
Conclusion
We propose a new framework called MMP for image annotation, which is the first to bridge multi-view learning and multi-label propagation. MMP handles the consistencies among different views by requiring them to generate the same annotation result, and captures the correlations among different labels by imposing similarity constraints between related labels. By exploring the heterogeneities at both the view level and the label level, we are able to improve the annotation performance. Extensive experiments on a real image data set demonstrate the effectiveness of the proposed method.
Acknowledgment
This work is supported in part by the National Key Technology R&D Program (2012BAI34B01), the National Natural Science Foundation of China (Grant nos. 61170142 and 61173185), and the National High Technology Research and Development Program of China (863 Program) under Grant no. 2013AA040601.
References (36)
- et al., Multi-label ensemble based on variable pairwise constraint projection, Inf. Sci. (2013)
- A survey of methods for image annotation, J. Vis. Lang. Comput. (2008)
- et al., Graph-based semi-supervised learning with multiple labels, J. Vis. Commun. Image Represent. (2009)
- L. Wu, S. Hoi, R. Jin, J. Zhu, N. Yu, Distance metric learning from uncertain side information with application to...
- X. Chen, Y. Mu, S. Yan, T. Chua, Efficient large-scale image annotation by probabilistic collaborative multi-label...
- Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)
- F. Wu, Y. Han, Q. Tian, Y. Zhuang, Multi-label boosting for image annotation by structural grouping sparsity, in:...
- L. Cao, J. Luo, F. Liang, T. Huang, Heterogeneous feature machines for visual recognition, in: The 12th ICCV, IEEE,...
- I. Dhillon, Co-clustering documents and words using bipartite spectral graph partitioning, in: The 7th SIGKDD, ACM, San...
- H. Zha, X. He, C. Ding, H. Simon, M. Gu, Bipartite graph partitioning and data clustering, in: The 10th CIKM, ACM,...
- Assistive tagging: a survey of multimedia tagging with human–computer joint exploration, ACM Comput. Surv.
- Color indexing, Int. J. Comput. Vis.
Zhanying He received the B.S. degree in Software Engineering from Zhejiang University, China, in 2009. She is currently a candidate for a Ph.D. degree in Computer Science at Zhejiang University. Her research interests include information retrieval, data mining and machine learning.
Chun Chen received the B.S. degree in Mathematics from Xiamen University, China, in 1981, and his M.S. and Ph.D. degrees in Computer Science from Zhejiang University, China, in 1984 and 1990 respectively. He is a Professor in College of Computer Science, Zhejiang University. His research interests include information retrieval, data mining, computer vision, computer graphics and embedded technology.
Jiajun Bu received the B.S. and Ph.D. degrees in Computer Science from Zhejiang University, China, in 1995 and 2000, respectively. He is a Professor in College of Computer Science, Zhejiang University. His research interests include embedded system, data mining, information retrieval and mobile database.
Ping Li received the Ph.D. degree in Computer Science from Zhejiang University, China, in 2014 and the M.S. degree in Information and Communication Engineering from Central South University, China, in 2010. He is an Assistant Professor in the School of Computer Science and Technology, Hangzhou Dianzi University. His research interests include machine learning, data mining and multimedia analysis.
Deng Cai is a Professor in the State Key Lab of CAD&CG, College of Computer Science at Zhejiang University, China. He received the Ph.D. degree in Computer Science from University of Illinois at Urbana-Champaign in 2009. Before that, he received his Bachelor's degree and Master's degree from Tsinghua University in 2000 and 2003 respectively, both in Automation. His research interests include machine learning, data mining and information retrieval.