Neurocomputing, Volume 168, 30 November 2015, Pages 853-860

Multi-view based multi-label propagation for image annotation

https://doi.org/10.1016/j.neucom.2015.05.039

Abstract

Multi-view learning and multi-label propagation are two common approaches to address the problem of image annotation. Traditional multi-view methods disregard the consistencies among different views, while existing multi-label propagation algorithms ignore the underlying mutual correlations among different labels. In this paper, we present a novel image annotation algorithm by exploring the heterogeneities from both the view level and the label level. For a single label, its propagation from one view should agree with the propagation from another view. Similarly, for a single view, the propagations of related labels should be similar. We refer to the proposed approach as Multi-view based Multi-label Propagation for image annotation (MMP). MMP handles the consistencies among different views by requiring them to generate the same annotation result, and captures the correlations among different labels by imposing similarity constraints. By taking full advantage of the dual heterogeneity from views and labels, MMP is able to propagate labels better than state-of-the-art methods. Furthermore, we introduce an iterative algorithm to solve the optimization problem. Extensive experiments on real image data show that the proposed framework achieves effective image annotation performance.

Introduction

In recent years, with the exponential increase in the number of digital cameras, people are overwhelmed by a huge number of accessible images, most of which are unlabeled. To effectively manage, access and retrieve this multimedia data, a widely adopted solution is to associate textual annotations with the semantic content of images. With the annotations, an image retrieval problem can be converted into a text retrieval problem, which enjoys both efficient computation and high retrieval accuracy [1]. Since manual annotation is usually time consuming and tedious, semi-supervised multi-label propagation lends itself as an effective technique, through which users only need to label a small number of images, and the remaining unlabeled images can work together with these labeled images for learning and inference [2].

In general, one important step of an automated image annotation task is to extract visual features for image representation [3]. However, we can obtain heterogeneous features (multiple views) from images. Different kinds of features describe various aspects of images' visual characteristics and have different discriminative power for image understanding [4]. Numerous studies have been devoted to the multi-view based image annotation problem, but they disregard the consistencies among different views. Although some sparsity-based approaches have been proposed for the selection of heterogeneous image features, they only combine several types of features into a single “big” view [5], [4].

The following step of an automated image annotation task is to associate each unlabeled image with a number of different given labels. However, most existing work on multi-label propagation suffers (or partially suffers) from the disadvantage of considering each label independently when handling the multi-label propagation problem [2], [6].

To the best of our knowledge, there does not exist an effective annotation method that fully explores both the view heterogeneity and the label heterogeneity simultaneously. Therefore, we present a novel multi-view based framework for multi-label propagation (MMP) to bridge multi-view learning and multi-label propagation. The central idea is that: (1) the label propagation from one view should agree with the propagation from another view; (2) the propagations of related labels should be similar. Specifically, for each view, MMP models the relationships between the images and the features by constructing a bipartite graph [7], [8], and models the manifold structure among images by using the graph Laplacian [9], [10]. Thus, both the image-feature relationships and the geometrical structure are captured by minimizing the fitting error. Given a label, different views should generate the same annotation result so that the consistencies among different views are handled. We believe that the information from related labels can help improve the annotation performance, so we impose similarity constraints between related labels to capture the correlations among different labels. Combining the overall consistency among views and the similarity of related labels, MMP solves the complex problem with heterogeneities from both the view level and the label level. Furthermore, we introduce an iterative algorithm to solve the optimization problem.
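As one concrete ingredient of this construction, the inter-image manifold term relies on a graph Laplacian per view. The following Python sketch builds an unnormalized Laplacian from a k-NN Gaussian affinity over the images of a single view; the neighborhood size, bandwidth, and the specific affinity are illustrative assumptions, not necessarily the exact construction used in Section 3.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def view_laplacian(X, k=10, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W over the images of one view.

    X : (n, d_j) feature matrix of the j-th view (rows are images).
    The Laplacian encodes the manifold structure among images that
    graph-based propagation methods smooth their label scores over.
    """
    # Symmetric k-NN graph weighted by a Gaussian kernel on distances.
    A = kneighbors_graph(X, n_neighbors=k, mode="distance", include_self=False)
    A = A.maximum(A.T).toarray()
    W = np.where(A > 0, np.exp(-(A ** 2) / (2 * sigma ** 2)), 0.0)
    D = np.diag(W.sum(axis=1))
    return D - W
```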

It is worthwhile to highlight the following contributions of our proposed MMP algorithm in this paper.

  • We propose a novel image annotation method which fully explores both the view heterogeneity and the label heterogeneity simultaneously. The proposed algorithm handles the two types of heterogeneities by requiring that: (1) the label propagation from one view should agree with the propagation from another view; (2) the propagations of related labels should be similar.

  • Though this study is mainly motivated by the previous work in [11], we focus on the real application problem of image annotation rather than on the mathematical framework itself. Besides, we introduce image-feature, inter-image and inter-label relationships to improve the performance of the proposed Multi-view based Multi-label Propagation algorithm.

  • We introduce an iterative algorithm to solve the optimization problem and analyze its computational cost. We conduct extensive experiments on a real image data set and discuss the tuning process of all parameters.

The rest of this paper is organized as follows: Section 2 briefly introduces the related work about existing image annotation methods. The proposed method is described in Section 3 including the theoretical formula and the optimization algorithm. We set up the experiments and discuss the performance evaluations in Section 4. Finally, we conclude this paper in Section 5.

Section snippets

Related work

In general, image annotation methods can be categorized into three types: free text annotation, keyword annotation and annotation based on ontologies [12]. In this study, we focus on the keyword annotation approach which allows users to annotate images with a chosen set of keywords (“labels”) from a controlled or uncontrolled vocabulary [13]. The keyword based image annotation has attracted a lot of attention from researchers in the last decade [14]. It views labels as the central components

The proposed framework

Suppose we have $z$ labels and $v$ views. Each view denotes a type of feature (e.g., color histogram or SIFT). For the $j$th view there are $d_j$ features, i.e., the feature dimension is $d_j$. Suppose we have $n$ images. We use $x_j^s$ to denote the $s$th image in the $j$th view. Then we construct an $n \times d_j$ non-negative matrix $X_j = [x_j^1, x_j^2, \ldots, x_j^n]^T$ whose rows are images and whose columns are features.
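As a brief illustration of this notation, the sketch below stacks per-image descriptors into the non-negative matrices $X_j$, one per view; the feature extractors named in the comments are hypothetical placeholders rather than functions defined in the paper.

```python
import numpy as np

def build_view_matrix(feature_vectors):
    """Stack n per-image feature vectors into an n x d_j view matrix X_j."""
    X = np.vstack([np.asarray(f, dtype=float) for f in feature_vectors])
    # MMP assumes non-negative features (e.g., histograms or bag-of-words counts).
    assert (X >= 0).all(), "view matrices are expected to be non-negative"
    return X

# Hypothetical usage with two views of the same n images:
#   X1 = build_view_matrix([color_histogram(img) for img in images])    # n x d_1
#   X2 = build_view_matrix([sift_bag_of_words(img) for img in images])  # n x d_2
```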

To be specific, for the $i$th label, we define $g_i(s) > 0$ to indicate that the $s$th image is positive for the $i$th label and vice
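For illustration only, a minimal sketch of such a label indicator follows, assuming the simplest convention $g_i(s) = 1$ for images positive for label $i$ and $0$ otherwise; the snippet above is truncated, so the exact encoding of negative and unlabeled images here is an assumption.

```python
import numpy as np

def label_indicators(image_tags, vocabulary):
    """n x z matrix G with G[s, i] > 0 iff image s is positive for label i.

    image_tags : list of n tag sets, one per image (empty set if unlabeled).
    vocabulary : ordered list of the z candidate labels.
    """
    index = {label: i for i, label in enumerate(vocabulary)}
    G = np.zeros((len(image_tags), len(vocabulary)))
    for s, tags in enumerate(image_tags):
        for label in tags:
            if label in index:
                G[s, index[label]] = 1.0
    return G
```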

Experiments

To demonstrate the effectiveness of the proposed algorithm, we conduct image annotation experiments on the NUS-WIDE [34] data set.

Conclusion

We propose a new framework called MMP for image annotation, which is the first to bridge multi-view learning and multi-label propagation. MMP handles the consistencies among different views by requiring them to generate the same annotation result, and captures the correlations among different labels by imposing similarity constraints between related labels. By exploring the heterogeneities from both the view level and the label level, we are able to improve the annotation performance. And extensive

Acknowledgment

This work is supported in part by the National Key Technology R&D Program (2012BAI34B01), the National Natural Science Foundation of China (Grant nos. 61170142 and 61173185), and the National High Technology Research and Development Program of China (863 Program) under Grant no. 2013AA040601.

References (36)

  • P. Li et al., Multi-label ensemble based on variable pairwise constraint projection, Inf. Sci. (2013)
  • A. Hanbury, A survey of methods for image annotation, J. Vis. Lang. Comput. (2008)
  • Z. Zha et al., Graph-based semi-supervised learning with multiple labels, J. Vis. Commun. Image Represent. (2009)
  • L. Wu, S. Hoi, R. Jin, J. Zhu, N. Yu, Distance metric learning from uncertain side information with application to...
  • X. Chen, Y. Mu, S. Yan, T. Chua, Efficient large-scale image annotation by probabilistic collaborative multi-label...
  • D. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)
  • F. Wu, Y. Han, Q. Tian, Y. Zhuang, Multi-label boosting for image annotation by structural grouping sparsity, in:...
  • L. Cao, J. Luo, F. Liang, T. Huang, Heterogeneous feature machines for visual recognition, in: The 12th ICCV, IEEE,...
  • I. Dhillon, Co-clustering documents and words using bipartite spectral graph partitioning, in: The 7th SIGKDD, ACM, San...
  • H. Zha, X. He, C. Ding, H. Simon, M. Gu, Bipartite graph partitioning and data clustering, in: The 10th CIKM, ACM,...
  • M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in: Advances in Neural...
  • X. He, P. Niyogi, Locality preserving projections, in: Advances in Neural Information Processing Systems, Vancouver,...
  • J. He, R. Lawrence, A graph-based framework for multi-task multi-view learning, in: The 28th ICML, vol. 11, 2011, pp....
  • R. Yan, A. Natsev, M. Campbell, An efficient manual image annotation approach based on tagging and browsing, in:...
  • M. Wang et al., Assistive tagging: a survey of multimedia tagging with human–computer joint exploration, ACM Comput. Surv. (2012)
  • M. Swain et al., Color indexing, Int. J. Comput. Vis. (1991)
  • O. Maron, A. Ratan, Multiple-instance learning for natural scene classification, in: The 15th ICML, Madison, Wisconsin,...
  • D. Lowe, Object recognition from local scale-invariant features, in: The 7th ICCV, IEEE, Kerkyra, Corfu, Greece, 1999,...

    Zhanying He received the B.S. degree in Software Engineering from Zhejiang University, China, in 2009. She is currently a candidate for a Ph.D. degree in Computer Science at Zhejiang University. Her research interests include information retrieval, data mining and machine learning.

    Chun Chen received the B.S. degree in Mathematics from Xiamen University, China, in 1981, and his M.S. and Ph.D. degrees in Computer Science from Zhejiang University, China, in 1984 and 1990 respectively. He is a Professor in College of Computer Science, Zhejiang University. His research interests include information retrieval, data mining, computer vision, computer graphics and embedded technology.

    Jiajun Bu received the B.S. and Ph.D. degrees in Computer Science from Zhejiang University, China, in 1995 and 2000, respectively. He is a Professor in College of Computer Science, Zhejiang University. His research interests include embedded system, data mining, information retrieval and mobile database.

    Ping Li received the Ph.D. degree in Computer Science from Zhejiang University, China, in 2014 and the M.S. degree in Information and Communication Engineering from Central South University, China, in 2010. He is an Assistant Professor in the School of Computer Science and Technology, Hangzhou Dianzi University. His research interests include machine learning, data mining and multimedia analysis.

    Deng Cai is a Professor in the State Key Lab of CAD&CG, College of Computer Science at Zhejiang University, China. He received the Ph.D. degree in Computer Science from University of Illinois at Urbana Champaign in 2009. Before that, he received his Bachelor's degree and Master's degree from Tsinghua University in 2000 and 2003 respectively, both in Automation. His research interests include machine learning, data mining and information retrieval.
