Neurocomputing

Volume 105, 1 April 2013, Pages 38-44

Nonlinear matrix factorization with unified embedding for social tag relevance learning

https://doi.org/10.1016/j.neucom.2012.02.046

Abstract

With the proliferation of social images, social image tagging has become an essential issue for text-based social image retrieval. However, the original tags annotated by web users are often noisy, irrelevant, and too incomplete to describe the visual content of the images. In this paper, we propose a nonlinear matrix factorization method with priors on the inter- and intra-correlations among images and tags to effectively predict the relevance of tags to the visual content. The proposed method learns the image latent feature space and the tag latent feature space as a single unified space, so that each image and each tag can be described as a point in that space. Intuitively, this makes it more natural to estimate the relationships between images and tags directly from their distances or similarities in the unified space. The task of image tagging or tag recommendation can thus be solved efficiently by a nearest tag-neighbor search in the unified space. Similarly, we can obtain the most relevant images for any tag and thereby perform keyword-based image search. We evaluate the proposed method on tag recommendation and image search, and compare it with existing work on the challenging NUS-WIDE dataset. Extensive experiments demonstrate the effectiveness and potential of the proposed method in real-world applications.

Introduction

In the Web 2.0 era, with the development of Internet technologies and digital devices, image sharing websites such as Flickr and Facebook have become increasingly popular. Users can not only easily upload, distribute, and share their digital images and photos, but also tag and comment on images that interest them. As a consequence, text-based social image retrieval has become a popular yet rather challenging research topic.

However, due to the diversity of users' knowledge and cultural backgrounds, social tagging is often subjective and inaccurate. Consequently, many images are not tagged properly, and some are not tagged at all. The tags associated with social images can be noisy, irrelevant, and incomplete, as shown in Fig. 1, which may severely degrade the performance of text-based image retrieval [1]. Existing studies reveal that many tags provided by Flickr users are imprecise and that only around 50% of the tags are actually related to the image [2], [1]. Hence, a fundamental problem for text-based social image retrieval is how to rank the tags of any given image by their relevance to its visual content.

In this paper, we investigate the tag relevance learning problem to address the above challenge. Fig. 1 shows an example image from Flickr, whose tags are partly irrelevant, noisy, and incomplete. After tag relevance learning, the tags are adjusted and some relevant tags are added. Many approaches have been proposed to tackle the tag relevance learning problem [3], [4], [5], [6], [2], [7], [8], [1]. The most closely related work is the multi-correlation probabilistic matrix factorization (MPMF) model [7], which is based on a latent factor model. The image-tag relation matrix is decomposed into two latent feature matrices, and the image similarity and tag correlation matrices are exploited simultaneously and seamlessly through the shared latent matrices. MPMF is a linear Gaussian model, and its image and tag latent factors may be embedded in different spaces.
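
To make the linear latent factor idea concrete, the following minimal sketch shows how an image-tag relation matrix can be approximated as the product of two latent feature matrices, with each relevance score given by an inner product. It only illustrates plain linear factorization and is not the MPMF model of [7]: the names `U`, `V`, and `reconstruct` and the toy values are hypothetical, and the image similarity and tag correlation priors are omitted.

```python
def reconstruct(U, V):
    """Approximate the image-tag relation matrix as R_hat[i][j] = <U[i], V[j]>."""
    return [[sum(u * v for u, v in zip(u_row, v_row)) for v_row in V] for u_row in U]


# Toy latent features (hypothetical values): 2 images, 2 tags, 2 latent dimensions.
U = [[1.0, 0.0], [0.0, 1.0]]  # one row per image
V = [[0.9, 0.1], [0.2, 0.8]]  # one row per tag

# Each entry is the predicted relevance of a tag (column) to an image (row).
R_hat = reconstruct(U, V)
print(R_hat)
```

In such a linear model the score is an inner product, so nothing forces the image and tag factors to be comparable to each other by distance.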

Unlike existing matrix factorization work, this paper proposes a nonlinear matrix factorization approach with unified embedding (MFUE) to learn tag relevance for social image retrieval. The image latent features and tag latent features are embedded in a unified space, in which distance represents relevance. Like standard matrix factorization, MFUE scales with the number of observations and copes with sparse data. In addition, embedding the latent features in the same space makes the relationship between images and tags more intuitive to understand. New images with few or no tags can be easily mapped into the unified space and assigned relevant tags by nearest neighbor search. Finally, during matrix factorization, visual similarity and tag correlation are jointly exploited through the shared latent feature vectors to preserve the local visual and semantic geometry. We conduct an extensive set of experiments to evaluate the empirical performance of the proposed MFUE method on social image retrieval and tag recommendation.
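
The nearest neighbor search in a unified space can be sketched as follows. This is a minimal illustration under stated assumptions rather than the MFUE implementation: the embeddings are hand-picked toy points, the function name `nearest_neighbors` is hypothetical, and Euclidean distance is assumed as the relevance measure.

```python
import math


def nearest_neighbors(query, candidates, k=3):
    """Return the names of the k candidate points closest to the query point."""
    # Assumption: smaller Euclidean distance means higher relevance.
    ranked = sorted(candidates, key=lambda name: math.dist(query, candidates[name]))
    return ranked[:k]


# Toy points in a 2-D unified space (hypothetical values, not learned embeddings).
tag_points = {
    "sky": (0.9, 0.1),
    "beach": (0.8, 0.3),
    "dog": (-0.7, 0.6),
    "car": (-0.2, -0.9),
}
image_points = {
    "img_a": (1.0, 0.2),
    "img_b": (-0.6, 0.5),
}

# Tag recommendation: the tags nearest to an image.
recommended_tags = nearest_neighbors(image_points["img_a"], tag_points, k=2)

# Image search by keyword: the images nearest to a tag.
retrieved_images = nearest_neighbors(tag_points["dog"], image_points, k=1)

print(recommended_tags, retrieved_images)
```

Because images and tags live in the same space, the same function serves both tag recommendation and keyword-based image search; only the roles of query and candidates are swapped.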

The remainder of this paper is organized as follows. We review related work in Section 2. Section 3 elaborates the proposed nonlinear matrix factorization with unified embedding algorithm. In Section 4, extensive experiments are conducted to evaluate the performance of the proposed method and compare it with related methods. Section 5 concludes the paper and discusses future work.

Related work

Estimating the relevance of tags with respect to images is an essential issue in text-based image retrieval. The related techniques fall into two main scenarios, namely tag annotation for untagged images and tag refinement for tagged images.

Methods in the first category predict relevant tags for untagged images. A variety of methods have been proposed to annotate images automatically [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [7], which can be categorized into

Nonlinear matrix factorization with unified embedding

To estimate the relationship between tags and images, we jointly exploit three aspects: matrix factorization, local visual geometry preservation, and local textual geometry preservation. In this section, we first present the formulation of the proposed method along with some preliminaries. We then elaborate on each part of the objective function and discuss the optimization.
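
As a rough illustration of how these three aspects could be combined, one generic form of such an objective is sketched below. It is not the paper's actual formulation, which is not reproduced in this excerpt. All symbols are assumptions introduced here: $R_{ij}$ is the observed relevance of tag $j$ to image $i$ over the observed set $\Omega$, $u_i$ and $v_j$ are the image and tag points in the unified space, $g$ is some decreasing nonlinear function that maps a distance to a relevance score, $S$ and $C$ are the visual similarity and tag correlation matrices, and $\alpha, \beta \ge 0$ are trade-off weights.

```latex
\min_{\{u_i\},\{v_j\}}
  \sum_{(i,j)\in\Omega} \bigl( R_{ij} - g(\lVert u_i - v_j \rVert_2) \bigr)^2
  + \alpha \sum_{i,i'} S_{ii'} \,\lVert u_i - u_{i'} \rVert_2^2
  + \beta \sum_{j,j'} C_{jj'} \,\lVert v_j - v_{j'} \rVert_2^2
```

The first term fits the observed image-tag relations through distances in the unified space. The second and third terms pull visually similar images and correlated tags close together, which is one common way to preserve local geometry.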

Experimental analysis

To validate the effectiveness of our proposed approach to tag relevance learning, we conduct extensive experiments and apply our method to text-based social image retrieval and automatic image recommendation. All experiments are implemented in MATLAB on a 2.39 GHz PC with 16 GB RAM.

Conclusion

In this paper, we propose a nonlinear matrix factorization approach to estimate tag relevance. The latent factors are mapped into a unified space, and their relevance can be measured by the distance between them. Image visual similarity and tag correlation are incorporated simultaneously to preserve the local visual geometry and the local textual geometry. Placing the latent factors in the same space makes their relationship more understandable and allows the top tags (images) to be returned quickly for a query

Acknowledgments

This work was supported by the 973 Program (Project no. 2010CB327905) and the National Natural Science Foundation of China (Grant nos. 60833006, 60903146, 61272329).

Zechao Li received the BE degree from University of Science and Technology of China (USTC), Anhui, China, in 2008. He is currently pursuing the PhD degree at National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China.

References (28)

  • J. Liu et al., Image annotation via graph learning, Pattern Recogn. (2009)
  • J. Zhuang, S.C.H. Hoi, A two-view learning approach for image tag ranking, in: WSDM,...
  • D. Liu, X.-S. Hua, L. Yang, M. Wang, H.-J. Zhang, Tag ranking, in: WWW,...
  • C. Wang, F. Jing, L. Zhang, Image annotation refinement using random walk with restarts, in: MM,...
  • C. Wang, F. Jing, L. Zhang, H.-J. Zhang, Content-based image annotation refinement, in: CVPR,...
  • X. Li, C.G.M. Snoek, M. Worring, Learning tag relevance by neighbor voting for social image retrieval, in: MIR, 2008,...
  • L. Wu, L. Yang, N. Yu, X.-S. Hua, Learning to tag, in: WWW, 2009, pp....
  • Z. Li, J. Liu, X. Zhu, T. Liu, H. Lu, Image annotation using multi-correlation probabilistic matrix factorization, in:...
  • G. Zhu, S. Yan, Y. Ma, Image tag refinement towards low-rank, content-tag prior and error sparsity, in: MM, 2010, pp....
  • P. Duygulu, K. Barnard, N. de Freitas, D. Forsyth, Object recognition as machine translation: learning a lexicon for a...
  • J. Jeon, V. Lavrenko, R. Manmatha, Automatic image annotation and retrieval using cross-media relevance models, in:...
  • S. Feng, R. Manmatha, V. Lavrenko, Multiple bernoulli relevance models for image and video annotation, in: CVPR, 2004,...
  • V. Lavrenko, R. Manmatha, J. Jeon, A model for learning the semantics of pictures, in: NIPS, 2004, pp....
  • J. Liu, B. Wang, M. Li, W. Ma, H. Lu, S. Ma, Dual cross-media relevance model for image annotation, in: MM, 2007, pp....

    Jing Liu received the BE and ME degrees from Shandong University, Shandong, in 2001 and 2004, respectively, and the PhD degree from the Institute of Automation, Chinese Academy of Sciences, Beijing, in 2008. She is an Associate Professor with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. Her current research interests include multimedia analysis, understanding, and retrieval.

    Hangqing Lu received his BS and MS degrees from the Department of Computer Science and the Department of Electric Engineering at Harbin Institute of Technology in 1982 and 1985, respectively. He received his PhD degree from the Department of Electronic and Information Science at Huazhong University of Science and Technology. He is a professor with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. His current research interests include image similarity measures, video analysis, and multimedia technology and systems.
