Low-rank regularized multi-view inverse-covariance estimation for visual sentiment distribution prediction

https://doi.org/10.1016/j.jvcir.2018.11.006

Highlights

  • Label distribution learning (LDL) is an effective way to address the problem of subjective evaluation.

  • We propose a low-rank regularized multi-view learning model.

  • Experimental results validate the effectiveness of our model.

Abstract

With the increasing tendency to use images to express opinions and share experiences, sentiment analysis of visual content has attracted considerable attention in the past few years. Traditional sentiment analysis methods mainly focus on predicting the most dominant sentiment category of an image while neglecting the sentiment ambiguity caused by factors such as environment, subjectivity, and cultural background. To tackle this problem, visual sentiment distribution prediction has been put forward to characterize images by distributions over a set of sentiment labels instead of a single distinct label or multiple distinct labels. Nevertheless, existing approaches usually treat feature embedding and distribution prediction as separate steps.

In this paper, we propose a novel supervised visual sentiment distribution prediction model, termed low-rank regularized multi-view inverse-covariance estimation, in which feature embedding and distribution prediction are performed jointly. Specifically, our proposed model contains two main components: a multi-view embedding term and an inverse-covariance estimation term. The multi-view embedding term is constrained to be low rank so as to seek the lowest-rank representation of the samples. The inverse-covariance estimation term is regularized by structured sparsity to learn a more reasonable distribution prediction model. We develop an alternating heuristic optimization algorithm to solve the objective function of the proposed model. Experimental results on three publicly available datasets demonstrate the effectiveness of our proposed scheme compared with state-of-the-art algorithms.

Introduction

With the rapid development of social media, representative platforms such as Facebook and YouTube have become powerful tools for spreading ideas and influencing people's attitudes. In recent years, with the popularization of mobile devices equipped with cameras, social media platforms centered on photo-sharing services have become an important way for people to express themselves. As a result, user-generated content (UGC) is diverse, involving text, images, and video [1], [2], [3], [4]. Among these, images are one of the most representative information sources. For example, a study on Twitter data indicates that images accounted for 90% of the total data [5]. Image sentiment analysis has great practical significance, since images reflect the emotional tendencies of publishers and reviewers. Under this scenario, visual sentiment analysis has found widespread applications ranging from politics and education to entertainment and advertisement [6], [7], [8], [9], [10], [11], [12], [13].

In recent years, a large number of studies have been devoted to visual sentiment analysis [14], [15], [16], [17], [18], [19]. For example, Corchs et al. [14] proposed an ensemble learning approach for social image emotion classification by combining five state-of-the-art classifiers. Rao et al. [17] proposed a deep multi-level patch learning network based on different deep representations, which effectively deals with noisily labeled datasets. Poria et al. [18] proposed a multi-modal framework that fuses audio, visual, and textual cues for sentiment analysis. Although much work on visual emotion analysis has been completed, image-based emotion analysis still lags behind text-based emotion analysis, mainly for the following reasons. (1) Semantic gap. Although images can be represented by various types of features, there is an inevitable problem known as the semantic gap, which characterizes the difference between the high-level sentiment semantics of an image and the extracted low- and mid-level visual feature representations. (2) Label ambiguity. The emotions that the same image arouses in different people may not be consistent. Furthermore, the same person may give different emotion annotations at different points in time. Fig. 1 shows the sentiment label distribution of a sample image from Twitter_LDL [20], annotated by 8 volunteers; this image evoked 5 sentiments altogether. Therefore, a reasonable assumption is that image sentiment is a mixture of multiple sentiments rather than a single representative sentiment category. Moreover, to better fit many real applications, it is reasonable to assign a different degree of importance to each label of an image. To tackle the challenges presented above, label distribution learning (LDL) has been proposed to learn a set of probability values that represent the intensity of each label. LDL provides a more general learning framework, in which both single-label and multi-label learning can be considered special cases. In the case of image sentiment prediction, LDL produces an overall label distribution in which each value represents the degree to which the corresponding label describes the image. Therefore, LDL can be considered an effective way to address, to some extent, the problem of subjective evaluation in image sentiment analysis.
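
To make the notion of a label distribution concrete, the following minimal sketch converts annotator vote counts into a normalized sentiment distribution of the kind that LDL takes as its prediction target. The eight category names and the vote counts are illustrative assumptions (a Mikels-style taxonomy, as adopted by datasets such as Twitter_LDL), not data taken from Fig. 1.

```python
import numpy as np

# Hypothetical sentiment categories (Mikels-style taxonomy); the vote
# counts below are invented purely for illustration.
LABELS = ["amusement", "awe", "contentment", "excitement",
          "anger", "disgust", "fear", "sadness"]

def votes_to_distribution(votes):
    """Normalize raw annotator vote counts into a label distribution."""
    votes = np.asarray(votes, dtype=float)
    return votes / votes.sum()

# Example: 8 annotators, 5 of the 8 labels receive at least one vote.
votes = [3, 2, 1, 0, 0, 0, 1, 1]
dist = votes_to_distribution(votes)

# Single-label learning keeps only the dominant category ...
dominant = LABELS[int(np.argmax(dist))]
# ... whereas LDL keeps the full distribution as the prediction target.
for label, p in zip(LABELS, dist):
    print(f"{label:12s} {p:.3f}")
print("dominant label:", dominant)
```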

To better tackle these challenges, in this paper we present a novel low-rank regularized multi-view inverse-covariance estimation algorithm for visual sentiment distribution prediction. Our proposed scheme unifies feature representation and distribution prediction into a multi-view learning framework such that the lowest-rank representation not only captures the intrinsic structure embedded in the data but also meets the requirements of distribution prediction. The core of our proposed scheme consists of two regularization terms: a low-rank regularization term for enhancing the feature representation of the samples, and a structured sparsity regularization term for learning a better-matched distribution prediction model. Inspired by the great success of multi-view embedding techniques in addressing the semantic gap problem [21], [22], [23], [24], [25], [26], we impose low-rank regularization on the multi-view embedding to seek the lowest-rank common representation among views. As to the label ambiguity problem, we model visual sentiment prediction as a multi-output regression problem. To obtain a more reasonable distribution learning model, we introduce an inverse-covariance mechanism that takes the structured sparsity of the regression coefficients into account. Furthermore, we show that this approach amounts to learning a structured sparse conditional Gaussian model [27] for multi-output prediction. Because the formulated objective function is hard to solve directly, we design an alternating heuristic optimization algorithm for the proposed model. The main contributions of this paper are summarized as follows:

  • In contrast to previous studies that generally treat sentiment analysis as a single-label or multi-label learning problem, we propose to model image sentiment prediction as a label distribution learning problem. This can be considered an effective attempt to address, to some extent, the challenge of subjective evaluation.

  • We propose multi-view inverse-covariance estimation for visual sentiment distribution prediction, in which multi-view low-rank learning and inverse-covariance estimation are jointly integrated to learn a more intrinsic feature representation and a more robust prediction model, respectively (see the illustrative sketch after this list).

  • We develop an alternating heuristic optimization algorithm to solve the proposed model. Experimental results on three publicly available datasets demonstrate that the algorithm converges within a small number of iterations and obtains promising results in comparison with state-of-the-art algorithms.
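
As an illustration of the two ingredients named above, the sketch below shows generic building blocks that realize them: singular value soft-thresholding (the proximal operator of the nuclear norm, a standard tool for low-rank regularization) and sparse inverse-covariance estimation via the graphical lasso. It is a minimal sketch on synthetic data using standard library routines, not the exact objective or update rules of the proposed model.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

def svt(M, tau):
    """Singular value soft-thresholding: the proximal operator of the
    nuclear norm, a standard tool for enforcing low-rank structure."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# A noisy observation of an (approximately) rank-2 matrix: thresholding the
# singular values suppresses the noise directions and keeps the rank low.
Z_true = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 30))
Z_noisy = Z_true + 0.1 * rng.standard_normal((50, 30))
Z_lowrank = svt(Z_noisy, tau=2.0)
print("rank after soft-thresholding:",
      np.linalg.matrix_rank(Z_lowrank, tol=1e-6))

# Sparse inverse-covariance (precision) estimation over the outputs via the
# graphical lasso; a sparse precision matrix encodes conditional dependencies
# (here: correlations among sentiment labels).
Y = rng.standard_normal((200, 8))          # stand-in for label outputs
glasso = GraphicalLasso(alpha=0.1).fit(Y)
print("estimated precision matrix shape:", glasso.precision_.shape)
```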

The remainder of this paper is organized as follows: Section 2 reviews related work. Section 3 describes the proposed method. Experimental results and discussions are reported in Section 4, followed by conclusions in Section 5.

Section snippets

Related work

In this section, we discuss work related to our proposed method. In Section 2.1, we first describe applications of label distribution learning. We then provide a brief review of the standard formulation of multi-view learning in Section 2.2. In Section 2.3, we give a detailed explanation of inverse covariance estimation.

Low-rank regularized multi-view inverse-covariance estimation for visual sentiment distribution prediction

In this section, we first describe the proposed algorithm in detail and then present the alternating direction algorithm used to optimize it.
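
As the full derivation is beyond this excerpt, the following simplified sketch only illustrates the general shape of an alternating optimization: a shared embedding and the regression weights are updated in turn, each with the other fixed, until the objective stops decreasing. The quadratic objective and closed-form updates here are deliberately simplified stand-ins and do not include the low-rank or inverse-covariance terms of the actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy problem: features X and target label distributions Y over 8 sentiments.
X = rng.standard_normal((100, 20))
Y = np.abs(rng.standard_normal((100, 8)))
Y /= Y.sum(axis=1, keepdims=True)

n_feat, d_embed, n_label, lam = X.shape[1], 5, Y.shape[1], 1e-2
P = rng.standard_normal((n_feat, d_embed))   # projection: features -> embedding
W = rng.standard_normal((d_embed, n_label))  # regression: embedding -> labels

def objective(P, W):
    fit = np.linalg.norm(X @ P @ W - Y) ** 2
    reg = lam * (np.linalg.norm(P) ** 2 + np.linalg.norm(W) ** 2)
    return fit + reg

prev = np.inf
for it in range(50):
    # Step 1: update the regression weights with the embedding fixed (ridge solution).
    Z = X @ P
    W = np.linalg.solve(Z.T @ Z + lam * np.eye(d_embed), Z.T @ Y)
    # Step 2: update the projection with the regression weights fixed
    # (closed-form solution of the quadratic subproblem via vectorization).
    A = np.kron(W @ W.T, X.T @ X) + lam * np.eye(n_feat * d_embed)
    b = (X.T @ Y @ W.T).reshape(-1, order="F")
    P = np.linalg.solve(A, b).reshape(n_feat, d_embed, order="F")
    # Step 3: stop once the objective no longer decreases noticeably.
    cur = objective(P, W)
    if prev - cur < 1e-6:
        break
    prev = cur

print(f"stopped after {it + 1} iterations, objective = {cur:.4f}")
```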

Experiment results and discussions

To evaluate the effectiveness of our proposed approach, we first present the details of the three datasets and the experimental settings in Section 4.1. We then describe three types of evaluation metrics in Section 4.2. Finally, the experimental results and discussions are presented in Section 4.3.
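
The concrete metrics of Section 4.2 are not reproduced in this excerpt. For reference, label distribution prediction is commonly evaluated by comparing the predicted and ground-truth distributions; the sketch below implements three measures widely used for this purpose (Chebyshev distance, Kullback-Leibler divergence, and cosine similarity) and should be read as an assumption about typical metrics rather than a statement of which ones the paper adopts.

```python
import numpy as np

def chebyshev(p, q):
    """Maximum absolute difference between two label distributions (lower is better)."""
    return float(np.max(np.abs(p - q)))

def kl_divergence(p, q, eps=1e-12):
    """KL divergence from the ground truth p to the prediction q (lower is better)."""
    p, q = np.clip(p, eps, None), np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

def cosine_similarity(p, q):
    """Cosine of the angle between the two distributions (higher is better)."""
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

# Toy ground-truth and predicted sentiment distributions over 8 labels.
truth = np.array([0.375, 0.25, 0.125, 0.0, 0.0, 0.0, 0.125, 0.125])
pred  = np.array([0.30, 0.30, 0.10, 0.05, 0.05, 0.05, 0.10, 0.05])

print("Chebyshev :", chebyshev(truth, pred))
print("KL div.   :", kl_divergence(truth, pred))
print("Cosine    :", cosine_similarity(truth, pred))
```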

Conclusion

In this paper, we propose a low-rank regularized multi-view inverse-covariance estimation model for visual sentiment distribution prediction. Our proposed method jointly employs multi-view learning and inverse-covariance estimation to guarantee superior performance. In this framework, a low-rank multi-view regularization constraint is exploited to uncover the intrinsic low-rank feature representation, while inverse-covariance estimation for multi-output regression estimates both the regression coefficients and the correlation structure among the output labels.

Conflict of interest

The authors declare that there is no conflict of interest.

References (65)

  • J. Chen, X. Song, L. Nie, X. Wang, H. Zhang, T.-S. Chua, Micro tells macro: predicting the popularity of micro-videos...
  • T. Chen, D. Lu, M.-Y. Kan, P. Cui, Understanding and classifying image tweets, in: Proceedings of ACM International...
  • F. Wanner, C. Rohrdantz, F. Mansmann, D. Oelke, D.A. Keim, Visual sentiment analysis of rss news feeds featuring the us...
  • Q. You, J. Luo, H. Jin, J. Yang, Robust image sentiment analysis using progressively trained and domain transferred...
  • T. Chen, F.X. Yu, J. Chen, Y. Cui, Y.-Y. Chen, S.-F. Chang, Object-based visual sentiment concept analysis and...
  • D. Borth, R. Ji, T. Chen, T. Breuel, S.-F. Chang, Large-scale visual sentiment ontology and detectors using adjective...
  • L. Nie, M. Wang, Z. Zha, G. Li, T.-S. Chua, Multimedia answering: enriching text qa with media information, in:...
  • L. Nie et al., Beyond text qa: multimedia answer generation by harvesting web information, IEEE Trans. Multimedia (2013)
  • X. Song, L. Nie, L. Zhang, M. Akbari, T.-S. Chua, Multiple social network learning and its application in volunteerism...
  • S. Corchs et al., Ensemble learning on visual and textual data for social image emotion classification, Int. J. Mach. Learn. Cybernet. (2017)
  • H. Zheng, T. Chen, J. Luo, When saliency meets sentiment: Understanding how image content invokes emotion and...
  • T. Rao, M. Xu, D. Xu, Learning multi-level deep representations for image emotion classification, arXiv preprint...
  • L. Nie, S. Yan, M. Wang, R. Hong, T.-S. Chua, Harvesting visual concepts for image search with complex queries, in:...
  • J. Yang, M. Sun, X. Sun, Learning visual sentiment distributions via augmented conditional probability neural network,...
  • E.J. Candes et al., Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements, IEEE Trans. Inf. Theory (2011)
  • P. Jing et al., Predicting image memorability through adaptive transfer learning from external sources, IEEE Trans. Multimedia (2017)
  • T. Zhou, D. Tao, Godec: Randomized low-rank & sparse matrix decomposition in noisy case, in: Proceedings of...
  • L. Nie et al., Oracle in image search: a content-based approach to performance prediction, ACM Trans. Inform. Syst. (2012)
  • X. Geng et al., Facial age estimation by learning from label distributions, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • P. Hou, X. Geng, Z.-W. Huo, J. Lv, Semi-supervised adaptive label distribution learning for facial age estimation...
  • X. Yang, B.-B. Gao, C. Xing, Z.-W. Huo, X.-S. Wei, Y. Zhou, J. Wu, X. Geng, Deep label distribution learning for...
  • B.-B. Gao et al., Deep label distribution learning with label ambiguity, IEEE Trans. Image Process. (2017)