Low-rank regularized multi-view inverse-covariance estimation for visual sentiment distribution prediction☆
Introduction
With the rapid development of social media, platforms such as Facebook and YouTube have become powerful tools for spreading ideas and influencing people’s attitudes. In recent years, with the popularization of camera-equipped mobile devices, social media platforms built around photo sharing have become an important way for people to express themselves. As a result, user-generated content (UGC) is diverse, spanning text, images, and video [1], [2], [3], [4]. Among these, images are one of the most representative information sources. For example, a study of Twitter data indicates that images accounted for 90% of the total data [5]. Image sentiment analysis has great practical significance because images can reflect the emotional tendencies of publishers and reviewers. Consequently, visual sentiment analysis has found widespread applications in politics, education, entertainment, and advertising [6], [7], [8], [9], [10], [11], [12], [13].
In recent years, a large number of methods have been proposed for visual sentiment analysis [14], [15], [16], [17], [18], [19]. For example, Corchs et al. [14] proposed an ensemble learning approach for social image emotion classification that combines five state-of-the-art classifiers. Rao et al. [17] proposed a deep multi-level patch learning network based on different deep representations, which deals effectively with noisily labeled datasets. Poria et al. [18] built a multimodal framework that fuses audio, visual, and textual clues for sentiment analysis.
Although much work on visual emotion analysis has been completed, image-based emotion analysis still lags behind text-based emotion analysis. The main reasons are as follows. (1) Semantic gap. Although images can be represented by various types of features, there is an inevitable semantic gap between the high-level sentiment semantics of an image and the low- and mid-level visual features extracted from it. (2) Label ambiguity. The emotions that the same image evokes in different people may not be consistent. Furthermore, the same person may assign different emotion labels at different times. Fig. 1 shows the sentiment label distribution of a sample image from TwitterLDL [20], which was annotated by 8 volunteers. This image evoked 5 sentiments altogether. It is therefore reasonable to assume that image sentiment is a mixture of multiple sentiments rather than a single representative sentiment category. Moreover, to better fit real applications, it is reasonable to assign each label a different importance for a given image. To tackle these challenges, label distribution learning (LDL) has been proposed to learn a set of probability values that represent the intensity of each label.
LDL provides a more general learning framework, of which both single-label and multi-label learning can be considered special cases. For image sentiment prediction, LDL outputs a full label distribution in which each value represents the degree to which the corresponding label describes the image. LDL can therefore be regarded as an effective way to address, at least in part, the problem of subjective evaluation in image sentiment analysis.
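To make the idea of a label distribution concrete, the following minimal sketch turns annotator votes into description degrees that sum to one. The eight emotion names, the votes, and the function name `label_distribution` are assumptions made for illustration; they are not taken from TwitterLDL or from the paper.

```python
from collections import Counter

# Assumed label set: eight emotion categories (hypothetical for this sketch).
EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]


def label_distribution(votes, labels=EMOTIONS):
    """Turn a list of annotator votes into a label distribution.

    Each label receives the fraction of votes it obtained, so the
    values are non-negative and sum to one.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    unknown = set(counts) - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return {label: counts[label] / len(votes) for label in labels}


# Hypothetical votes from 8 annotators that cover 5 distinct sentiments.
votes = ["amusement", "amusement", "amusement", "contentment",
         "contentment", "excitement", "awe", "sadness"]
dist = label_distribution(votes)  # dist["amusement"] is 3/8 = 0.375

# Special cases: a single label gives a one-hot distribution, and a
# multi-label annotation spreads the mass uniformly over its labels.
single = label_distribution(["anger"])
multi = label_distribution(["awe", "fear"])
```

In this sketch single-label and multi-label annotations are simply degenerate distributions, which is the sense in which LDL generalizes both settings.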
To better tackle these challenges, we present a novel low-rank regularized multi-view inverse-covariance estimation algorithm for visual sentiment distribution prediction. The proposed scheme unifies feature representation and distribution prediction in a multi-view learning framework, so that the lowest-rank representation both captures the intrinsic structure embedded in the data and reflects the requirements of distribution prediction. Its core consists of two regularization terms: a low-rank regularization term that enhances the feature representation of samples, and a structured sparsity regularization term that yields a better-matched distribution prediction model. Inspired by the success of multi-view embedding techniques in addressing the semantic gap problem [21], [22], [23], [24], [25], [26], we impose low-rank regularization on the multi-view embedding to seek the lowest-rank common representation among views. For the label ambiguity problem, we model visual sentiment prediction as a multi-output regression problem. To obtain a more reasonable distribution learning model, we introduce an inverse-covariance mechanism that takes the structured sparsity of the regression coefficients into account. Furthermore, we show that this approach amounts to learning a structured sparse conditional Gaussian model [27] for multi-output prediction. Because the resulting objective function is hard to solve directly, we design an alternating heuristic optimization algorithm for it. The main contributions of this paper are summarized as follows:
- In contrast to previous studies, which generally treat sentiment analysis as a single-label or multi-label learning problem, we model image sentiment prediction as a label distribution learning problem. This can be considered an effective attempt to address, at least in part, the challenge of subjective evaluation.
- We propose multi-view inverse-covariance estimation for visual sentiment distribution prediction, in which multi-view low-rank learning and inverse-covariance estimation are jointly integrated to learn a more intrinsic feature representation and a more robust prediction model, respectively.
- We develop an alternating heuristic optimization algorithm to solve the proposed model. Experimental results on three publicly available datasets demonstrate that the algorithm converges within a small number of iterations and obtains promising results in comparison with state-of-the-art algorithms.
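The two ingredients named above, low-rank regularization and inverse-covariance estimation for multi-output regression, are commonly written as in the sketch below. This is generic textbook notation and not the objective function of the paper, which does not appear in this excerpt. All symbols are assumptions: $x \in \mathbb{R}^d$ is a feature vector, $y \in \mathbb{R}^k$ the outputs, $Z$ a shared representation, $\Lambda$ the $k \times k$ output precision matrix, $\Theta$ the $d \times k$ input-output coupling, and $S_{xx} = X^\top X / n$, $S_{yx} = Y^\top X / n$, $S_{yy} = Y^\top Y / n$ the sample second-moment matrices.

```latex
\begin{align}
  % Nuclear norm: the standard convex surrogate for rank(Z).
  \|Z\|_{*} &= \sum_{i} \sigma_i(Z), \\
  % Conditional Gaussian model parameterized by inverse covariance.
  p(y \mid x) &\propto \exp\!\Big(-\tfrac{1}{2}\, y^\top \Lambda\, y - x^\top \Theta\, y\Big),
  \qquad
  y \mid x \sim \mathcal{N}\!\big(-\Lambda^{-1}\Theta^\top x,\; \Lambda^{-1}\big), \\
  % Sparsity-penalized negative log-likelihood (up to constants and scaling).
  \min_{\Lambda \succ 0,\; \Theta}\;
  & -\log\det\Lambda
    + \operatorname{tr}\!\big(S_{yy}\Lambda + 2\, S_{yx}\Theta
    + \Lambda^{-1}\Theta^\top S_{xx}\Theta\big)
    + \lambda\big(\|\Lambda\|_{1} + \|\Theta\|_{1}\big).
\end{align}
```

Under this parameterization the regression coefficients are $-\Theta\Lambda^{-1}$, so sparsity in $\Theta$ and $\Lambda$ encodes conditional independence structure among inputs and outputs rather than sparsity of the coefficients themselves.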
The remainder of this paper is organized as follows: Section 2 reviews related work. Section 3 describes the proposed method. Experimental results and discussions are reported in Section 4, followed by conclusions in Section 5.
Related work
In this section, we discuss work related to our proposed method. In Section 2.1, we first describe the application of label distribution learning. We then provide a brief review of the standard formulation of multi-view learning in Section 2.2. In Section 2.3, we give a detailed explanation of inverse-covariance estimation.
Low-rank regularized multi-view inverse-covariance estimation for visual sentiment distribution prediction
In this section, we first describe the proposed algorithm in detail and then present the alternating optimization algorithm used to solve the proposed model.
Experiment results and discussions
To evaluate the effectiveness of our proposed approach, we first present the details of the three datasets and the experimental settings in Section 4.1. We then describe three types of evaluation metrics in Section 4.2. Finally, the experimental results and discussions are presented in Section 4.3.
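This excerpt does not name the three types of metrics. As a hedged illustration only, the sketch below implements measures that are commonly used to compare a predicted label distribution with the ground truth: Kullback-Leibler divergence, Chebyshev distance, and agreement of the dominant label. The function names and the example distributions are assumptions and are not taken from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0.

    q_i is clipped below at eps to avoid division by zero.
    """
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)


def chebyshev(p, q):
    """Largest absolute difference between corresponding entries."""
    return max(abs(pi - qi) for pi, qi in zip(p, q))


def same_top_label(p, q):
    """True when both distributions rank the same label first."""
    return p.index(max(p)) == q.index(max(q))


# Hypothetical ground-truth and predicted distributions over 8 labels.
truth = [0.375, 0.125, 0.25, 0.125, 0.0, 0.0, 0.0, 0.125]
pred = [0.30, 0.15, 0.25, 0.10, 0.05, 0.05, 0.0, 0.10]

kl = kl_divergence(truth, pred)
cheb = chebyshev(truth, pred)
top = same_top_label(truth, pred)
```

Lower values of the two distances indicate a closer match, while the dominant-label check reduces the comparison to an ordinary classification criterion.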
Conclusion
In this paper, we propose a low-rank regularized multi-view inverse-covariance estimation method for visual sentiment distribution prediction. The method jointly employs multi-view learning and inverse-covariance estimation to achieve superior performance. In this framework, a low-rank multi-view regularization constraint is exploited to uncover the intrinsic low-rank feature representation. Inverse-covariance estimation of multiple-output regression that can estimate both the
Conflict of interest
The authors declared that there is no conflict of interest.
References (65)
- et al. On effective location-aware music recommendation. ACM Trans. Inform. Syst. (2016)
- et al. Exploring auxiliary context: discrete semantic transfer hashing for scalable image retrieval. IEEE Trans. Neural Networ. Learn. Syst. (2018)
- et al. Heterogeneous knowledge transfer in video emotion recognition, attribution and summarization. IEEE Trans. Affective Comput. PP (2016)
- et al. visual and textual clues for sentiment analysis from multimodal content. Neurocomputing (2016)
- et al. Rank canonical correlation analysis and its application in visual search reranking. Signal Process. (2013)
- et al. Improved sparse low-rank matrix estimation. Signal Process. (2017)
- et al. Sparse Bayesian dictionary learning with a Gaussian hierarchical model. Signal Process. (2017)
- et al. Transfer independently together: a generalized framework for domain adaptation. IEEE Trans. Cybern. (2018)
- Z. Cheng, Y. Ding, L. Zhu, M. Kankanhalli, Aspect-Aware Latent Factor Model: Rating Prediction with Ratings and...
- Z. Cheng, J. Shen, L. Nie, T.-S. Chua, M. Kankanhalli, Exploring user-specific information in music retrieval, in:...
- Beyond text QA: multimedia answer generation by harvesting web information. IEEE Trans. Multimedia
- Ensemble learning on visual and textual data for social image emotion classification. Int. J. Mach. Learn. Cybernet.
- Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inf. Theory
- Predicting image memorability through adaptive transfer learning from external sources. IEEE Trans. Multimedia
- Oracle in image search: a content-based approach to performance prediction. ACM Trans. Inform. Syst.
- Facial age estimation by learning from label distributions. IEEE Trans. Pattern Anal. Mach. Intell.
- Deep label distribution learning with label ambiguity. IEEE Trans. Image Process.
Cited by (5)
Visual Sentiment Classification via Low-Rank Regularization and Label Relaxation
2022, IEEE Transactions on Cognitive and Developmental Systems
An End-to-End Perceptual Quality Assessment Method via Score Distribution Prediction
2020, Neural Processing Letters
Median based multi-label prediction by inflating emotions with dyads for visual sentiment analysis
2019, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
☆ This article is part of the Special Issue on Multimodal_Cooperation.