DOI: 10.1145/2835776.2835779 · Research Article · WSDM Conference Proceedings

Cross-modality Consistent Regression for Joint Visual-Textual Sentiment Analysis of Social Multimedia

Published: 08 February 2016

Abstract

Sentiment analysis of online user-generated content is important for many social media analytics tasks. Researchers have largely relied on textual sentiment analysis to develop systems that predict political elections, measure economic indicators, and so on. Recently, however, social media users have increasingly been attaching images and videos to express their opinions and share their experiences. Sentiment analysis of such large-scale textual and visual content can help better extract user sentiments toward events or topics. Motivated by the need to leverage large-scale social multimedia content for sentiment analysis, we propose a cross-modality consistent regression (CCR) model, which is able to utilize both state-of-the-art visual and textual sentiment analysis techniques. We first fine-tune a convolutional neural network (CNN) for image sentiment analysis and train a paragraph vector model for textual sentiment analysis. On top of them, we train our multi-modality regression model. We use sentiment-related queries to obtain half a million training samples from Getty Images. We have conducted extensive experiments on both weakly (machine) labeled and manually labeled image tweets. The results show that the proposed model achieves better performance than state-of-the-art textual and visual sentiment analysis algorithms alone.
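To make the joint-modeling idea concrete, the following is a minimal sketch of a consistency-regularized regression in PyTorch. It is an illustration under stated assumptions, not the paper's implementation: the image features (e.g., activations from a fine-tuned CNN) and text features (e.g., paragraph vectors) are assumed precomputed, and the feature dimensions, layer sizes, averaging-based fusion, and consistency weight `lam` are all illustrative choices.

```python
# Sketch of a cross-modality consistent regression objective (illustrative only).
# Assumes precomputed image features (e.g., CNN activations) and text features
# (e.g., paragraph vectors); dimensions and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CCRSketch(nn.Module):
    def __init__(self, img_dim=4096, txt_dim=300, hidden=128):
        super().__init__()
        # One small regression head per modality, each emitting a sentiment score.
        self.img_head = nn.Sequential(
            nn.Linear(img_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.txt_head = nn.Sequential(
            nn.Linear(txt_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, img_feat, txt_feat):
        return self.img_head(img_feat), self.txt_head(txt_feat)

def ccr_loss(score_img, score_txt, labels, lam=0.5):
    """Fit the fused prediction to the (possibly weak) label, while
    penalizing disagreement between the modality-specific predictions."""
    fused = 0.5 * (score_img + score_txt)
    fit = F.mse_loss(fused.squeeze(-1), labels)
    consistency = F.mse_loss(score_img, score_txt)  # cross-modality agreement
    return fit + lam * consistency

# Toy usage with random tensors standing in for real features and labels.
model = CCRSketch()
img_feat = torch.randn(8, 4096)   # e.g., CNN fc7-style features (assumed)
txt_feat = torch.randn(8, 300)    # e.g., paragraph-vector embeddings (assumed)
labels = torch.rand(8)            # sentiment scores in [0, 1]

s_img, s_txt = model(img_feat, txt_feat)
loss = ccr_loss(s_img, s_txt, labels)
loss.backward()
print(f"loss = {loss.item():.4f}")
```

The consistency term is the point of the sketch: with weakly labeled data, requiring the visual and textual predictions to agree acts as a regularizer on top of the label-fitting loss.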

    Published In

    WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining
    February 2016, 746 pages
    ISBN: 9781450337168
    DOI: 10.1145/2835776

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 08 February 2016

    Author Tags

    1. cross-modality regression
    2. multimodality analysis
    3. sentiment analysis

    Qualifiers

    • Research-article

    Conference

    WSDM 2016: Ninth ACM International Conference on Web Search and Data Mining
    February 22-25, 2016
    San Francisco, California, USA

    Acceptance Rates

    WSDM '16 paper acceptance rate: 67 of 368 submissions, 18%
    Overall acceptance rate: 498 of 2,863 submissions, 17%

    Article Metrics

    • Downloads (last 12 months): 65
    • Downloads (last 6 weeks): 15
    Reflects downloads up to 05 Mar 2025.

    Cited By
    • (2025) A Social Media Dataset and H-GNN-Based Contrastive Learning Scheme for Multimodal Sentiment Analysis. Applied Sciences, 15(2):636. DOI: 10.3390/app15020636. Online publication date: 10-Jan-2025.
    • (2025) Vision-language representation learning with breadth and depth attention pre-training. Knowledge-Based Systems, 112941. DOI: 10.1016/j.knosys.2024.112941. Online publication date: Jan-2025.
    • (2025) Visual emotion analysis using skill-based multi-teacher knowledge distillation. Pattern Analysis and Applications, 28(2). DOI: 10.1007/s10044-025-01426-9. Online publication date: 21-Feb-2025.
    • (2024) MGAFN-ISA: Multi-Granularity Attention Fusion Network for Implicit Sentiment Analysis. Electronics, 13(24):4905. DOI: 10.3390/electronics13244905. Online publication date: 12-Dec-2024.
    • (2024) A multifactor model using large language models and investor sentiment from photos and news: new evidence from China. SSRN Electronic Journal. DOI: 10.2139/ssrn.4708979. Online publication date: 2024.
    • (2024) Dual-Perspective Fusion Network for Aspect-Based Multimodal Sentiment Analysis. IEEE Transactions on Multimedia, 26:4028-4038. DOI: 10.1109/TMM.2023.3321435. Online publication date: 2024.
    • (2024) CGLF-Net: Image Emotion Recognition Network by Combining Global Self-Attention Features and Local Multiscale Features. IEEE Transactions on Multimedia, 26:1894-1908. DOI: 10.1109/TMM.2023.3289762. Online publication date: 2024.
    • (2024) Color Enhanced Cross Correlation Net for Image Sentiment Analysis. IEEE Transactions on Multimedia, 26:4097-4109. DOI: 10.1109/TMM.2021.3118208. Online publication date: 2024.
    • (2024) Multimodal Sentiment Analysis: Perceived vs Induced Sentiments. 2024 Silicon Valley Cybersecurity Conference (SVCC), 1-7. DOI: 10.1109/SVCC61185.2024.10637377. Online publication date: 17-Jun-2024.
    • (2024) Holistic Visual-Textual Sentiment Analysis with Prior Models. 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR), 196-202. DOI: 10.1109/MIPR62202.2024.00037. Online publication date: 7-Aug-2024.
