DOI: 10.1145/2835776.2835779 · Research Article · WSDM Conference Proceedings

Cross-modality Consistent Regression for Joint Visual-Textual Sentiment Analysis of Social Multimedia

Published: 08 February 2016

Abstract

Sentiment analysis of online user-generated content is important for many social media analytics tasks. Researchers have largely relied on textual sentiment analysis to develop systems that predict political elections, measure economic indicators, and so on. Recently, however, social media users have increasingly been attaching images and videos to express their opinions and share their experiences. Sentiment analysis of such large-scale textual and visual content can help better extract user sentiments toward events or topics. Motivated by the need to leverage large-scale social multimedia content for sentiment analysis, we propose a cross-modality consistent regression (CCR) model, which is able to utilize both state-of-the-art visual and textual sentiment analysis techniques. We first fine-tune a convolutional neural network (CNN) for image sentiment analysis and train a paragraph vector model for textual sentiment analysis. On top of them, we train our multi-modality regression model. We use sentiment-related queries to obtain half a million training samples from Getty Images. We have conducted extensive experiments on both weakly (machine) labeled and manually labeled image tweets. The results show that the proposed model achieves better performance than state-of-the-art textual and visual sentiment analysis algorithms alone.
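To make the joint-modeling idea concrete, the following is a minimal sketch of a consistency-regularized regression in PyTorch. It is an illustration under stated assumptions, not the paper's implementation: the image features (e.g., activations from a fine-tuned CNN) and text features (e.g., paragraph vectors) are assumed precomputed, and the feature dimensions, layer sizes, averaging-based fusion, and consistency weight `lam` are all illustrative choices.

```python
# Sketch of a cross-modality consistent regression objective (illustrative only).
# Assumes precomputed image features (e.g., CNN activations) and text features
# (e.g., paragraph vectors); dimensions and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CCRSketch(nn.Module):
    def __init__(self, img_dim=4096, txt_dim=300, hidden=128):
        super().__init__()
        # One small regression head per modality, each emitting a sentiment score.
        self.img_head = nn.Sequential(
            nn.Linear(img_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.txt_head = nn.Sequential(
            nn.Linear(txt_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, img_feat, txt_feat):
        return self.img_head(img_feat), self.txt_head(txt_feat)

def ccr_loss(score_img, score_txt, labels, lam=0.5):
    """Fit the fused prediction to the (possibly weak) label, while
    penalizing disagreement between the modality-specific predictions."""
    fused = 0.5 * (score_img + score_txt)
    fit = F.mse_loss(fused.squeeze(-1), labels)
    consistency = F.mse_loss(score_img, score_txt)  # cross-modality agreement
    return fit + lam * consistency

# Toy usage with random tensors standing in for real features and labels.
model = CCRSketch()
img_feat = torch.randn(8, 4096)   # e.g., CNN fc7-style features (assumed)
txt_feat = torch.randn(8, 300)    # e.g., paragraph-vector embeddings (assumed)
labels = torch.rand(8)            # sentiment scores in [0, 1]

s_img, s_txt = model(img_feat, txt_feat)
loss = ccr_loss(s_img, s_txt, labels)
loss.backward()
print(f"loss = {loss.item():.4f}")
```

The consistency term is the point of the sketch: with weakly labeled data, requiring the visual and textual predictions to agree acts as a regularizer on top of the label-fitting loss.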

    Published In

    WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining
    February 2016, 746 pages
    ISBN: 9781450337168
    DOI: 10.1145/2835776

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 08 February 2016

    Author Tags

    1. cross-modality regression
    2. multimodality analysis
    3. sentiment analysis

    Qualifiers

    • Research-article

    Conference

    WSDM 2016: Ninth ACM International Conference on Web Search and Data Mining
    February 22-25, 2016
    San Francisco, California, USA

    Acceptance Rates

    WSDM '16 paper acceptance rate: 67 of 368 submissions, 18%
    Overall acceptance rate: 498 of 2,863 submissions, 17%

    Article Metrics

    • Downloads (last 12 months): 65
    • Downloads (last 6 weeks): 15
    Reflects downloads up to 05 Mar 2025.

    Cited By
    • (2025) A Social Media Dataset and H-GNN-Based Contrastive Learning Scheme for Multimodal Sentiment Analysis. Applied Sciences, 15(2):636. DOI: 10.3390/app15020636. Online publication date: 10-Jan-2025.
    • (2025) Vision-language representation learning with breadth and depth attention pre-training. Knowledge-Based Systems, 112941. DOI: 10.1016/j.knosys.2024.112941. Online publication date: Jan-2025.
    • (2025) Visual emotion analysis using skill-based multi-teacher knowledge distillation. Pattern Analysis and Applications, 28(2). DOI: 10.1007/s10044-025-01426-9. Online publication date: 21-Feb-2025.
    • (2024) MGAFN-ISA: Multi-Granularity Attention Fusion Network for Implicit Sentiment Analysis. Electronics, 13(24):4905. DOI: 10.3390/electronics13244905. Online publication date: 12-Dec-2024.
    • (2024) A multifactor model using large language models and investor sentiment from photos and news: new evidence from China. SSRN Electronic Journal. DOI: 10.2139/ssrn.4708979. Online publication date: 2024.
    • (2024) Dual-Perspective Fusion Network for Aspect-Based Multimodal Sentiment Analysis. IEEE Transactions on Multimedia, 26:4028-4038. DOI: 10.1109/TMM.2023.3321435. Online publication date: 2024.
    • (2024) CGLF-Net: Image Emotion Recognition Network by Combining Global Self-Attention Features and Local Multiscale Features. IEEE Transactions on Multimedia, 26:1894-1908. DOI: 10.1109/TMM.2023.3289762. Online publication date: 2024.
    • (2024) Color Enhanced Cross Correlation Net for Image Sentiment Analysis. IEEE Transactions on Multimedia, 26:4097-4109. DOI: 10.1109/TMM.2021.3118208. Online publication date: 2024.
    • (2024) Multimodal Sentiment Analysis: Perceived vs Induced Sentiments. 2024 Silicon Valley Cybersecurity Conference (SVCC), 1-7. DOI: 10.1109/SVCC61185.2024.10637377. Online publication date: 17-Jun-2024.
    • (2024) Holistic Visual-Textual Sentiment Analysis with Prior Models. 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR), 196-202. DOI: 10.1109/MIPR62202.2024.00037. Online publication date: 7-Aug-2024.
