Abstract
Texts, images and other information are posted every day on social networks, providing a large amount of multimodal data. The aim of this work is to investigate whether combining and integrating visual and textual data makes it possible to identify the emotions elicited by an image. We focus on image emotion classification within eight emotion categories: amusement, awe, contentment, excitement, anger, disgust, fear and sadness. For this classification task we propose ensemble learning approaches based on the Bayesian model averaging (BMA) method, which combine five state-of-the-art classifiers. The proposed ensemble approaches consider the predictions of several classification models built on visual and textual data, integrating the two modalities through a late fusion scheme and an early fusion scheme, respectively. Our investigations show that an ensemble method based on a late fusion of unimodal classifiers achieves high classification performance across all eight emotion classes. The improvement is larger when deep image representations, rather than hand-crafted ones, are adopted as visual features.
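The core idea above, Bayesian model averaging over several classifiers whose visual and textual inputs are fused either late or early, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the likelihood of each model is assumed to be approximated by its validation accuracy with a uniform prior, and all function names, accuracies and probabilities below are hypothetical.

```python
from typing import List, Optional, Sequence

# The eight emotion categories named in the abstract.
EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]


def bma_weights(val_accuracy: Sequence[float],
                priors: Optional[Sequence[float]] = None) -> List[float]:
    """Posterior model weights P(M_i | D), proportional to P(D | M_i) * P(M_i).

    Assumption: P(D | M_i) is approximated by each model's accuracy on a
    held-out validation set, and the priors P(M_i) default to uniform.
    """
    if priors is None:
        priors = [1.0] * len(val_accuracy)
    unnormalized = [acc * prior for acc, prior in zip(val_accuracy, priors)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def bma_predict(probas: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Averaged posterior P(c | x) = sum_i P(c | x, M_i) * P(M_i | D)."""
    n_classes = len(probas[0])
    return [sum(w * p[c] for p, w in zip(probas, weights))
            for c in range(n_classes)]


def late_fusion(visual_probas, textual_probas, visual_acc, textual_acc):
    """Late fusion: pool the outputs of unimodal classifiers in one BMA."""
    probas = list(visual_probas) + list(textual_probas)
    weights = bma_weights(list(visual_acc) + list(textual_acc))
    return bma_predict(probas, weights)


def early_fusion_features(visual: Sequence[float],
                          textual: Sequence[float]) -> List[float]:
    """Early fusion: concatenate modalities before any classifier is trained."""
    return list(visual) + list(textual)


if __name__ == "__main__":
    # Hypothetical per-class posteriors for one image:
    # two visual-only models and one text-only model.
    visual = [[0.5, 0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05],
              [0.2, 0.4, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05]]
    textual = [[0.1, 0.6, 0.1, 0.05, 0.05, 0.05, 0.0, 0.05]]
    fused = late_fusion(visual, textual, visual_acc=[0.5, 0.3], textual_acc=[0.2])
    best = max(range(len(fused)), key=fused.__getitem__)
    print(EMOTIONS[best], round(fused[best], 2))  # amusement 0.33
```

In the late-fusion setting each classifier sees a single modality and BMA reconciles their outputs; in the early-fusion setting the concatenated vector from `early_fusion_features` would instead be fed to every classifier, and BMA would then average models that all see both modalities.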
Notes
We used WEKA (http://www.cs.waikato.ac.nz/ml/weka) to train all the baseline models, while BMA was implemented from scratch.
Acknowledgements
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.
Cite this article
Corchs, S., Fersini, E. & Gasparini, F. Ensemble learning on visual and textual data for social image emotion classification. Int. J. Mach. Learn. & Cyber. 10, 2057–2070 (2019). https://doi.org/10.1007/s13042-017-0734-0
DOI: https://doi.org/10.1007/s13042-017-0734-0