
Ensemble learning on visual and textual data for social image emotion classification

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Texts, images and other information are posted every day on social networks, providing a large amount of multimodal data. The aim of this work is to investigate whether combining and integrating visual and textual data makes it possible to identify the emotions elicited by an image. We focus on image emotion classification within eight emotion categories: amusement, awe, contentment, excitement, anger, disgust, fear and sadness. For this classification task we propose ensemble learning approaches based on the Bayesian model averaging (BMA) method that combine five state-of-the-art classifiers. The proposed ensemble approaches combine the predictions of several classification models, built on visual and textual data, through late and early fusion schemes, respectively. Our experiments show that an ensemble method based on a late fusion of unimodal classifiers achieves high classification performance on all eight emotion classes. The improvement is larger when deep image representations are adopted as visual features rather than hand-crafted ones.
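In Bayesian model averaging the ensemble output is a posterior-weighted combination of the base classifiers' predictions. As a minimal sketch in standard textbook form (the notation is ours, not taken from the paper), with base classifiers h_i trained on data D:

P(c \mid x, D) = \sum_{i} P(c \mid x, h_i) \, P(h_i \mid D)

where P(c | x, h_i) is the probability that classifier h_i assigns to emotion class c for image x, and P(h_i | D) is the posterior weight of classifier h_i given the training data.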


Notes

  1. We used WEKA (http://www.cs.waikato.ac.nz/ml/weka) to train all the baseline models, while the BMA ensemble was developed from scratch; a minimal sketch of the late-fusion combination is given below.
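The following is a minimal, hypothetical illustration of the late-fusion combination (not the authors' code). It assumes each base classifier exposes per-class posterior probabilities for an image, and that the model weights approximating P(h_i | D) are derived, for instance, from validation performance:

import numpy as np

def bma_late_fusion(posteriors, weights):
    # posteriors: array of shape (n_classifiers, n_classes),
    #             each row a classifier's class-probability vector for one image
    # weights:    array of shape (n_classifiers,), e.g. validation accuracies
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalize into a distribution over models
    combined = weights @ np.asarray(posteriors)  # posterior-weighted average per class
    return combined                              # shape (n_classes,)

# Hypothetical usage: three classifiers, eight emotion classes
# p = np.random.dirichlet(np.ones(8), size=3)
# fused = bma_late_fusion(p, [0.61, 0.58, 0.64])
# predicted_class = int(np.argmax(fused))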


Acknowledgements

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.

Author information


Corresponding author

Correspondence to Silvia Corchs.


About this article


Cite this article

Corchs, S., Fersini, E. & Gasparini, F. Ensemble learning on visual and textual data for social image emotion classification. Int. J. Mach. Learn. & Cyber. 10, 2057–2070 (2019). https://doi.org/10.1007/s13042-017-0734-0

