Abstract
The enormous number of videos posted every day on multimedia websites such as Facebook and YouTube makes the Internet a virtually infinite source of information. Collecting and processing such information, however, is challenging, as it involves dealing with a huge amount of data that changes at very high speed. To this end, we leverage the processing speed of extreme learning machines (ELM) and graphics processing units (GPU) to overcome the limitations of standard learning algorithms and the central processing unit (CPU) and, hence, perform real-time multimodal sentiment analysis, i.e., harvesting sentiments from web videos by taking into account audio, visual and textual modalities as sources of information. For sentiment classification, we rely on sentic memes, i.e., basic units of sentiment whose combination can potentially describe the full range of emotional experiences rooted in any of us, including different degrees of polarity. We use both feature-level and decision-level fusion to merge the information extracted from the different modalities. On a sentiment-annotated dataset of YouTube video reviews, the proposed multimodal system achieves an accuracy of 78%. In terms of processing speed, our method yields feature-extraction speed-ups of several orders of magnitude over CPU-based counterparts.
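To make the pipeline described above more concrete, the following is a minimal sketch (not the authors' code) of feature-level fusion followed by an extreme learning machine classifier: per-utterance audio, visual and textual feature vectors are concatenated and fed to a single-hidden-layer network whose output weights are obtained in closed form. All variable names, feature dimensions and the tanh activation are illustrative assumptions; the actual system runs feature extraction and learning on the GPU.

```python
# Minimal NumPy sketch of feature-level fusion + ELM classification.
# Dimensions, names and data below are illustrative assumptions only.
import numpy as np

def elm_train(X, y, n_hidden=500, n_classes=2, seed=0):
    """Basic ELM: random hidden layer + least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    T = np.eye(n_classes)[y]                         # one-hot targets
    beta = np.linalg.pinv(H) @ T                     # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = np.tanh(X @ W + b)
    return np.argmax(H @ beta, axis=1)

# Feature-level fusion: concatenate per-utterance modality features.
audio = np.random.rand(100, 30)    # e.g. prosodic/spectral features
visual = np.random.rand(100, 40)   # e.g. facial characteristic points
text = np.random.rand(100, 50)     # e.g. concept/sentic features
X = np.concatenate([audio, visual, text], axis=1)
y = np.random.randint(0, 2, 100)   # binary polarity labels

W, b, beta = elm_train(X, y)
pred = elm_predict(X, W, b, beta)
```

Decision-level fusion, by contrast, would train one such classifier per modality and combine their per-class scores (e.g. by weighted averaging) before taking the argmax.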
Cite this article
Tran, HN., Cambria, E. Ensemble application of ELM and GPU for real-time multimodal sentiment analysis. Memetic Comp. 10, 3–13 (2018). https://doi.org/10.1007/s12293-017-0228-3