
Big Data and Multimodal Communication: A Perspective View

Chapter in: Innovations in Big Data Mining and Embedded Knowledge

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 159)

Abstract

Humans communicate face-to-face through at least two modalities: the auditory modality, speech, and the visual modality, gestures, which comprise, e.g., gaze movements, facial expressions, head movements, and hand gestures. The relation between speech and gesture is complex and depends partly on factors such as culture, the communicative situation, and the interlocutors and their relationship. Investigating these factors in real data is vital both for studying multimodal communication and for building models to implement natural multimodal interfaces able to interact with individuals of different ages, cultures, and needs. In this paper, we discuss to what extent big data "in the wild", which are growing explosively on the internet, are useful for this purpose, also in light of legal constraints on the use of personal data, including multimodal data downloaded from social media.
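To make the kind of analysis the chapter motivates more concrete, consider extracting body-pose information from video "in the wild". The sketch below is illustrative only and is not taken from the chapter: it assumes the MediaPipe and OpenCV Python libraries, and the file name clip.mp4 is a placeholder for any downloaded video.

```python
# Minimal sketch: per-frame body keypoints from an "in the wild" video,
# a typical first step toward quantifying gesture behaviour.
# Assumes: pip install mediapipe opencv-python; clip.mp4 is hypothetical.
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

cap = cv2.VideoCapture("clip.mp4")
with mp_holistic.Holistic(model_complexity=1) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            # Landmark 0 is the nose in MediaPipe's pose topology;
            # coordinates are normalized to the frame size.
            nose = results.pose_landmarks.landmark[0]
            print(f"nose at ({nose.x:.3f}, {nose.y:.3f})")
cap.release()
```

Time series of such keypoints (head, hands, face) can then be aligned with the speech track to study speech-gesture relations; note that when the video depicts identifiable people, the legal considerations discussed in the paper apply to this processing as well.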


Notes

  1. http://mocap.cs.cmu.edu/.

  2. In GDPR Article 4(7), the controller is defined as "the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data; where the purposes and means of such processing are determined by Union or Member State law, the controller or the specific criteria for its nomination may be provided for by Union or Member State law".

  3. This includes all information about:
     (i) the identity and the contact details of the controller;
     (ii) the contact details of the data protection officer (where applicable);
     (iii) the purposes of the processing for which the personal data are intended and the legal basis for the processing;
     (iv) the recipients or categories of recipients of the personal data (if any);
     (v) the fact (when relevant) that the controller intends to transfer personal data to a third country or international organisation;
     (vi) the period for which the personal data will be stored or, if that is not possible, the criteria used to determine that period;
     (vii) the existence of the right to request from the controller access to and rectification or erasure of personal data or restriction of processing concerning the data subject, or to object to processing, as well as the right to data portability;
     (viii) when the processing is based on consent, the existence of the right to withdraw consent at any time, without affecting the lawfulness of processing based on consent before its withdrawal;
     (ix) the right to lodge a complaint with a supervisory authority; and
     (x) the existence of automated decision-making, including profiling (if any).

  4. Cf. Judgment of 6 November 2003, Lindqvist (C-101/01, EU:C:2003:596, paragraph 47).


Author information

Corresponding author: Costanza Navarretta.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Navarretta, C., Oemig, L. (2019). Big Data and Multimodal Communication: A Perspective View. In: Esposito, A., Esposito, A., Jain, L. (eds) Innovations in Big Data Mining and Embedded Knowledge. Intelligent Systems Reference Library, vol 159. Springer, Cham. https://doi.org/10.1007/978-3-030-15939-9_9
