Abstract
Humans communicate face-to-face through at least two modalities: the auditory modality, speech, and the visual modality, gestures, which comprise, for example, gaze, facial expressions, head movements, and hand gestures. The relation between speech and gesture is complex and depends in part on factors such as culture, the communicative situation, and the interlocutors and their relationship. Investigating these factors in real data is vital for studying multimodal communication and for building models of natural multimodal communicative interfaces able to interact with individuals of different ages, cultures, and needs. In this paper, we discuss to what extent big data “in the wild”, which are growing explosively on the internet, are useful for this purpose, also in light of the legal constraints on the use of personal data, including multimodal data downloaded from social media.
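To make the kind of automatic processing at stake concrete, below is a minimal sketch of per-frame face detection on a downloaded video, assuming OpenCV (opencv-python) is installed; the file name interaction.mp4 is a hypothetical, lawfully obtained recording, and a production pipeline would use richer extractors (e.g., facial action units or body keypoints) rather than this deliberately simple detector.

```python
# Minimal sketch: count visible faces per frame of a video "in the wild".
# Assumes opencv-python is installed; "interaction.mp4" is a hypothetical file.
import cv2

# Haar cascade face detector shipped with OpenCV (a deliberately simple choice).
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture("interaction.mp4")
frame_idx, face_counts = 0, []
while True:
    ok, frame = cap.read()
    if not ok:  # end of video (or unreadable file)
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    face_counts.append((frame_idx, len(faces)))  # faces visible in this frame
    frame_idx += 1
cap.release()
print(f"processed {frame_idx} frames; sample counts: {face_counts[:5]}")
```

Even such a crude pass already produces personal data (detected faces), which is why the legal notes below matter for any corpus built from social media material.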
Notes
- 1.
- 2. In Article 4(7) of the GDPR, the controller is defined as “the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data; where the purposes and means of such processing are determined by Union or Member State law, the controller or the specific criteria for its nomination may be provided for by Union or Member State law”.
- 3. This includes all information about (i) the identity and the contact details of the controller, (ii) the contact details of the data protection officer (where applicable), (iii) the purposes of the processing for which the personal data are intended and the legal basis for the processing, (iv) the recipients or categories of recipients of the personal data (if any), (v) the fact (when relevant) that the controller intends to transfer personal data to a third country or international organisation, (vi) the period for which the personal data will be stored, or if that is not possible, the criteria used to determine that period, (vii) the existence of the right to request from the controller access to and rectification or erasure of personal data or restriction of processing concerning the data subject or to object to processing as well as the right to data portability, (viii) in cases when the processing is based on consent, the existence of the right to withdraw consent at any time, without affecting the lawfulness of processing based on consent before its withdrawal, (ix) the right to lodge a complaint with a supervisory authority, and (x) the existence of automated decision-making, including profiling (if any). A sketch of how these items might be recorded as corpus metadata follows these notes.
- 4. Cf. Judgment of 6 November 2003, Lindqvist (C-101/01, EU:C:2003:596, paragraph 47).
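Purely as an illustration (and not a legal template), the sketch below shows how a corpus-collection tool might record the Article 13 items enumerated in note 3 as structured metadata; all field names are our own hypothetical choices, not terms from the regulation.

```python
# Hypothetical record of the GDPR Article 13 disclosure items (see note 3).
# Field names are illustrative only; they do not come from the regulation.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GdprDisclosure:
    controller_identity: str                 # (i) identity and contact details of the controller
    dpo_contact: Optional[str] = None        # (ii) data protection officer, where applicable
    purposes_and_legal_basis: str = ""       # (iii) purposes and legal basis of the processing
    recipients: List[str] = field(default_factory=list)  # (iv) recipients, if any
    third_country_transfer: bool = False     # (v) intended transfer to a third country
    storage_period: str = ""                 # (vi) storage period, or criteria to determine it
    subject_rights_notice: bool = True       # (vii) access, rectification, erasure, portability
    consent_withdrawable: bool = True        # (viii) right to withdraw consent at any time
    complaint_authority: str = ""            # (ix) supervisory authority for complaints
    automated_decisions: bool = False        # (x) automated decision-making, incl. profiling
```

Storing such a record alongside every downloaded item would make it possible to audit, for each data subject, which disclosures were made and on what legal basis the material is processed.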
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Navarretta, C., Oemig, L. (2019). Big Data and Multimodal Communication: A Perspective View. In: Esposito, A., Esposito, A., Jain, L. (eds) Innovations in Big Data Mining and Embedded Knowledge. Intelligent Systems Reference Library, vol 159. Springer, Cham. https://doi.org/10.1007/978-3-030-15939-9_9
DOI: https://doi.org/10.1007/978-3-030-15939-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15938-2
Online ISBN: 978-3-030-15939-9