
Developing a Thai emotional speech corpus from Lakorn (EMOLA)

  • Original Paper
  • Published in: Language Resources and Evaluation

Abstract

Advances in emotional speech recognition and synthesis rely essentially on the availability of annotated emotional speech corpora. Thai is a low-resource language: although a few corpora have been constructed for speech recognition and synthesis, it critically lacks corpora of emotional speech. This paper presents the design of a Thai emotional speech corpus (named EMOLA), its construction and annotation process, and its analysis. In the corpus design, four basic emotion types with twelve subtypes are defined with consideration of the Pleasure-Arousal-Dominance (PAD) emotional state model. To construct the corpus, a series of Thai dramas (1397 min in total) was selected, and approximately 868 min of its video clips were annotated. As a result, 8987 transcriptions (of conversation turns) were derived in total, each tagged with one basic type and a few subtypes. Finally, an analysis was conducted to characterize the corpus through three sets of statistics: collection-level, annotator-oriented, and actor-oriented statistics.
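The annotation scheme described above lends itself to a simple record structure. The Python sketch below models one annotated conversation turn: a transcription tagged with exactly one basic emotion type and a few subtypes, optionally projected into PAD space. The emotion labels, PAD coordinates, and all identifiers here are illustrative assumptions, not the paper's actual taxonomy or values.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical PAD (Pleasure-Arousal-Dominance) coordinates for four
# commonly cited basic emotions. EMOLA's actual four basic types, twelve
# subtypes, and any coordinate values are defined in the paper itself;
# the labels and numbers below are illustrative only.
PAD_COORDS = {
    "happiness": (0.8, 0.5, 0.4),
    "anger":     (-0.5, 0.6, 0.3),
    "sadness":   (-0.6, -0.3, -0.3),
    "neutral":   (0.0, 0.0, 0.0),
}

@dataclass
class AnnotatedTurn:
    """One conversation-turn transcription with its emotion tags."""
    transcription: str                                 # Thai text of the turn
    basic_type: str                                    # exactly one basic type
    subtypes: List[str] = field(default_factory=list)  # a few finer subtypes

    def pad(self) -> Tuple[float, float, float]:
        """Map the basic type to a point in PAD space (illustrative values)."""
        return PAD_COORDS[self.basic_type]

# A record shaped like one of the 8987 annotated turns.
turn = AnnotatedTurn(transcription="...", basic_type="happiness", subtypes=["joy"])
print(turn.pad())  # -> (0.8, 0.5, 0.4)
```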





Acknowledgements

This work was partially supported by a SIIT graduate student scholarship; the Center of Excellence in Intelligent Informatics, Speech and Language Technology and Service Innovation (CILS), Thammasat University; the Center of Excellence in Intelligent Informatics and Service Innovation (IISI), SIIT, Thammasat University; and the Thailand Research Fund under Grant Number RTA6080013.

Author information


Corresponding author

Correspondence to Thanaruk Theeramunkong.


About this article


Cite this article

Kasuriya, S., Theeramunkong, T., Wutiwiwatchai, C. et al. Developing a Thai emotional speech corpus from Lakorn (EMOLA). Lang Resources & Evaluation 53, 17–55 (2019). https://doi.org/10.1007/s10579-018-9428-9

