Abstract:
The human voice is the most frequently used mode of communication among people, and it carries both linguistic and paralinguistic information. For an emotion classification task, the paralinguistic information is especially important because it describes the current affective state of the speaker. This affective information can be used in health care, customer service, and the entertainment industry. Previous research in the field mostly relied on handcrafted features derived from speech signals, which were then used to build mainly statistical models. Today it is possible to design models that both extract features and perform classification. This preliminary study explores the performance of a model that combines a convolutional neural network for feature extraction with a deep neural network for emotion classification. The convolutional part consists of three convolutional layers that filter the input spectrograms in the time and frequency dimensions, followed by two dense layers that form the deep part of the model. The unified network is trained and tested on spectrograms of speech utterances from the Berlin Database of Emotional Speech (EmoDB).
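The abstract specifies only the high-level topology: three convolutional layers filtering spectrograms in time and frequency, followed by two dense layers. The following is a minimal PyTorch sketch of such an architecture; the channel counts, kernel sizes, pooling, and hidden dense width are illustrative assumptions, not the authors' reported configuration. Only the seven-class output reflects EmoDB's emotion categories (anger, boredom, disgust, fear, happiness, sadness, neutral).

import torch
import torch.nn as nn

class SpeechEmotionCNN(nn.Module):
    """CNN feature extractor plus dense classifier over speech spectrograms.
    All layer sizes are assumptions for illustration, not the paper's values."""
    def __init__(self, n_classes: int = 7):
        super().__init__()
        # Three convolutional layers filtering in both time and frequency.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            # Fixed-size pooling so variable-length utterances yield equal-size features.
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # Two dense layers forming the deep classification part of the model.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, n_classes),  # one logit per emotion class
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_freq_bins, n_time_frames) spectrogram tensor
        return self.classifier(self.features(x))

if __name__ == "__main__":
    model = SpeechEmotionCNN(n_classes=7)   # 7 emotion categories in EmoDB
    dummy = torch.randn(8, 1, 64, 200)      # batch of 64-band spectrograms
    print(model(dummy).shape)               # torch.Size([8, 7])

Trained end to end with a cross-entropy loss, this unified network both extracts features from the spectrograms and performs the classification, matching the two-stage design the abstract describes.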
Published in: 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)
Date of Conference: 20-24 May 2019
Date Added to IEEE Xplore: 11 July 2019
Electronic ISSN: 2623-8764