A fuzzy classifier to deal with similarity between labels on automatic prosodic labeling☆
Introduction
Prosodic labeling aims to enrich spoken utterances with labels that represent the relationship between the prosodic form and function of the constituents of the message. Although prosodic labeling systems establish clear rules and protocols, the difficulty of the task and the inherent subjectivity of the labelers’ judgments cause a high number of inconsistencies. Prosodic labeling systems anticipate uncertain situations and reserve special symbols for representing them (like the symbol ‘?’ in ToBI (Beckman et al., 2005)). Nevertheless, apart from these declared uncertain situations, inter-transcriber tests have shown a considerable number of cases where two transcribers assign different labels to the same prosodic unit (Escudero et al., 2012). Moreover, the apparent perceptual and acoustic similarity of several pairs of labels, and of the prosodic units they are assigned to, is one of the reasons for the uncertain assignment of prosodic labels (Escudero and Estebas, 2012).
Fuzzy set theory (DuBois and Prade, 1980) has been widely used to represent this type of situation, where it is difficult to classify a given element into one of the possible categories. The prosodic categories in a prosodic labeling system like ToBI are not fuzzy categories, because they have a linguistic phonological meaning. Nevertheless, according to the experimental evidence, the labels assigned by human or automatic transcribers are uncertain information. In this paper we show how fuzzy sets can be used in situations where assigning a given prosodic unit to a class is difficult, and in situations where more than one class could be assigned because there is a degree of uncertainty.
The difficulties of manual prosodic labeling carry over into automatic prosodic labeling. Prosodic labeling strongly depends on perceptual judgments that must be made by human transcribers. Nevertheless, automatic prosodic labeling is needed because manual prosodic labeling is a slow and costly process. Several applications can benefit from prosodic labeling but require the processing of huge corpora, and labeling such corpora manually is not affordable. Thus, automatic prosodic labeling systems need to be refined for this type of application. This paper introduces a new proposal for this: rather than trying to automatically generate exactly one label per word, our proposed algorithm generates zero or more labels per word, depending on the value of a single α-cut of the fuzzy classifier.
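The idea of producing zero or more labels per word can be illustrated with a minimal sketch. It assumes the standard definition of an α-cut (keep every label whose membership degree is at least α). The function name `alpha_cut`, the dictionary representation and the membership values are illustrative assumptions, not taken from the paper.

```python
def alpha_cut(memberships, alpha):
    """Return the labels whose membership degree is at least alpha."""
    return [label for label, mu in memberships.items() if mu >= alpha]


# Hypothetical membership degrees for one word (illustrative values only).
word_memberships = {"H*": 0.55, "L+H*": 0.40, "L*": 0.05}

print(alpha_cut(word_memberships, 0.3))  # two candidate labels survive
print(alpha_cut(word_memberships, 0.5))  # a single label survives
print(alpha_cut(word_memberships, 0.9))  # no label reaches the threshold
```

A low α tends to return several candidate labels for ambiguous words, while a high α may return none, which marks the word as one where the classifier does not commit to any label.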
The state of the art in automatic prosodic labeling reports identification rates higher than 90% for binary decisions, that is, determining the presence or absence of an accent, boundary or break. Nevertheless, when trying to classify different types of accents, boundaries or breaks, the classification rates decrease dramatically to about 70% (see Section 2.2 for a review of the state of the art). In González-Ferreras et al. (2010), we showed that the reasons for these low accuracy rates are the high similarity among some pairs of classes and the imbalanced nature of the prosodic corpora. In this article we show that the use of a fuzzy classifier considerably increases the accuracy rates when soft classification is performed.
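The difference between hard and soft classification rates can be sketched as follows. The sketch assumes that, under soft classification, a unit counts as correctly labeled when its reference label belongs to the set of labels whose membership reaches α. This definition, the function names and the data are assumptions made for illustration, not the paper's evaluation code.

```python
def hard_accuracy(truth, memberships):
    """Fraction of units whose top-scoring label equals the reference label."""
    hits = sum(max(m, key=m.get) == t for t, m in zip(truth, memberships))
    return hits / len(truth)


def soft_accuracy(truth, memberships, alpha):
    """Fraction of units whose reference label is inside the alpha-cut.

    Assumed definition: a unit is a hit when the membership degree of its
    reference label is at least alpha.
    """
    hits = sum(m.get(t, 0.0) >= alpha for t, m in zip(truth, memberships))
    return hits / len(truth)


# Hypothetical reference labels and membership degrees for four units.
truth = ["H*", "L+H*", "L*", "H*"]
memberships = [
    {"H*": 0.60, "L+H*": 0.30, "L*": 0.10},
    {"H*": 0.50, "L+H*": 0.45, "L*": 0.05},
    {"H*": 0.10, "L+H*": 0.20, "L*": 0.70},
    {"H*": 0.20, "L+H*": 0.70, "L*": 0.10},
]

print(hard_accuracy(truth, memberships))
print(soft_accuracy(truth, memberships, alpha=0.4))
```

Under this definition, lowering α can only keep or raise the soft rate, at the cost of returning more labels per unit, so the soft rate is best read together with the number of labels emitted.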
Fuzzy classifiers have proved useful in various applications, from system control to decision making (Zimmermann, 1999). In this paper we present a fuzzy classifier based on an adaptation of the prosodic event classifier described in González-Ferreras et al. (2012). With the application of fuzzy expert fusion, the output of the base classifiers is interpreted as a confidence value, or membership degree, of a given prosodic unit in a prosodic category. We benefit from the fact that each base classifier behaves differently when separating the different pairs of classes. The application of α-cuts allows labels to be assigned with a certain degree of uncertainty. The method increases performance by assigning soft labels to some of the prosodic units, and it identifies the prosodic units where the uncertainty is higher. We show that the pairs of labels involved in uncertain situations are the most confused pairs in perceptual inter-transcriber tests. The solution we propose is especially interesting in a situation where automatic labelings are being checked by human transcribers, since it is easier for a human to select the best label from a list of candidates than to replace the automatic label with another one drawn from perception.
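As a rough illustration of how the outputs of several base classifiers can be fused into membership degrees, the sketch below uses a plain arithmetic mean as the aggregation operator. This is a deliberately simple stand-in and is not claimed to be the fusion method used in this paper. The classifier names, scores and the threshold are hypothetical.

```python
def fuse_mean(outputs):
    """Average the per-class scores produced by several base classifiers.

    The arithmetic mean is a simple aggregation operator chosen here for
    illustration only; other fuzzy aggregation operators could replace it.
    """
    labels = list(outputs[0].keys())
    return {l: sum(o[l] for o in outputs) / len(outputs) for l in labels}


# Hypothetical scores from three base classifiers for one prosodic unit.
base_outputs = [
    {"H*": 0.7, "L+H*": 0.2, "L*": 0.1},  # e.g. a decision tree
    {"H*": 0.3, "L+H*": 0.6, "L*": 0.1},  # e.g. a neural network
    {"H*": 0.5, "L+H*": 0.4, "L*": 0.1},  # e.g. a third classifier
]

fused = fuse_mean(base_outputs)

# Labels whose fused membership reaches a hypothetical alpha of 0.35.
candidates = [l for l, mu in fused.items() if mu >= 0.35]
print(fused)
print(candidates)
```

In this toy case the base classifiers disagree on H* versus L+H*, and the fused memberships keep both as candidates, which is the kind of unit a human checker would be asked to resolve.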
The structure of the paper is as follows. First, we review the state of the art on prosodic labeling, automatic prosodic labeling and fuzzy classification. Next, the fuzzy classifier and the experimental procedure are described. Then we present the results and discuss the application of the prosodic labeling and the uncertainty of the labeling process. Finally, we present some conclusions.
Section snippets
Prosodic labeling
The Autosegmental-Metrical (AM) model (Ladd, 1996) has been widely applied in prosodic annotation of spoken corpora. The model was proposed in Pierrehumbert's thesis (Pierrehumbert, 1980). Based on this model, ToBI (Tones and Break Indices) is a framework for transcribing and annotating the prosody of speech. ToBI-based systems have been developed for many languages such as English (Beckman et al., 2005), Spanish (Beckman et al., 2000, Beckman, 2002, Estebas and Prieto, 2009), German (Grice and
Fuzzy classifier
Given an input vector $x$ and a set of labels $\{l_1, \dots, l_C\}$, classic pattern recognition assigns a unique label $l^*$ to $x$. The classification rule selects the label $l^*$ which maximizes the posterior probability:

$$l^* = \arg\max_{l_i} P(l_i \mid x)$$

The objective of fuzzy classification is to obtain membership values $\mu_i$ as an estimation of $P(l_i \mid x)$. This vector of membership values $\mu = (\mu_1, \dots, \mu_i, \dots, \mu_C)$ is used in the decision making process. The fuzzy classification problem is a step between the traditional pattern classification
Experiments
In this section we describe the corpus and the features used in the experiments. Next, we compare the results obtained with different classifiers. Finally, we present an analysis of the behavior of the multiple label classifier for different values of α.
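One way to analyze the behavior of a multiple label classifier across values of α is to count, for each α, how many units receive no label, exactly one label, or several labels. The sketch below is a hedged illustration of such a sweep. The function name, the three-way breakdown and the data are assumptions, not the analysis reported in the paper.

```python
def label_count_profile(memberships, alphas):
    """For each alpha, count units that get zero, one, or several labels."""
    profile = {}
    for alpha in alphas:
        counts = {"none": 0, "single": 0, "multiple": 0}
        for m in memberships:
            # Number of labels whose membership reaches the threshold.
            n = sum(mu >= alpha for mu in m.values())
            key = "none" if n == 0 else "single" if n == 1 else "multiple"
            counts[key] += 1
        profile[alpha] = counts
    return profile


# Hypothetical membership degrees for three prosodic units.
units = [
    {"H*": 0.80, "L+H*": 0.15, "L*": 0.05},
    {"H*": 0.50, "L+H*": 0.45, "L*": 0.05},
    {"H*": 0.35, "L+H*": 0.35, "L*": 0.30},
]

profile = label_count_profile(units, alphas=[0.2, 0.4, 0.6])
for alpha, counts in profile.items():
    print(alpha, counts)
```

As α grows, units move from the "multiple" group towards "single" and then "none", so the profile makes the trade-off between coverage and ambiguity visible.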
Applications on prosodic labeling
The use of automatic labeling systems to speed up manual prosodic labeling has been advocated: the labelers are given an automatically labeled version of the utterances, which they then review (Syrdal et al., 2001). Due to the perceptual character of the prosodic labels, they must be assigned or checked by human labelers. The duration of the labeling process is a critical variable because it involves highly qualified staff.
Fig. 2 shows the interface for checking the output of the automatic fuzzy
Conclusions
In this article we have presented an alternative system for assisting the labeling of prosodic events that consists of the use of a fuzzy classifier. The use of the fuzzy classifier is justified by the high uncertainty that is observed in manual labeling which has, as a consequence, a relevant number of inconsistencies. The combination of different types of classifiers that are aggregated by using fuzzy logic techniques permits the output of the new composed classifier to be considered as a
Acknowledgments
This work has been partly supported by the National R&D&I Plan of the Spanish Government FFI2011-29559-C02-01. We want to thank Prof. Ludmila Kuncheva, for the Matlab scripts used to combine classifiers.
References (89)
- et al. Multistage classifiers optimized by neural networks and genetic algorithms. Nonlinear Anal. Theory Methods Appl. (1997)
- et al. On the use of aggregation operations in information fusion processes. Fuzzy Sets Syst. (2004)
- et al. Analysis of inter-transcriber consistency in the Cat_ToBI prosodic labeling system. Speech Commun. (2012)
- et al. Fusion of handwritten word classifiers. Pattern Recogn. Lett. (1996)
- The application of fuzzy integrals in multicriteria decision making. Eur. J. Oper. Res. (1996)
- et al. Simultaneous recognition of words and prosody in the Boston University Radio Speech Corpus. Speech Commun. (2005)
- Form and function in the representation of speech prosody. Speech Commun. (2005)
- et al. Genetic fuzzy classifier for sleep stage identification. Comput. Biol. Med. (2010)
- et al. Decision templates for multiple classifier fusion: an experimental comparison. Pattern Recogn. (2001)
- et al. From English pitch accent detection to Mandarin stress detection, where is the difference? Comput. Speech Lang. (2012)
- Advanced soft computing diagnosis method for tumour grading. Artif. Intell. Med.
- Fuzzy logic and its applications in medicine. Int. J. Med. Inform.
- Prediction of abstract prosodic labels for speech synthesis. Comput. Speech Lang.
- Intonation modeling for Indian languages. Comput. Speech Lang.
- Automatic ToBI prediction and alignment to speed manual labeling of prosody. Speech Commun.
- Soft combination of neural classifiers: a comparative study. Pattern Recogn. Lett.
- Fuzzy sets. Inform. Control
- An automatic prosody recognizer using a coupled multi-stream acoustic model and a syntactic–prosodic language model
- Automatic prosodic event detection using acoustic, lexical, and syntactic evidence. IEEE Trans. Audio Speech Lang. Process.
- Fine-grained pitch accent and boundary tone labeling with parametric F0 features
- Intonational analysis and prosodic annotation of Greek spoken corpora
- Automated pattern recognition for intonation (print) an essay on linguistic categorization
- Intonation across Spanish in the Tones and Break Indices framework. Probus
- The original ToBI system and the evolution of the ToBI framework
- K-ToBI (Korean ToBI) labeling conventions: version 3. Korean J. Speech Sci.
- Intonation across Spanish, in the Tones and Break Indices framework. Tech. Rep.
- Inter-transcriber reliability for two systems of prosodic annotation: ToBI (Tones and Break Indices) and RaP (Rhythm and Pitch). Corpus Linguist. Linguist. Theory
- The alternatives (alt) tier for ToBI: advantages of capturing prosodic ambiguity
- An automatic prosody labeling system using ANN-based syntactic–prosodic model and GMM-based acoustic–prosodic model
- A maximum likelihood prosody recognizer
- Combining multiple neural networks by fuzzy integral and robust classification. IEEE Trans. Syst. Man Cybern.
- Multiple network fusion using fuzzy logic. IEEE Trans. Neural Netw.
- The role of syntactic structure in guiding prosody perception with ordinary listeners and everyday speech. Lang. Cogn. Process.
- Signal-based and expectation-based factors in the perception of prosodic prominence. Lab. Phonol.
- On the principles of fuzzy classification
- The RaP (Rhythm and Pitch) labeling system. Tech. Rep.
- Industrial applications of soft computing: a review. Proc. IEEE
- Pattern Classification
- Applying data mining techniques to corpus based prosodic modeling. Speech Commun.
- Corpus based extraction of quantitative prosodic parameters of stress groups in Spanish
- Visualizing tool for evaluating inter-label similarity in prosodic labeling experiments
- La notación prosódica del español. Una revisión del Sp-ToBI. Estudios de Fonética Experimental
- Castilian Spanish intonation
☆ This paper has been recommended for acceptance by T. Kawahara.