A fuzzy classifier to deal with similarity between labels on automatic prosodic labeling

doi:10.1016/j.csl.2013.08.001

Computer Speech & Language

Volume 28, Issue 1, January 2014, Pages 326-341

https://doi.org/10.1016/j.csl.2013.08.001

Abstract

This paper presents an original approach to automatic prosodic labeling. Fuzzy logic techniques are used for representing situations of high uncertainty with respect to the category to be assigned to a given prosodic unit. The fuzzy integral technique is used to combine the output of different base classifiers. The resulting fuzzy classifier benefits from the different capabilities of the base classifiers for identifying different types of prosodic events. At the same time, the fuzzy classifier identifies the events that are potentially more difficult to label. The classifier has been applied to the identification of ToBI pitch accents. The state of the art on pitch accent multiclass classification reports an accuracy rate of around 70%. In this paper we describe a fuzzy classifier which assigns more than one label in confusing situations. We show that the pairs of labels that appear in these uncertain situations are consistent with the most confused pairs of labels reported in manual prosodic labeling experiments. Our fuzzy classifier obtains a soft classification rate of 81.8%, which supports the potential of the proposed system for computer assisted prosodic labeling.

Introduction

Prosodic labeling aims to enrich spoken utterances with labels that are representative of the relationship between the prosodic form and function of the constituents of the message. Although the prosodic labeling systems establish clear rules and protocols, the difficulty of the task and the inherent subjectivity of the labelers’ judgments cause a high number of inconsistencies. The prosodic labeling systems assume that uncertain situations can appear and reserve special symbols for representing them (like the symbol ‘?’ in ToBI (Beckman et al., 2005)). Nevertheless, apart from these declared uncertain situations, inter-transcriber tests have shown a relevant number of situations where two different transcribers assign a different label to the same prosodic unit (Escudero et al., 2012). Moreover, the apparent perceptual and acoustic similarity of several pairs of labels and their corresponding prosodic units is one of the reasons for the uncertain assignment of prosodic labels (Escudero and Estebas, 2012).

Fuzzy set theory (DuBois and Prade, 1980) has been widely used to represent this type of situation, where it is difficult to classify a given element into the different possible categories. The prosodic categories in a prosodic labeling system like ToBI are not fuzzy categories because they have a linguistic phonological meaning. Nevertheless, the labels assigned by human or automatic transcribers are uncertain information according to the experimental evidence. In this paper we show how fuzzy sets can be used in situations where assigning a given prosodic unit to a class is difficult and in situations where more than one class could be assigned because there is a degree of uncertainty.

The difficulties of manual prosodic labeling carry over into automatic prosodic labeling. Prosodic labeling strongly depends on perceptual judgments that must be performed by human transcribers. Nevertheless, automatic prosodic labeling is needed because manual prosodic labeling is a slow and costly process. There are several applications that can benefit from prosodic labeling, but require the processing of huge corpora. Labeling such corpora manually is not affordable. Thus, automatic prosodic labeling systems need to be refined for this type of application. This paper introduces a new proposal for this: rather than trying to automatically generate one label per word, our proposed algorithm generates zero or more labels per word, depending on the value of a single α-cut of the fuzzy classifier.
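The idea of emitting zero or more labels per word can be illustrated with a minimal sketch. It assumes the standard definition of an α-cut (keep every label whose membership degree is at least α). The function name `alpha_cut`, the label inventory and the membership values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def alpha_cut(memberships: Dict[str, float], alpha: float) -> List[str]:
    """Return every label whose membership degree is at least alpha,
    ordered from the highest to the lowest membership."""
    kept = [label for label, mu in memberships.items() if mu >= alpha]
    return sorted(kept, key=lambda label: memberships[label], reverse=True)


# Hypothetical membership degrees for three words (illustrative values only).
clear = {"H*": 0.85, "L+H*": 0.10, "L*": 0.05}
confusable = {"H*": 0.48, "L+H*": 0.44, "L*": 0.08}
weak = {"H*": 0.30, "L+H*": 0.35, "L*": 0.35}

print(alpha_cut(clear, 0.4))       # ['H*']          one label
print(alpha_cut(confusable, 0.4))  # ['H*', 'L+H*']  two candidate labels
print(alpha_cut(weak, 0.4))        # []              no label reaches alpha
```

Under this reading, raising α yields fewer labels per word (possibly none) and lowering it yields longer candidate lists.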

The state of the art on automatic prosodic labeling reports identification rates higher than 90% in binary decisions, that is, when determining the presence or absence of an accent, boundary or break. Nevertheless, when trying to classify different types of accents, boundaries or breaks, the classification rates decrease dramatically to about 70% (see Section 2.2 for a review of the state of the art). In González-Ferreras et al. (2010), we showed that the reasons for these low accuracy rates are the high similarity among some pairs of classes and the imbalanced nature of the prosodic corpora. In this article we show that the use of a fuzzy classifier considerably increases the accuracy rates when soft classification is performed.
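How a soft classification rate might be computed can be sketched as follows. The sketch assumes that a word counts as correctly classified when its reference label appears among the labels assigned to it; this excerpt does not spell out the exact definition, so the criterion, the function name `soft_classification_rate` and the toy data are all assumptions.

```python
from typing import Sequence


def soft_classification_rate(
    assigned: Sequence[Sequence[str]], reference: Sequence[str]
) -> float:
    """Fraction of items whose reference label is among the assigned labels.

    Assumed criterion: an item with an empty label list never counts as a hit.
    """
    if len(assigned) != len(reference):
        raise ValueError("assigned and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(1 for labels, gold in zip(assigned, reference) if gold in labels)
    return hits / len(reference)


# Toy data: four words, with zero, one or two labels assigned to each.
assigned = [["H*"], ["H*", "L+H*"], [], ["L*"]]
reference = ["H*", "L+H*", "L*", "H*"]

rate = soft_classification_rate(assigned, reference)
print(rate)  # 0.5 (the first two words are hits, the last two are not)
```

With exactly one label per word this measure reduces to ordinary accuracy, which is why the two figures are comparable only when the average number of labels per word is reported alongside.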

Fuzzy classifiers have proven useful in various applications, from system control to decision making (Zimmermann, 1999). In this paper we present a fuzzy classifier based on an adaptation of the prosodic event classifier described in González-Ferreras et al. (2012). With the application of fuzzy expert fusion, the output of the base classifiers is interpreted as a confidence value or a membership degree of a given prosodic unit to a prosodic category. We benefit from the fact that each base classifier behaves differently in the task of separating the different pairs of classes. The application of α-cuts allows the assignment of labels with a certain degree of uncertainty. The method allows the performance to be increased by assigning soft labels to some of the prosodic units. The prosodic units where the uncertainty is higher are identified. We show that the pairs of labels involved in uncertain situations are the most confused pairs that appear in perceptual inter-transcriber tests. The solution we propose in this paper is especially interesting in a situation where automatic labelings are being checked by human transcribers, since it is easier for a human to select the best label from a list of candidates than to replace the automatic label with another drawn from perception.
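The fusion of base classifier outputs into a single membership degree can be illustrated with a Sugeno fuzzy integral, one common fuzzy aggregation operator. Whether the paper uses this particular form is an assumption, since this excerpt does not give the details. The classifier names (`tree`, `net`), the fuzzy measure and the support values below are hypothetical.

```python
from typing import Dict, FrozenSet, List


def sugeno_integral(
    support: Dict[str, float], measure: Dict[FrozenSet[str], float]
) -> float:
    """Sugeno fuzzy integral of per-classifier supports for one label.

    support: confidence given to the label by each base classifier, in [0, 1].
    measure: fuzzy measure g(A) for every coalition A of classifiers that is
             reached when classifiers are added in order of decreasing support.
    """
    ranked = sorted(support, key=support.get, reverse=True)
    coalition: List[str] = []
    fused = 0.0
    for name in ranked:
        coalition.append(name)
        g = measure[frozenset(coalition)]
        # Each step contributes min(support, importance of the coalition so far).
        fused = max(fused, min(support[name], g))
    return fused


# Hypothetical fuzzy measure over two base classifiers.
measure = {
    frozenset({"tree"}): 0.6,
    frozenset({"net"}): 0.5,
    frozenset({"tree", "net"}): 1.0,
}

# Hypothetical support of each base classifier for one candidate label.
support_for_label = {"tree": 0.8, "net": 0.3}

fused = sugeno_integral(support_for_label, measure)
print(fused)  # 0.6
```

Running the same function once per candidate label would give a fused membership vector, to which an α-cut can then be applied.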

The structure of the paper is as follows. First, we review the state of the art on prosodic labeling, automatic prosodic labeling and fuzzy classification. Next, the fuzzy classifier and the experimental procedure are described. Then we show the results and discuss the application to prosodic labeling and the uncertainty of the labeling process. Finally, we present some conclusions.

Section snippets

Prosodic labeling

The Autosegmental-Metrical (AM) model (Ladd, 1996) has been widely applied in prosodic annotation of spoken corpora. The model was proposed in Pierrehumbert's thesis (Pierrehumbert, 1980). Based on this model, ToBI (Tones and Break Indices) is a framework for transcribing and annotating the prosody of speech. ToBI-based systems have been developed for many languages such as English (Beckman et al., 2005), Spanish (Beckman et al., 2000, Beckman, 2002, Estebas and Prieto, 2009), German (Grice and

Fuzzy classifier

Given an input vector $x$ and a set of labels $L = \{l_1, \dots, l_C\}$, classic pattern recognition assigns a unique label $l^{*}$ to $x$. The classification rule selects the label $l^{*}$ which maximizes the posterior probability: $l^{*} = \arg\max_{l} P(l \mid x)$. The objective of fuzzy classification is to obtain membership values $\mu_i$ as an estimation of $P(l_i \mid x)$. This vector of membership values $\mu = (\mu_1, \dots, \mu_i, \dots, \mu_C)$ is used in the decision making process. The fuzzy classification problem is a step between the traditional pattern classification

Experiments

In this section we describe the corpus and the features used in the experiments. Next, we compare the results obtained with different classifiers. Finally, we present an analysis of the behavior of the multiple label classifier for different values of α.

Applications on prosodic labeling

The use of automatic labeling systems for speeding up manual prosodic labeling has been advocated: the labelers are given an automatically labeled version of the utterances, which they must review (Syrdal et al., 2001). Due to the perceptual character of the prosodic labels, they must be assigned or checked by human labelers. The duration of the labeling process is a critical variable because it involves highly qualified staff.

Fig. 2 shows the interface for checking the output of the automatic fuzzy

Conclusions

In this article we have presented an alternative system for assisting the labeling of prosodic events that consists of the use of a fuzzy classifier. The use of the fuzzy classifier is justified by the high uncertainty that is observed in manual labeling which has, as a consequence, a relevant number of inconsistencies. The combination of different types of classifiers that are aggregated by using fuzzy logic techniques permits the output of the new composed classifier to be considered as a

Acknowledgments

This work has been partly supported by the National R&D&I Plan of the Spanish Government FFI2011-29559-C02-01. We want to thank Prof. Ludmila Kuncheva, for the Matlab scripts used to combine classifiers.

References (89)

J.A. Benediktsson et al. Multistage classifiers optimized by neural networks and genetic algorithms. Nonlinear Anal. Theory Methods Appl. (1997)
D. Dubois et al. On the use of aggregation operations in information fusion processes. Fuzzy Sets Syst. (2004)
D. Escudero et al. Analysis of inter-transcriber consistency in the Cat_ToBI prosodic labeling system. Speech Commun. (2012)
P.D. Gader et al. Fusion of handwritten word classifiers. Pattern Recogn. Lett. (1996)
M. Grabisch. The application of fuzzy integrals in multicriteria decision making. Eur. J. Oper. Res. (1996)
M. Hasegawa-Johnson et al. Simultaneous recognition of words and prosody in the Boston University Radio Speech Corpus. Speech Commun. (2005)
D. Hirst. Form and function in the representation of speech prosody. Speech Commun. (2005)
H. Jo et al. Genetic fuzzy classifier for sleep stage identification. Comput. Biol. Med. (2010)
L.I. Kuncheva et al. Decision templates for multiple classifier fusion: an experimental comparison. Pattern Recogn. (2001)
C. Ni et al. From English pitch accent detection to Mandarin stress detection, where is the difference? Comput. Speech Lang. (2012)
E. Papageorgiou et al. Advanced soft computing diagnosis method for tumour grading. Artif. Intell. Med. (2006)
N. Phuong et al. Fuzzy logic and its applications in medicine. Int. J. Med. Inform. (2001)
K. Ross et al. Prediction of abstract prosodic labels for speech synthesis. Comput. Speech Lang. (1996)
K. Sreenivasa Rao et al. Intonation modeling for Indian languages. Comput. Speech Lang. (2009)
A.K. Syrdal et al. Automatic ToBI prediction and alignment to speed manual labeling of prosody. Speech Commun. (2001)
A. Verikas et al. Soft combination of neural classifiers: a comparative study. Pattern Recogn. Lett. (1999)
L. Zadeh. Fuzzy sets. Inform. Control (1965)
S. Ananthakrishnan et al. An automatic prosody recognizer using a coupled multi-stream acoustic model and a syntactic–prosodic language model.
S. Ananthakrishnan et al. Automatic prosodic event detection using acoustic, lexical, and syntactic evidence. IEEE Trans. Audio Speech Lang. Process. (2008 January)
S. Ananthakrishnan et al. Fine-grained pitch accent and boundary tone labeling with parametric F0 features.
A. Arvaniti et al. Intonational analysis and prosodic annotation of Greek spoken corpora.
N. Bacuez. Automated pattern recognition for intonation (print) an essay on linguistic categorization. (2012)
M. Beckman. Intonation across Spanish in the Tones and Break Indices framework. Probus (2002 January)
M. Beckman et al. The original ToBI system and the evolution of the ToBI framework.
M. Beckman et al. K-ToBI (Korean ToBI) labeling conventions: version 3. Korean J. Speech Sci. (2000)
M.E. Beckman et al. Intonation across Spanish, in the tones and break indices framework. Tech. Rep. (2000)
M. Brean et al. Inter-transcriber reliability for two systems of prosodic annotation: Tobi (tones and break indices) and rap (rhythm and pitch). Corpus Linguist. Linguist. Theory (2012)
A. Brugos et al. The alternatives (alt) tier for tobi: advantages of capturing prosodic ambiguity.
K. Chen et al. An automatic prosody labeling system using ANN-based syntactic–prosodic model and GMM-based acoustic–prosodic model.
K. Chen et al. A maximum likelihood prosody recognizer.
S.-B. Cho et al. Combining multiple neural networks by fuzzy integral and robust classification. IEEE Trans. Syst. Man Cybern. (1995)
S.B. Cho et al. Multiple network fusion using fuzzy logic. IEEE Trans. Neural Netw. (1995)
J. Cole et al. The role of syntactic structure in guiding prosody perception with ordinary listeners and everyday speech. Lang. Cogn. Process. (2010)
J. Cole et al. Signal-based and expectation-based factors in the perception of prosodic prominence. Lab. Phonol. (2010)
A. del Amo et al. On the principles of fuzzy classification.
L. Dilley et al. The rap (rythm and pitch) labeling system. Tech. Rep. (2005)
Y. Dote et al. Industrial applications of soft computing: a review. Proc. IEEE (2001)
D. DuBois et al. (1980)
R.O. Duda et al. Pattern Classification. (2001)
D. Escudero et al. Applying data mining techniques to corpus based prosodic modeling. Speech Commun. (2007)
D. Escudero et al. Corpus based extraction of quantitative prosodic parameters of stress groups in Spanish.
D. Escudero et al. Visualizing tool for evaluating inter-label similarity in prosodic labeling experiments.
E. Estebas et al. La notación prosódica del español. Una revisión del Sp-ToBI. Estudios de Fonética Experimental (2009)
E. Estebas Vilaplana et al. Castilian Spanish intonation.


This paper has been recommended for acceptance by T. Kawahara.
