A fuzzy classifier to deal with similarity between labels on automatic prosodic labeling

https://doi.org/10.1016/j.csl.2013.08.001

Abstract

This paper presents an original approach to automatic prosodic labeling. Fuzzy logic techniques are used to represent situations of high uncertainty about the category to be assigned to a given prosodic unit. The fuzzy integral technique is used to combine the outputs of different base classifiers. The resulting fuzzy classifier benefits from the different capabilities of the base classifiers for identifying different types of prosodic events. At the same time, the fuzzy classifier identifies the events that are potentially more difficult to label. The classifier has been applied to the identification of ToBI pitch accents. The state of the art in multiclass pitch accent classification reports accuracy rates of around 70%. In this paper we describe a fuzzy classifier that assigns more than one label in confusing situations. We show that the pairs of labels that appear in these uncertain situations are consistent with the most confused pairs of labels reported in manual prosodic labeling experiments. Our fuzzy classifier obtains a soft classification rate of 81.8%, which supports the potential of the proposed system for computer-assisted prosodic labeling.

Introduction

Prosodic labeling aims to enrich spoken utterances with labels that represent the relationship between the prosodic form and function of the constituents of the message. Although prosodic labeling systems establish clear rules and protocols, the difficulty of the task and the inherent subjectivity of the labelers’ judgments cause a high number of inconsistencies. Prosodic labeling systems anticipate uncertain situations and reserve special symbols for them (like the symbol ‘?’ in ToBI (Beckman et al., 2005)). Nevertheless, apart from these declared uncertain situations, inter-transcriber tests have revealed a considerable number of cases where two transcribers assign different labels to the same prosodic unit (Escudero et al., 2012). Moreover, the apparent perceptual and acoustic similarity of several pairs of labels and their corresponding prosodic units is one of the reasons for the uncertain assignment of prosodic labels (Escudero and Estebas, 2012).

Fuzzy set theory (DuBois and Prade, 1980) has been widely used to represent this type of situation, where it is difficult to classify a given element into one of the possible categories. The prosodic categories in a prosodic labeling system like ToBI are not fuzzy categories because they have a linguistic phonological meaning. Nevertheless, according to the experimental evidence, the labels assigned by human or automatic transcribers are uncertain information. In this paper we show how fuzzy sets can be used both when assigning a given prosodic unit to a class is difficult and when more than one class could be assigned because there is a degree of uncertainty.

The difficulties of manual prosodic labeling carry over into automatic prosodic labeling. Prosodic labeling strongly depends on perceptual judgments that must be made by human transcribers. Nevertheless, automatic prosodic labeling is needed because manual prosodic labeling is a slow and costly process. Several applications can benefit from prosodic labeling but require the processing of huge corpora, and labeling such corpora manually is not affordable. Thus, automatic prosodic labeling systems need to be refined for these types of applications. This paper introduces a new proposal for this: rather than trying to automatically generate one label per word, our proposed algorithm generates zero or more labels per word, depending on the value of a single α-cut of the fuzzy classifier.
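As a rough illustration of emitting zero or more labels per word, the sketch below applies a single α-cut to a vector of membership values. The function name `alpha_cut_labels`, the membership values and the choice of an inclusive threshold (≥ α) are assumptions made for this example; only the pitch accent names follow ToBI conventions.

```python
def alpha_cut_labels(memberships, alpha):
    """Return every label whose membership degree reaches alpha.

    memberships: dict mapping label -> membership degree in [0, 1]
    alpha: threshold of the alpha-cut (assumed inclusive here)
    """
    # Rank labels by decreasing membership so the most plausible comes first
    ranked = sorted(memberships.items(), key=lambda kv: kv[1], reverse=True)
    return [label for label, mu in ranked if mu >= alpha]


# Hypothetical membership values for one word (not taken from the paper)
example = {"H*": 0.55, "L+H*": 0.48, "L*": 0.10, "none": 0.05}

for alpha in (0.4, 0.5, 0.6):
    print(alpha, alpha_cut_labels(example, alpha))
```

With these made-up values, a low α keeps two competing labels, a moderate α keeps one, and a high α leaves the word unlabeled, which is the "zero or more labels" behavior described above.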

The state of the art in automatic prosodic labeling reports identification rates higher than 90% for binary decisions, that is, determining the presence or absence of an accent, boundary or break. Nevertheless, when trying to classify different types of accents, boundaries or breaks, the classification rates decrease dramatically to about 70% (see Section 2.2 for a review of the state of the art). In González-Ferreras et al. (2010), we showed that the reasons for these low accuracy rates are the high similarity among some pairs of classes and the imbalanced nature of the prosodic corpora. In this article we show that the use of a fuzzy classifier considerably increases the accuracy rates when soft classification is performed.
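To make the notion of soft classification concrete, here is a minimal sketch of how a soft classification rate could be computed. It assumes that a unit counts as correctly classified when its reference label is among the labels assigned to it, and that a unit with no assigned label counts as an error; this excerpt does not give the exact definition used in the paper, so both points are assumptions, as are the function name and the toy data.

```python
def soft_classification_rate(gold_labels, predicted_label_sets):
    """Fraction of units whose reference label is among the assigned labels."""
    if len(gold_labels) != len(predicted_label_sets):
        raise ValueError("gold and predicted sequences must have equal length")
    if not gold_labels:
        raise ValueError("at least one unit is required")
    hits = sum(
        1 for gold, labels in zip(gold_labels, predicted_label_sets)
        if gold in labels
    )
    return hits / len(gold_labels)


# Toy data (hypothetical): four words, one reference label each
gold = ["H*", "L+H*", "L*", "H*"]
predicted = [["H*"], ["H*", "L+H*"], [], ["L*"]]

rate = soft_classification_rate(gold, predicted)
print(rate)
```

Under this assumed definition, the second word is a hit even though two labels were assigned, whereas the third word, which received no label, is a miss.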

Fuzzy classifiers have proved useful in various applications, from system control to decision making (Zimmermann, 1999). In this paper we present a fuzzy classifier based on an adaptation of the prosodic event classifier described in González-Ferreras et al. (2012). With the application of fuzzy expert fusion, the output of the base classifiers is interpreted as a confidence value or a membership degree of a given prosodic unit in a prosodic category. We benefit from the fact that each base classifier behaves differently in the task of separating the different pairs of classes. The application of α-cuts allows labels to be assigned with a certain degree of uncertainty. The method increases performance by assigning soft labels to some of the prosodic units, and it identifies the prosodic units where the uncertainty is higher. We show that the pairs of labels involved in uncertain situations are the most confused pairs that appear in perceptual inter-transcriber tests. The solution we propose in this paper is especially interesting in a situation where automatic labelings are being checked by human transcribers, since it is easier for a human to select the best label from a list of candidates than to replace the automatic label with another one drawn from perception.
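The fusion step can be pictured with a small sketch. The abstract mentions a fuzzy integral, but this excerpt does not say which variant is used; the code below assumes the Sugeno integral with a λ-fuzzy measure, a common choice for classifier fusion. The function names, the per-classifier densities and the support values are all hypothetical, and the densities are assumed to lie strictly between 0 and 1.

```python
import math


def sugeno_lambda(densities):
    """Find the non-zero root lam > -1 of prod(1 + lam*g_i) = 1 + lam."""
    total = sum(densities)
    if abs(total - 1.0) < 1e-6:
        return 0.0  # the measure is (almost) additive already

    def f(lam):
        return math.prod(1.0 + lam * g for g in densities) - (1.0 + lam)

    if total > 1.0:
        lo, hi = -1.0, -1e-9      # root lies in (-1, 0)
    else:
        lo, hi = 1e-9, 1.0        # root lies in (0, +inf)
        while f(hi) < 0.0:
            hi *= 2.0
    lo_positive = f(lo) > 0.0
    for _ in range(200):          # plain bisection
        mid = (lo + hi) / 2.0
        if (f(mid) > 0.0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def sugeno_integral(supports, densities, lam):
    """Fuse the supports that several classifiers give to ONE class."""
    order = sorted(range(len(supports)), key=lambda i: supports[i], reverse=True)
    measure, best = 0.0, 0.0
    for i in order:
        # Measure of the set of classifiers considered so far
        measure = densities[i] + measure + lam * densities[i] * measure
        best = max(best, min(supports[i], measure))
    return best


# Hypothetical example: 3 base classifiers, 3 classes
densities = [0.3, 0.4, 0.5]             # assumed importance of each classifier
supports = [[0.6, 0.3, 0.1],            # classifier 1: support per class
            [0.2, 0.7, 0.1],            # classifier 2
            [0.5, 0.4, 0.1]]            # classifier 3

lam = sugeno_lambda(densities)
fused = [sugeno_integral([row[k] for row in supports], densities, lam)
         for k in range(3)]
print(lam, fused)
```

The fused values can then be read as membership degrees, one per class, to which an α-cut may be applied.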

The structure of the paper is as follows. First, we review the state of the art in prosodic labeling, automatic prosodic labeling and fuzzy classification. Next, the fuzzy classifier and the experimental procedure are described. Then we present the results and discuss the application to prosodic labeling and the uncertainty of the labeling process. Finally, we present some conclusions.

Section snippets

Prosodic labeling

The Autosegmental-Metrical (AM) model (Ladd, 1996) has been widely applied in prosodic annotation of spoken corpora. The model was proposed in Pierrehumbert's thesis (Pierrehumbert, 1980). Based on this model, ToBI (Tones and Break Indices) is a framework for transcribing and annotating the prosody of speech. ToBI-based systems have been developed for many languages such as English (Beckman et al., 2005), Spanish (Beckman et al., 2000, Beckman, 2002, Estebas and Prieto, 2009), German (Grice and

Fuzzy classifier

Given an input vector $x$ and a set of labels $L = \{l_1, \ldots, l_C\}$, classic pattern recognition assigns a unique label $l^*$ to $x$. The classification rule selects the label $l^*$ which maximizes the posterior probability:

$$l^* = \arg\max_{l} P(l \mid x)$$

The objective of fuzzy classification is to obtain membership values $\mu_i$ as an estimation of $P(l_i \mid x)$. This vector of membership values $\mu = (\mu_1, \ldots, \mu_i, \ldots, \mu_C)$ is used in the decision making process. The fuzzy classification problem is a step between the traditional pattern classification

Experiments

In this section we describe the corpus and the features used in the experiments. Next, we compare the results obtained with different classifiers. Finally, we present an analysis of the behavior of the multiple label classifier for different values of α.

Applications on prosodic labeling

The use of automatic labeling systems to speed up manual prosodic labeling has been advocated: the labelers are given an automatically labeled version of the utterances, which they must review (Syrdal et al., 2001). Due to the perceptual character of the prosodic labels, they must be assigned or checked by human labelers. The duration of the labeling process is a critical variable because it involves highly qualified staff.

Fig. 2 shows the interface for checking the output of the automatic fuzzy

Conclusions

In this article we have presented an alternative system for assisting the labeling of prosodic events that consists of the use of a fuzzy classifier. The use of the fuzzy classifier is justified by the high uncertainty that is observed in manual labeling which has, as a consequence, a relevant number of inconsistencies. The combination of different types of classifiers that are aggregated by using fuzzy logic techniques permits the output of the new composed classifier to be considered as a

Acknowledgments

This work has been partly supported by the National R&D&I Plan of the Spanish Government FFI2011-29559-C02-01. We want to thank Prof. Ludmila Kuncheva, for the Matlab scripts used to combine classifiers.

References (89)

  • E. Papageorgiou et al.

    Advanced soft computing diagnosis method for tumour grading

    Artif. Intell. Med.

    (2006)
  • N. Phuong et al.

    Fuzzy logic and its applications in medicine

    Int. J. Med. Inform.

    (2001)
  • K. Ross et al.

    Prediction of abstract prosodic labels for speech synthesis

    Comput. Speech Lang.

    (1996)
  • K. Sreenivasa Rao et al.

    Intonation modeling for Indian languages

    Comput. Speech Lang.

    (2009)
  • A.K. Syrdal et al.

    Automatic ToBI prediction and alignment to speed manual labeling of prosody

    Speech Commun.

    (2001)
  • A. Verikas et al.

    Soft combination of neural classifiers: a comparative study

    Pattern Recogn. Lett.

    (1999)
  • L. Zadeh

    Fuzzy sets

    Inform. Control

    (1965)
  • S. Ananthakrishnan et al.

    An automatic prosody recognizer using a coupled multi-stream acoustic model and a syntactic–prosodic language model

  • S. Ananthakrishnan et al.

    Automatic prosodic event detection using acoustic, lexical, and syntactic evidence

    IEEE Trans. Audio Speech Lang. Process.

    (2008 January)
  • S. Ananthakrishnan et al.

    Fine-grained pitch accent and boundary tone labeling with parametric F0 features

  • A. Arvaniti et al.

    Intonational analysis and prosodic annotation of Greek spoken corpora

  • N. Bacuez

    Automated pattern recognition for intonation (print) an essay on linguistic categorization

    (2012)
  • M. Beckman

    Intonation across Spanish in the Tones and Break Indices framework

    Probus

    (2002 January)
  • M. Beckman et al.

    The original ToBI system and the evolution of the ToBI framework

  • M. Beckman et al.

    K-ToBI (Korean ToBI) labeling conventions: version 3

    Korean J. Speech Sci.

    (2000)
  • M.E. Beckman et al.

    Intonation across Spanish, in the tones and break indices framework. Tech. Rep.

    (2000)
  • M. Brean et al.

    Inter-transcriber reliability for two systems of prosodic annotation: ToBI (Tones and Break Indices) and RaP (Rhythm and Pitch)

    Corpus Linguist. Linguist. Theory

    (2012)
  • A. Brugos et al.

    The alternatives (alt) tier for ToBI: advantages of capturing prosodic ambiguity

  • K. Chen et al.

    An automatic prosody labeling system using ANN-based syntactic–prosodic model and GMM-based acoustic–prosodic model

  • K. Chen et al.

    A maximum likelihood prosody recognizer

  • S.-B. Cho et al.

    Combining multiple neural networks by fuzzy integral and robust classification

    IEEE Trans. Syst. Man Cybern.

    (1995)
  • S.B. Cho et al.

    Multiple network fusion using fuzzy logic

    IEEE Trans. Neural Netw.

    (1995)
  • J. Cole et al.

    The role of syntactic structure in guiding prosody perception with ordinary listeners and everyday speech

    Lang. Cogn. Process.

    (2010)
  • J. Cole et al.

    Signal-based and expectation-based factors in the perception of prosodic prominence

    Lab. Phonol.

    (2010)
  • A. del Amo et al.

    On the principles of fuzzy classification

  • L. Dilley et al.

    The RaP (rhythm and pitch) labeling system. Tech. Rep.

    (2005)
  • Y. Dote et al.

    Industrial applications of soft computing: a review

    Proc. IEEE

    (2001)
  • D. DuBois et al.
    (1980)
  • R.O. Duda et al.

    Pattern Classification

    (2001)
  • D. Escudero et al.

    Applying data mining techniques to corpus based prosodic modeling

    Speech Commun.

    (2007)
  • D. Escudero et al.

    Corpus based extraction of quantitative prosodic parameters of stress groups in Spanish

  • D. Escudero et al.

    Visualizing tool for evaluating inter-label similarity in prosodic labeling experiments

  • E. Estebas et al.

    La notación prosódica del español. Una revisión del Sp-ToBI

    Estudios de Fonética Experimental

    (2009)
  • E. Estebas Vilaplana et al.

    Castilian Spanish intonation


    This paper has been recommended for acceptance by T. Kawahara.
