Data-Driven Approaches to Objective Evaluation of Phoneme Alignment Systems

Baghai-Ravary, Ladan; Kochanski, Greg; Coleman, John

doi:10.1007/978-3-642-20095-3_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6562)

Included in the following conference series:

Language and Technology Conference

Abstract

This paper presents techniques for objective characterisation of Automatic Speech-to-Phoneme Alignment (ASPA) systems, without the need for human-generated labels to act as a benchmark. As well as being immune to the effects of human variability, these techniques yield diagnostic information which can be helpful in the development of new alignment systems, ensuring that the resulting labels are as consistent as possible. To illustrate this, a total of 48 ASPA systems are used, including three front-end processors. For each processor, the number of states in each phoneme model, and of Gaussian distributions in each state mixture, are adjusted to generate a broad variety of systems. The results are compared using a statistical measure and a model-based Bayesian Monte-Carlo approach. The most consistent alignment system is identified, and is (as expected) in close agreement with typical “baseline” systems used in ASR research.
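The idea of ranking alignment systems by mutual consistency, without human labels, can be illustrated with a minimal sketch. The abstract does not specify the statistical measure used, so the score below (median absolute deviation from a cross-system median boundary) is an assumption chosen for illustration, not the authors' method. The system names and boundary times are hypothetical.

```python
from statistics import median


def consensus_boundaries(alignments):
    """Return the per-boundary median time across all systems.

    `alignments` maps a system name to a list of boundary times (seconds)
    for the same utterance and the same phoneme sequence.
    """
    systems = list(alignments.values())
    n = len(systems[0])
    if any(len(times) != n for times in systems):
        raise ValueError("all systems must label the same number of boundaries")
    return [median(times[i] for times in systems) for i in range(n)]


def consistency_scores(alignments):
    """Score each system by its median absolute deviation from the consensus.

    Lower is better. This is an illustrative stand-in for the paper's
    statistical measure, which the abstract does not define.
    """
    consensus = consensus_boundaries(alignments)
    return {
        name: median(abs(t - c) for t, c in zip(times, consensus))
        for name, times in alignments.items()
    }


def most_consistent(alignments):
    """Return the name of the system closest to the cross-system consensus."""
    scores = consistency_scores(alignments)
    return min(scores, key=scores.get)


# Hypothetical boundary times for one utterance from three systems.
example = {
    "system_a": [0.10, 0.25, 0.40, 0.62],
    "system_b": [0.11, 0.24, 0.41, 0.60],
    "system_c": [0.15, 0.30, 0.38, 0.70],
}

if __name__ == "__main__":
    print(consistency_scores(example))
    print(most_consistent(example))
```

No human-generated reference labels appear anywhere in the computation: each system is judged only against the other systems. A robust statistic such as the median keeps one badly behaved system from dominating the consensus.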

Author information

Authors and Affiliations

Phonetics Laboratory, Oxford University, 41 Wellington Square, Oxford, OX1 2JD, UK
Ladan Baghai-Ravary, Greg Kochanski & John Coleman

Editor information

Editors and Affiliations

Faculty of Mathematics and Computer Science, Adam Mickiewicz University in Poznan, ul. Umultowska 87, 61614, Poznan, Poland
Zygmunt Vetulani

About this paper

Cite this paper

Baghai-Ravary, L., Kochanski, G., Coleman, J. (2011). Data-Driven Approaches to Objective Evaluation of Phoneme Alignment Systems. In: Vetulani, Z. (ed.) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2009. Lecture Notes in Computer Science, vol 6562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20095-3_1

DOI: https://doi.org/10.1007/978-3-642-20095-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20094-6
Online ISBN: 978-3-642-20095-3
eBook Packages: Computer Science, Computer Science (R0)
