Evaluating Text Normalization for Speech-Based Media Selection

Pfeil, Martin; Buehler, Dirk; Gruhn, Rainer; Minker, Wolfgang

doi:10.1007/978-3-540-69369-7_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5078)

Included in the following conference series:

International Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems

Abstract

In this paper, we present an approach to evaluating text normalization for multilingual speech-based dialogue systems. Text normalization is applied here within the task of music selection, which imposes several important and novel requirements on its performance. The main idea is that text normalization should derive likely user utterances from the metadata available in a user’s music collection. This differs substantially from the text preprocessing applied, for instance, in text-to-speech (TTS) systems, because (a) more than one normalization hypothesis may be generated, and (b) for media selection the information content may be reduced, which is not desirable for TTS. These factors also affect evaluation.

We describe a data collection effort carried out to build an initial corpus of text normalization references and scorings, as well as experiments with well-known evaluation metrics from different areas of language research, aimed at identifying an adequate evaluation measure.
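
To make the evaluation setting concrete, the following is a minimal sketch of how one normalization hypothesis could be scored against several reference normalizations. It uses word error rate (word-level edit distance divided by reference length) and keeps the best match over all references. The choice of metric, the function names, and the sample strings are assumptions made for illustration; the abstract does not state which measures the paper examines or recommends.

```python
from typing import List, Sequence


def word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance over word tokens (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))  # distances from an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance from hyp[:i] to an empty reference
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop hypothesis word h
                            curr[j - 1] + 1,      # add reference word r
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def wer(hypothesis: str, reference: str) -> float:
    """Word error rate of one hypothesis against one reference."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)


def best_wer(hypothesis: str, references: List[str]) -> float:
    """Lowest WER over all references (hypothetical multi-reference scoring)."""
    return min(wer(hypothesis, ref) for ref in references)


# Hypothetical references for a metadata entry such as "Symphony No. 5".
references = ["symphony number five", "fifth symphony"]
exact = best_wer("symphony number five", references)
partial = best_wer("symphony no five", references)
```

Taking the minimum over references reflects the point that several spoken forms of the same metadata entry can be equally acceptable; a system that produces multiple hypotheses could be scored by applying `best_wer` to each of them.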

Author information

Authors and Affiliations

Harman/Becker Automotive Systems, Ulm, Germany
Martin Pfeil, Dirk Buehler & Rainer Gruhn
Information Technology Institute, Ulm University, Germany
Martin Pfeil, Dirk Buehler, Rainer Gruhn & Wolfgang Minker

Editor information

Elisabeth André, Laila Dybkjær, Wolfgang Minker, Heiko Neumann, Roberto Pieraccini, Michael Weber

About this paper

Cite this paper

Pfeil, M., Buehler, D., Gruhn, R., Minker, W. (2008). Evaluating Text Normalization for Speech-Based Media Selection. In: André, E., Dybkjær, L., Minker, W., Neumann, H., Pieraccini, R., Weber, M. (eds) Perception in Multimodal Dialogue Systems. PIT 2008. Lecture Notes in Computer Science, vol 5078. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69369-7_7

DOI: https://doi.org/10.1007/978-3-540-69369-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69368-0
Online ISBN: 978-3-540-69369-7
eBook Packages: Computer Science, Computer Science (R0)
