Automatic capitalisation generation for speech input

https://doi.org/10.1016/S0885-2308(03)00032-9

Abstract

Two different systems are proposed for the task of capitalisation generation. The first system is a slightly modified speech recogniser. In this system, every word in the vocabulary is duplicated: once in a de-capitalised form and again in capitalised forms. In addition, the language model is re-trained on mixed case texts. The other system is based on Named Entity (NE) recognition and punctuation generation, since most capitalised words are either the first words of sentences or NE words. The two systems are compared with every processing step fully automated. The system based on NE recognition and punctuation generation outperforms the system modified from the speech recogniser in terms of word error rate, F-measure and slot error rate. This is because the modified speech recogniser has a distorted and sparser language model. The detailed performance of the system based on NE recognition and punctuation generation is investigated by including one or more of the following: the reference word sequences, the reference NE classes and the reference punctuation marks. The results show that this system is robust to NE recognition errors. Although most punctuation generation errors cause errors in this capitalisation generation system, the number of capitalisation errors they cause does not exceed the number of punctuation generation errors. In addition, the results demonstrate that, for capitalisation generation, the effect of NE recognition errors is independent of the effect of punctuation generation errors.

Introduction

Even with no speech recognition errors, automatically transcribed speech is much harder to read than written text because it lacks punctuation, capitalisation and number formatting. The output format of a standard research speech recogniser is known as Standard Normalised Orthographical Representation (SNOR) (NIST, 1998a) and consists only of single-case letters, without punctuation marks or numerals. The readability of speech recognition output would be greatly enhanced by generating proper capitalisation. In speech dictation, the dictation system can rely on the speaker to indicate capitalised words explicitly, although people do not want to be forced to verbalise capitalisation. However, when speakers are unaware that their speech is being transcribed automatically, e.g. in broadcast news and conversational telephone speech, no explicit indications of capitalised words are given. When the input text comes from speech, the capitalisation generation task becomes more difficult because speech recognition errors corrupt the input text.
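To make the contrast between mixed case text and SNOR concrete, the following is a rough sketch of how mixed case text might be reduced to a SNOR-like form. The function name `to_snor` is hypothetical, and it assumes that numbers have already been written out as words. It is not the NIST normalisation procedure: it only upper-cases the text (one common single-case convention) and strips punctuation, and it discards any digits that remain.

```python
import re


def to_snor(text: str) -> str:
    """Reduce mixed case, punctuated text to a SNOR-like single-case form.

    Hypothetical helper for illustration only. Numbers are assumed to be
    verbalised already; any remaining digits are discarded like punctuation.
    """
    # Replace every character that is not a letter, apostrophe or whitespace.
    cleaned = re.sub(r"[^A-Za-z'\s]", " ", text)
    # Drop apostrophes that are not word-internal (e.g. quotation marks),
    # but keep contractions and possessives such as "it's" or "John's".
    cleaned = re.sub(r"(?<![A-Za-z])'|'(?![A-Za-z])", " ", cleaned)
    # Single case and single spaces between words.
    return " ".join(cleaned.upper().split())
```

For example, `to_snor("Mr. Blair met the U.N. envoy in London.")` yields `MR BLAIR MET THE U N ENVOY IN LONDON`. The capitalisation systems discussed in this paper start from output of this kind, so every case distinction must be recovered rather than preserved.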

The tasks of Named Entity (NE) recognition (MUC, 1995) and enhanced speech recognition output generation are strongly related, because most capitalised words are either the first words of sentences or NEs. The importance of NE recognition for automatic capitalisation was noted by Gotoh et al. (1999). Conversely, generated punctuation and capitalisation give further clues for NE recognition. NE recognition experiments comparing mixed case text with SNOR as input showed that performance deteriorates when capitalisation and punctuation information is missing (Kubala et al., 1998), because this missing information makes certain decisions about proper names more difficult.

The objective of this paper is to devise automatic methods of capitalisation generation for speech input. The paper consists of seven sections. First, previous work in this area is introduced. The corpora used in the experiments and the evaluation measures are then described, followed by the two automatic capitalisation systems: the first is a slightly modified speech recogniser and the other is based on NE recognition and punctuation generation. Finally, the detailed performance of the system based on NE recognition and punctuation generation is investigated.


Previous work

Many word processors provide commercial implementations of automatic capitalisation, in which the grammar and spelling checkers generate capitalisation suggestions. A typical example is Microsoft Word, one of the most popular word processors. The details of its implementation were described in a US patent (Rayson et al., 1998). In this implementation, whether the current word is at the start of a sentence or not was determined by a sentence

Corpora and evaluation measures

Two different sets of data, the Broadcast News (BN) text corpus and the 100-hour Hub-4 BN data set, were available as training data for the experiments in this paper. The BN text corpus (named BNText92_97 in this paper) comprises 184 million words of BN text from 1992 to 1997 inclusive. Another set of training data, the 100-hour BN acoustic training data set released for the 1998 Hub-4

Automatic capitalisation generation

In this section, two different automatic capitalisation generation systems are presented. The first system is a slightly modified speech recogniser. In this system, every word in its vocabulary is duplicated: once in a de-capitalised form and again in the two capitalised forms. In addition, its language model is re-trained on mixed case texts. The other system is based on NE recognition and punctuation generation, since most capitalised words are first words in sentences or NE words.
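The vocabulary duplication used by the first system can be sketched as a lexicon expansion. The sketch assumes that the "two capitalised forms" are an initial-capital form (e.g. "London") and an all-capitals form (e.g. "NATO"), and that all three variants share one pronunciation. The function name `expand_lexicon` and the phone symbols in the example are hypothetical. After this expansion, the re-trained mixed case language model is what chooses among the variants during decoding.

```python
from typing import Dict, List


def expand_lexicon(lexicon: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Duplicate each entry into de-capitalised and capitalised variants.

    Assumption: the capitalised forms are initial-capital and all-capitals.
    Each variant keeps the original pronunciation (a list of phones).
    """
    expanded: Dict[str, List[str]] = {}
    for word, pron in lexicon.items():
        base = word.lower()
        # De-capitalised, initial-capital and all-capitals forms.
        for variant in (base, base.capitalize(), base.upper()):
            # setdefault skips duplicates such as "A" for the word "a".
            expanded.setdefault(variant, list(pron))
    return expanded
```

A side effect is visible even in this toy version: the vocabulary grows up to threefold, and the language model counts are split across case variants. This is consistent with the sparser language model the abstract attributes to this system.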

Experimental results

Two systems for generating capitalisation are compared: a system modified from a speech recogniser (named ModSR) and a system based on NE recognition and punctuation generation (named NEPuncBased).
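The systems are compared by word error rate, F-measure and slot error rate. The following sketch shows how the last two could be computed for capitalisation, under these assumptions:

- Every capitalised word in the reference counts as a slot.
- The reference and hypothesis contain the same words in the same order and differ only in case, as when reference word sequences are used.
- A wrongly capitalised form is a substitution, a missed capital is a deletion and a spurious capital is an insertion, with SER = (S + D + I) / N, where N is the number of reference slots.

This convention may differ from the exact scoring used in the paper. The function names are hypothetical.

```python
from typing import List, Tuple


def case_slot_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int, int]:
    """Count (correct, substituted, deleted, inserted) capitalised-word slots."""
    if len(ref) != len(hyp) or any(r.lower() != h.lower() for r, h in zip(ref, hyp)):
        raise ValueError("ref and hyp must differ only in capitalisation")
    c = s = d = i = 0
    for r, h in zip(ref, hyp):
        r_cap, h_cap = r != r.lower(), h != h.lower()
        if r_cap and h_cap:
            if r == h:
                c += 1  # same capitalised form
            else:
                s += 1  # e.g. "Nato" for "NATO"
        elif r_cap:
            d += 1  # capital missed
        elif h_cap:
            i += 1  # spurious capital
    return c, s, d, i


def f_measure_and_ser(ref: List[str], hyp: List[str]) -> Tuple[float, float]:
    """Return (F-measure, slot error rate) over capitalised-word slots."""
    c, s, d, i = case_slot_counts(ref, hyp)
    n_ref, n_hyp = c + s + d, c + s + i
    precision = c / n_hyp if n_hyp else 0.0
    recall = c / n_ref if n_ref else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    ser = (s + d + i) / n_ref if n_ref else 0.0
    return f, ser
```

Consider the reference "The BBC said Tony Blair met NATO chiefs" and the hypothesis "the BBC said Tony blair met Nato Chiefs". The counts are C = 2, S = 1, D = 2 and I = 1, giving F = 4/9 and SER = 0.8. Because SER also penalises insertions, it can exceed 1 - F. When recogniser output is used, the word sequences first have to be aligned, which this sketch does not attempt.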

NEPuncBased uses the rule-based NE recognition system in (Kim and Woodland, 2000b), which generates rules automatically, and the punctuation generation system in (Kim and Woodland, 2001), which incorporates prosodic information along with acoustic and language model information.
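The core idea behind NEPuncBased is that most capitalised words are sentence-initial words or NE words. That idea can be illustrated with a simple rule, assuming the punctuation marks and NE spans have already been produced. This sketch does not reproduce the cited NE recognition or punctuation generation systems. The function `capitalise`, the half-open token-index format of the NE spans and the optional `preferred` map of usual capitalised forms (e.g. "bbc" to "BBC") are hypothetical. Without such a map, the sketch applies an initial capital only.

```python
from typing import Dict, List, Optional, Tuple

SENTENCE_END = {".", "?", "!"}


def capitalise(tokens: List[str],
               ne_spans: List[Tuple[int, int]],
               preferred: Optional[Dict[str, str]] = None) -> List[str]:
    """Capitalise sentence-initial words and words inside NE spans.

    tokens: lower-case words interleaved with generated punctuation tokens.
    ne_spans: half-open (start, end) token ranges labelled as NEs.
    preferred: hypothetical map from a lower-case word to its capitalised form.
    """
    preferred = preferred or {}
    in_ne = {k for start, end in ne_spans for k in range(start, end)}
    out: List[str] = []
    sentence_start = True
    for idx, tok in enumerate(tokens):
        if tok in SENTENCE_END:
            out.append(tok)
            sentence_start = True  # the next word begins a new sentence
            continue
        if not tok[0].isalpha():  # other punctuation, e.g. commas
            out.append(tok)
            continue
        if sentence_start or idx in in_ne:
            out.append(preferred.get(tok, tok[0].upper() + tok[1:]))
        else:
            out.append(tok)
        sentence_start = False
    return out
```

For instance, `capitalise("the bbc reported that tony blair met nato leaders . they agreed".split(), [(1, 2), (4, 6), (7, 8)], {"bbc": "BBC", "nato": "NATO"})` gives "The BBC reported that Tony Blair met NATO leaders . They agreed". This rule makes the dependencies visible. A missed sentence boundary directly yields a capitalisation error. An NE tagging error matters only when it changes the case of a word, and a word mis-tagged as the wrong NE class is still capitalised correctly. The rule is naive for function words inside NEs (e.g. "of" in "Bank of England"), which a practical system would need to handle.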

A version of the

Analysis of performance of the system based on NE recognition and punctuation generation (NEPuncBased)

The effects of speech recognition errors, NE recognition errors and punctuation generation errors are accumulated in the results of NEPuncBased in Table 7. In this section, the performance of NEPuncBased is investigated by including one or more of the following: reference word sequences, reference NE classes and reference punctuation marks. The total effect of the accumulated errors is examined, and the contribution of each step in NEPuncBased is tested for reference word sequences, NE classes

Conclusions

In this paper, an important area of transcription readability improvement, automatic capitalisation generation, has been discussed. Two different systems have been proposed for this task. The first is a slightly modified speech recogniser. In this system, every word in its vocabulary is duplicated: once in a de-capitalised form and again in capitalised forms. In addition, the language model is re-trained on mixed case texts. The other system is based on NE recognition and punctuation generation

Acknowledgements

Ji-Hwan Kim acknowledges the support of the British Council, LG company, GCHQ and the EU Coretex project.

References (33)

  • Beeferman, D., Berger, A., Lafferty, J., 1998. Cyberpunc: a lightweight punctuation annotation system for speech. In:...
  • Bikel, D., Miller, S., Schwartz, R., 1997. Nymble: a high-performance learning namefinder. In: Proceedings of the...
  • Breiman, L., et al., 1983. Classification and Regression Trees.
  • Brill, E., 1993. A corpus-based approach to language learning. Ph.D. Thesis, University of...
  • Brill, E., 1994. Some advances in rule-based part of speech tagging. In: Proceedings of the 12th National Conference on...
  • Chen, C., 1999. Speech recognition with automatic punctuation. In: Proceedings of the European Conference on Speech...
  • Christensen, H., Gotoh, Y., Renals, S., 2001. Punctuation annotation using statistical prosody models. In: Proceedings...
  • Gotoh, Y., Renals, S., 2000. Sentence boundary detection in broadcast speech transcripts. In: Proceedings of the...
  • Gotoh, Y., Renals, S., Williams, G., 1999. Named entity tagged language models. In: Proceedings of the IEEE...
  • Grishman, R., Sundheim, B., 1995. Design of the MUC-6 evaluation. In: Proceedings of the 6th Message Understanding...
  • Huang, J., Zweig, G., 2002. Maximum entropy model for punctuation annotation from speech. In: Proceedings of the...
  • Kim, J., Woodland, P.C., 2000a. Rule based named entity recognition. Technical Report CUED/F-INFENG/TR.385, Department...
  • Kim, J., Woodland, P.C., 2000b. A rule-based named entity recognition system for speech input. In: Proceedings of the...
  • Kim, J., Woodland, P.C., 2001. The use of prosody in a combined system for punctuation generation and speech...
  • Leggetter, C.J., et al., 1995. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language.
  • Woodland, P.C., 2002. The development of the HTK broadcast news transcription system: an overview. Speech Communication.