Abstract
Automatic Speech Recognition (ASR) rarely addresses the punctuation of the obtained transcriptions. Recently, Recurrent Neural Network (RNN) based models exploiting wide word contexts have been proposed for automatic punctuation. In real-time ASR tasks such as closed captioning of live TV streams, text-based punctuation poses two particular challenges: a requirement for low latency (which limits the available future context), and the propagation of ASR errors, which occur more often for informal or spontaneous speech. This paper investigates Maximum Entropy (MaxEnt) and RNN punctuation models under such real-time conditions, and also compares them to off-line setups. As expected, the RNN outperforms the MaxEnt baseline system. Limiting the future context results in only a slight performance drop, whereas ASR errors degrade punctuation performance considerably. A genre analysis of punctuation performance is also carried out. Our approach is further evaluated on TED talks from the IWSLT English dataset, yielding results comparable to state-of-the-art systems.
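The low-latency constraint described in the abstract (a capped future context, so that the caption is delayed by at most a few words) can be sketched as a windowing step over the incoming word stream. The helper below is a minimal illustration, not code from the paper; the names `make_windows`, `past`, and `future` are assumptions chosen for clarity.

```python
def make_windows(words, past=5, future=2, pad="<PAD>"):
    """Frame a word stream for low-latency punctuation tagging.

    For each word position i, build the context window fed to the
    punctuation model. The future context is capped at `future`
    words, so the punctuation decision for word i is delayed by at
    most `future` words -- the real-time constraint discussed above.
    """
    padded = [pad] * past + list(words) + [pad] * future
    windows = []
    for i in range(len(words)):
        center = i + past  # position of word i inside `padded`
        windows.append(padded[center - past : center + future + 1])
    return windows


# Example: with past=2 and future=1, each decision waits for only
# one upcoming word before a punctuation mark can be emitted.
windows = make_windows(["hello", "world", "how", "are", "you"],
                       past=2, future=1)
```

An off-line (bidirectional) setup corresponds to letting `future` grow to the full utterance length; the paper's finding is that shrinking it to a small constant costs only a slight drop in performance.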
Acknowledgements
The authors gratefully acknowledge the support of the Hungarian National Research, Development and Innovation Office (NKFIH) under contract ID OTKA-PD-112598 and of the Pro Progressio Foundation, and thank NVIDIA for kindly providing a Titan GPU for the RNN experiments.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Tündik, M.Á., Tarján, B., Szaszák, G. (2017). Low Latency MaxEnt- and RNN-Based Word Sequence Models for Punctuation Restoration of Closed Caption Data. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2017. Lecture Notes in Computer Science, vol. 10583. Springer, Cham. https://doi.org/10.1007/978-3-319-68456-7_13
Print ISBN: 978-3-319-68455-0
Online ISBN: 978-3-319-68456-7