Abstract
Automatic Speech Recognition (ASR) rarely addresses the punctuation of the obtained transcriptions. Recently, Recurrent Neural Network (RNN) based models exploiting wide word contexts have been proposed for automatic punctuation. In real-time ASR tasks such as closed captioning of live TV streams, text-based punctuation poses two particular challenges: a requirement for low latency (which limits the available future context), and the propagation of ASR errors, which occur more often for informal or spontaneous speech. This paper investigates Maximum Entropy (MaxEnt) and RNN punctuation models under such real-time conditions, and also compares them to off-line setups. As expected, the RNN outperforms the MaxEnt baseline system. Limiting the future context results in only a slight performance drop, whereas ASR errors degrade punctuation performance considerably. A genre analysis of punctuation performance is also carried out. Our approach is further evaluated on TED talks from the IWSLT English dataset, yielding results comparable to state-of-the-art systems.
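The low-latency constraint described in the abstract (a capped future context, so that the caption is delayed by at most a few words) can be sketched as a windowing step over the incoming word stream. The helper below is a minimal illustration, not code from the paper; the names `make_windows`, `past`, and `future` are assumptions chosen for clarity.

```python
def make_windows(words, past=5, future=2, pad="<PAD>"):
    """Frame a word stream for low-latency punctuation tagging.

    For each word position i, build the context window fed to the
    punctuation model. The future context is capped at `future`
    words, so the punctuation decision for word i is delayed by at
    most `future` words -- the real-time constraint discussed above.
    """
    padded = [pad] * past + list(words) + [pad] * future
    windows = []
    for i in range(len(words)):
        center = i + past  # position of word i inside `padded`
        windows.append(padded[center - past : center + future + 1])
    return windows


# Example: with past=2 and future=1, each decision waits for only
# one upcoming word before a punctuation mark can be emitted.
windows = make_windows(["hello", "world", "how", "are", "you"],
                       past=2, future=1)
```

An off-line (bidirectional) setup corresponds to letting `future` grow to the full utterance length; the paper's finding is that shrinking it to a small constant costs only a slight drop in performance.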
Acknowledgements
The authors gratefully acknowledge the support of the Hungarian National Research, Development and Innovation Office (NKFIH) under contract ID OTKA-PD-112598 and of the Pro Progressio Foundation, and thank NVIDIA for kindly providing a Titan GPU for the RNN experiments.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Tündik, M.Á., Tarján, B., Szaszák, G. (2017). Low Latency MaxEnt- and RNN-Based Word Sequence Models for Punctuation Restoration of Closed Caption Data. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2017. Lecture Notes in Computer Science, vol. 10583. Springer, Cham. https://doi.org/10.1007/978-3-319-68456-7_13
Print ISBN: 978-3-319-68455-0
Online ISBN: 978-3-319-68456-7