Abstract
Restoring punctuation and capitalization in the output of automatic speech recognition (ASR) systems greatly improves readability and broadens the range of downstream applications. We present a Transformer-based method for restoring punctuation and capitalization for Latvian and English, following the established approach of using neural machine translation (NMT) models. NMT methods pose a challenge here, because the length of the predicted sequence does not always match the length of the input sequence. We offer two solutions to this problem: simple forced cutting or padding of the target sequence, and a more sophisticated method based on attention alignment. Our approach achieves new state-of-the-art results for Latvian and competitive results for English.
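The length-mismatch problem and the two kinds of remedy named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `force_length` and `align_by_attention`, the choice to pad with the unchanged source token, and the rule of picking the most-attended target token for each source word are all assumptions made for illustration. The abstract does not specify these details.

```python
from typing import List


def force_length(source: List[str], predicted: List[str]) -> List[str]:
    """Cut or pad `predicted` so it has exactly len(source) tokens.

    Extra predicted tokens are dropped. Missing positions are filled with
    the unchanged source token (an assumed padding choice).
    """
    cut = predicted[: len(source)]
    return cut + source[len(cut):]


def align_by_attention(
    source: List[str],
    predicted: List[str],
    attention: List[List[float]],
) -> List[str]:
    """Pick, for each source word, the predicted token that attends to it most.

    `attention[j][i]` is assumed to be the weight that predicted token j
    places on source token i (hypothetical layout).
    """
    if not predicted:
        return list(source)
    aligned = []
    for i in range(len(source)):
        best_j = max(range(len(predicted)), key=lambda j: attention[j][i])
        aligned.append(predicted[best_j])
    return aligned


# Toy data: the "model" inserted a spurious token, so lengths differ.
source = ["hello", "world"]
predicted = ["Hello,", "uh", "world."]
attention = [
    [0.9, 0.1],  # "Hello," attends mostly to "hello"
    [0.6, 0.4],  # spurious "uh"
    [0.2, 0.8],  # "world." attends mostly to "world"
]

print(force_length(source, predicted))
print(force_length(source, ["Hello,"]))
print(align_by_attention(source, predicted, attention))
```

In this toy case, forced cutting keeps the spurious token and loses the final word's punctuation, while the alignment-based variant recovers one output token per input word. That contrast suggests why an alignment step can be worth its extra complexity.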
Notes
1. Europarl results were not included in the print version of the paper, but they can be found at https://github.com/ottokart/punctuator2.
Acknowledgements
The research has been supported by the European Regional Development Fund within the project “Neural Network Modelling for Inflected Natural Languages” No. 1.1.1.1/16/A/215.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Vāravs, A., Salimbajevs, A. (2018). Restoring Punctuation and Capitalization Using Transformer Models. In: Dutoit, T., Martín-Vide, C., Pironkov, G. (eds) Statistical Language and Speech Processing. SLSP 2018. Lecture Notes in Computer Science, vol 11171. Springer, Cham. https://doi.org/10.1007/978-3-030-00810-9_9
DOI: https://doi.org/10.1007/978-3-030-00810-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00809-3
Online ISBN: 978-3-030-00810-9
eBook Packages: Computer Science (R0)