
Text normalization with convolutional neural networks

Abstract

Text normalization is a critical step in a variety of tasks involving speech and language technologies. It is a vital component of natural language processing, text-to-speech synthesis, and automatic speech recognition. Convolutional neural networks (CNNs) have demonstrated performance superior to recurrent architectures in various applications, such as neural machine translation; however, their capability for text normalization has not yet been explored. In this paper we investigate and propose a novel CNN-based text normalization method. Training time, inference time, accuracy, precision, recall, and F1-score were evaluated on an open-source dataset. The performance of the CNN models is compared with that of several long short-term memory (LSTM) and Bi-LSTM architectures on the same dataset.
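
To make the setup concrete, below is a minimal sketch of a character-level CNN token classifier in Keras, in the spirit of the approach the abstract describes. It is not the authors' published architecture: the vocabulary size, maximum token length, class count, and all layer sizes are illustrative assumptions.

```python
# Minimal sketch: character-level CNN that maps a token's character IDs to a
# normalization class (e.g. PLAIN, DATE, CARDINAL). Hyperparameters are
# illustrative assumptions, not the paper's reported configuration.
from tensorflow.keras import layers, models

MAX_LEN = 30     # assumed maximum token length in characters
VOCAB = 128      # assumed character vocabulary size (e.g. ASCII)
N_CLASSES = 16   # assumed number of normalization classes

inp = layers.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB, 64)(inp)                    # character embeddings
x = layers.Conv1D(128, 3, padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Conv1D(128, 3, padding="same", activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)                      # pool over the character axis
out = layers.Dense(N_CLASSES, activation="softmax")(x)

model = models.Model(inp, out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Precision, recall, and F1-score for such a classifier can then be computed per class with standard tooling, for instance scikit-learn's classification_report.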

Notes

  1. https://www.kaggle.com/c/text-normalization-challenge-english-language/data. Accessed December 2017.
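
For concreteness, a short sketch of inspecting this dataset follows; the file name and column layout follow the public Kaggle release, and the path is an assumption about where the archive was extracted.

```python
# Hedged sketch: load and inspect the Kaggle English text-normalization
# training data referenced in the note above. Adjust the path as needed.
import pandas as pd

df = pd.read_csv("en_train.csv")
# Expected columns: sentence_id, token_id, class, before, after
print(df["class"].value_counts().head(10))   # most frequent token classes
print(df[df["class"] == "DATE"].head())      # raw ("before") vs. normalized ("after") forms
```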

Acknowledgements

The research presented in this paper has been supported by the European Union, co-financed by the European Social Fund (EFOP-3.6.2-16-2017-00013), by the VUK project (AAL 2014-183), the DANSPLAT project (Eureka 9944) and by the BME-Artificial Intelligence FIKP grant of EMMI (BME FIKP-MI/SC). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

Author information

Correspondence to Sevinj Yolchuyeva.

About this article

Cite this article

Yolchuyeva, S., Németh, G. & Gyires-Tóth, B. Text normalization with convolutional neural networks. Int J Speech Technol 21, 589–600 (2018). https://doi.org/10.1007/s10772-018-9521-x
