
Text normalization with convolutional neural networks

Abstract

Text normalization is a critical step in a variety of tasks involving speech and language technologies. It is a vital component of natural language processing, text-to-speech synthesis, and automatic speech recognition. Convolutional neural networks (CNNs) have demonstrated performance superior to recurrent architectures in various applications, such as neural machine translation; however, their capability for text normalization has not yet been explored. In this paper we investigate and propose a novel CNN-based text normalization method. Training time, inference time, accuracy, precision, recall, and F1-score were evaluated on an open-source dataset. The performance of the CNN models is compared with that of several long short-term memory (LSTM) and Bi-LSTM architectures on the same dataset.
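
To make the setup concrete, below is a minimal sketch of a character-level CNN token classifier in Keras, in the spirit of the approach the abstract describes. It is not the authors' published architecture: the vocabulary size, maximum token length, class count, and all layer sizes are illustrative assumptions.

```python
# Minimal sketch: character-level CNN that maps a token's character IDs to a
# normalization class (e.g. PLAIN, DATE, CARDINAL). Hyperparameters are
# illustrative assumptions, not the paper's reported configuration.
from tensorflow.keras import layers, models

MAX_LEN = 30     # assumed maximum token length in characters
VOCAB = 128      # assumed character vocabulary size (e.g. ASCII)
N_CLASSES = 16   # assumed number of normalization classes

inp = layers.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB, 64)(inp)                    # character embeddings
x = layers.Conv1D(128, 3, padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Conv1D(128, 3, padding="same", activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)                      # pool over the character axis
out = layers.Dense(N_CLASSES, activation="softmax")(x)

model = models.Model(inp, out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Precision, recall, and F1-score for such a classifier can then be computed per class with standard tooling, for instance scikit-learn's classification_report.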

Notes

  1. https://www.kaggle.com/c/text-normalization-challenge-english-language/data. Accessed December 2017.
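
For concreteness, a short sketch of inspecting this dataset follows; the file name and column layout follow the public Kaggle release, and the path is an assumption about where the archive was extracted.

```python
# Hedged sketch: load and inspect the Kaggle English text-normalization
# training data referenced in the note above. Adjust the path as needed.
import pandas as pd

df = pd.read_csv("en_train.csv")
# Expected columns: sentence_id, token_id, class, before, after
print(df["class"].value_counts().head(10))   # most frequent token classes
print(df[df["class"] == "DATE"].head())      # raw ("before") vs. normalized ("after") forms
```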

Acknowledgements

The research presented in this paper has been supported by the European Union, co-financed by the European Social Fund (EFOP-3.6.2-16-2017-00013), by the VUK project (AAL 2014-183), the DANSPLAT project (Eureka 9944) and by the BME-Artificial Intelligence FIKP grant of EMMI (BME FIKP-MI/SC). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

Author information

Correspondence to Sevinj Yolchuyeva.

About this article

Cite this article

Yolchuyeva, S., Németh, G. & Gyires-Tóth, B. Text normalization with convolutional neural networks. Int J Speech Technol 21, 589–600 (2018). https://doi.org/10.1007/s10772-018-9521-x
