Grapheme to Phoneme Translation Using Conditional Random Fields with Re-Ranking

Text, Speech, and Dialogue (TSD 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9924)

Abstract

Grapheme to phoneme (G2P) translation is an important part of many applications, including text to speech, automatic speech recognition, and phonetic similarity matching. G2P models have been studied thoroughly in the literature; we propose a G2P system that is specifically optimized for producing a high-quality top-k list of candidate pronunciations for an input grapheme string. Our pipeline approach uses Conditional Random Fields (CRF) to predict phonemes from graphemes, followed by a discriminative re-ranker that combines information from the earlier pipeline stages with a graphone language model to construct a high-quality ranked list of results. We evaluate our system on the widely used CMUDict dataset and demonstrate performance competitive with state-of-the-art G2P methods. Additionally, using entries with multiple valid pronunciations, we show that our re-ranking approach outperforms ranking with only a smoothed graphone language model, a technique employed by many recent publications. Lastly, we have released our system as an open-source G2P toolkit, available at http://bit.ly/83yysKL.
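
The re-ranking idea in the abstract can be illustrated with a minimal sketch. It is not the released toolkit's code: the `Candidate` type, the `rerank` function, the fixed weights, and all scores are hypothetical, and a hand-set linear combination stands in for the discriminatively trained re-ranker the abstract describes. It only shows how combining a CRF score with a graphone language model score can reorder a top-k list relative to either score alone.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """One candidate pronunciation with scores from earlier pipeline stages."""
    phonemes: Tuple[str, ...]
    crf_logprob: float  # hypothetical score from the CRF stage
    lm_logprob: float   # hypothetical score from a graphone language model


def rerank(candidates: Sequence[Candidate],
           weights: Tuple[float, float] = (1.0, 1.0),
           k: int = 3) -> List[Candidate]:
    """Return the top-k candidates under a weighted sum of the two scores.

    Assumption: the weights are fixed by hand here. A discriminative
    re-ranker would learn them (and likely use more features) from data.
    """
    w_crf, w_lm = weights

    def combined(c: Candidate) -> float:
        return w_crf * c.crf_logprob + w_lm * c.lm_logprob

    return sorted(candidates, key=combined, reverse=True)[:k]


# Toy candidates for the word "tomato"; all scores are invented.
candidates = [
    Candidate(("T", "AH", "M", "EY", "T", "OW"), crf_logprob=-0.5, lm_logprob=-3.0),
    Candidate(("T", "AH", "M", "AA", "T", "OW"), crf_logprob=-0.9, lm_logprob=-1.0),
    Candidate(("T", "OW", "M", "AE", "T", "OW"), crf_logprob=-2.0, lm_logprob=-2.5),
]

top = rerank(candidates, k=2)
for cand in top:
    print(" ".join(cand.phonemes))
```

With these invented numbers, the CRF score alone would rank the first candidate highest, while the combined score promotes the second. Setting `weights=(0.0, 1.0)` reduces the sketch to ranking by the language model alone, the baseline the abstract compares against.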

Author information

Corresponding author

Correspondence to Stephen Ash.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ash, S., Lin, D. (2016). Grapheme to Phoneme Translation Using Conditional Random Fields with Re-Ranking. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_36

  • DOI: https://doi.org/10.1007/978-3-319-45510-5_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45509-9

  • Online ISBN: 978-3-319-45510-5

  • eBook Packages: Computer Science, Computer Science (R0)
