Comparing NLP Solutions for the Disambiguation of French Heterophonic Homographs for End-to-End TTS Systems

Hajj, Maria-Loulou; Lenglet, Martin; Perrotin, Olivier; Bailly, Gérard

doi:10.1007/978-3-031-20980-2_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13721)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

This paper compares NLP solutions for disambiguating French heterophonic homographs in text-to-speech systems. The solutions are evaluated on a home-made corpus of 8137 sentences extracted from the Web, comprising roughly one hundred instances of each of 34 pairs of prototypical words. A disambiguation system based on per-case Linear Discriminant Analysis (LDA) classifiers, with contextual word embeddings as input features, achieves state-of-the-art F-scores above 0.96.
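The per-case setup described in the abstract can be illustrated with a minimal sketch: one two-class LDA per homograph, trained on embedding vectors of that word in context. Everything below is hypothetical and is not the authors' implementation. The 2-D vectors are made-up stand-ins for contextual embeddings, the function names (`fit_lda`, `predict`, `f_score`) are illustrative, equal class priors are assumed, and the F-score is assumed to be F1. The example word "as" is the one used in Appendix A.1 (verb, as in "tu as", versus noun, as in "un as").

```python
# Toy per-homograph LDA sketch in pure Python.
# ASSUMPTION: the 2-D vectors are invented stand-ins for contextual embeddings.

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def pooled_covariance(class_a, class_b, ridge=1e-6):
    """Pooled within-class covariance; a small ridge keeps it invertible."""
    dim = len(class_a[0])
    cov = [[0.0] * dim for _ in range(dim)]
    for vectors in (class_a, class_b):
        mu = mean(vectors)
        for v in vectors:
            d = [v[i] - mu[i] for i in range(dim)]
            for i in range(dim):
                for j in range(dim):
                    cov[i][j] += d[i] * d[j]
    dof = len(class_a) + len(class_b) - 2
    return [[cov[i][j] / dof + (ridge if i == j else 0.0) for j in range(dim)]
            for i in range(dim)]

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def fit_lda(class_0, class_1):
    """Return (w, threshold); class 1 is predicted when dot(w, x) > threshold."""
    mu_0, mu_1 = mean(class_0), mean(class_1)
    diff = [b - a for a, b in zip(mu_0, mu_1)]
    w = solve(pooled_covariance(class_0, class_1), diff)
    # Equal priors (assumption): threshold at the projected midpoint of the means.
    threshold = sum(wi * (a + b) / 2 for wi, a, b in zip(w, mu_0, mu_1))
    return w, threshold

def predict(model, x):
    w, threshold = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0

def f_score(gold, predicted, positive=1):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical embeddings of "as" in context: class 0 = verb, class 1 = noun.
verb_as = [[0.9, 0.2], [1.1, 0.1], [1.0, 0.4], [0.8, 0.3]]
noun_as = [[-0.9, 0.3], [-1.1, 0.2], [-1.0, 0.0], [-0.8, 0.4]]

# "Per-case": one classifier per homograph, looked up by word form.
models = {"as": fit_lda(verb_as, noun_as)}

gold = [0] * len(verb_as) + [1] * len(noun_as)
predicted = [predict(models["as"], x) for x in verb_as + noun_as]
print("F-score on the toy training data:", f_score(gold, predicted))
```

In a real system, each vector would be the contextual embedding of the target word in its sentence, and the dictionary would hold one trained classifier for each homograph case.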

Notes

1.
https://fr.wiktionary.org/wiki/Categorie:Homographes_non_homophones_en_francais
2.
E.g., the root “techniqu” occurs only 5 times in our audiobook database; without additional patterns from a pronunciation lexicon, “ch” will likely be mispronounced as the post-alveolar fricative [ʃ].
3.
https://huggingface.co/gilf/french-postag-model.

Acknowledgments

Supported by the ANR 19-P3IA-0003 MIAI. This work was performed using HPC/AI resources from GENCI-IDRIS (Grant AD011011542).

Author information

Authors and Affiliations

Grenoble-Alps Univ, GIPSA-Lab, 11, rue des Mathématiques, St Martin d’Hères, France
Maria-Loulou Hajj, Martin Lenglet, Olivier Perrotin & Gérard Bailly

Corresponding author

Correspondence to Gérard Bailly.

Editor information

Editors and Affiliations

Indian Institute of Technology Dharwad, Dharwad, India
S. R. Mahadeva Prasanna
St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Koneru Lakshmaiah Education Foundation, Vaddeswaram, India
K. Samudravijaya
KIIT Group of Colleges, Gurugram, India
Shyam S. Agrawal

A Appendices

A.1 Example of Embeddings of Word Pairs (B-wrd)

For example, two sentences containing the word “as” are processed by the LDA applied to the FlauBERT embeddings of “as”.
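As background for this example, a textbook two-class LDA can be written as follows. This is the generic formulation under equal class priors, which is an assumption here; the excerpt does not state the exact variant the authors used. The symbols are illustrative: $\mathbf{x}$ is the contextual embedding of the target word, $\boldsymbol{\mu}_0$ and $\boldsymbol{\mu}_1$ are the mean embeddings of the two pronunciation classes, and $\mathbf{S}_W$ is the pooled within-class covariance matrix.

```latex
\begin{align}
\mathbf{w} &= \mathbf{S}_W^{-1}\,(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0), \\
y(\mathbf{x}) &= \mathbf{w}^{\top}\mathbf{x}, \\
\hat{c}(\mathbf{x}) &=
\begin{cases}
1 & \text{if } y(\mathbf{x}) > \tfrac{1}{2}\,\mathbf{w}^{\top}(\boldsymbol{\mu}_0 + \boldsymbol{\mu}_1), \\
0 & \text{otherwise.}
\end{cases}
\end{align}
```

Each sentence is thus reduced to a single scalar $y(\mathbf{x})$, and the two pronunciations of the homograph fall on opposite sides of the threshold when the classes are well separated.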

A.2 Example of Embeddings of Class Pairs (B-grp)

Phonetization of the heterophonic homographs in a Facebook post of French poetry, with no errors.

About this paper

Cite this paper

Hajj, ML., Lenglet, M., Perrotin, O., Bailly, G. (2022). Comparing NLP Solutions for the Disambiguation of French Heterophonic Homographs for End-to-End TTS Systems. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_23

DOI: https://doi.org/10.1007/978-3-031-20980-2_23
Published: 10 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20979-6
Online ISBN: 978-3-031-20980-2
eBook Packages: Computer Science; Computer Science (R0)
