Brahmic Schwa-Deletion with Neural Classifiers: Experiments with Bengali

Johny, Cibu; Jansche, Martin

doi:10.21437/SLTU.2018-55

The Brahmic family of writing systems is an alpha-syllabary, in which a consonant letter without an explicit vowel marker can be ambiguous: it can either represent a consonant phoneme or a CV syllable with an inherent vowel (“schwa”). The schwa-deletion ambiguity must be resolved when converting from text to an accurate phonemic representation, particularly for text-to-speech synthesis. We situate the problem of Bengali schwa-deletion in the larger context of grapheme-to-phoneme conversion for Brahmic scripts and solve it using neural network classifiers with graphemic features that are independent of the script and the language. Classifier training is implemented using TensorFlow and related tools. We analyze the impact of both training data size and trained model size, as these represent real-life data collection and system deployment constraints. Our method achieves high accuracy for Bengali and is applicable to other languages written with Brahmic scripts.
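
To make the classification setup concrete, the following is a minimal sketch of how schwa-deletion candidates and graphemic context features might be extracted from a Bengali word. It is an illustration under stated assumptions, not the feature set or code used in the paper: the function names `schwa_candidates` and `context_features`, the window size, the padding symbol, and the feature key format are all hypothetical. The sketch treats every consonant letter that is not followed by a vowel sign or a hasanta (virama) as a position where a classifier would have to decide whether the inherent vowel is pronounced. It ignores the nukta and other combining marks for brevity.

```python
# Illustrative sketch only: the names and the feature layout are assumptions,
# not the features reported in the paper.

# Bengali consonant letters: U+0995..U+09B9 plus RRA, RHA, YYA.
# The range includes a few unassigned code points, which is harmless here.
CONSONANTS = {chr(c) for c in range(0x0995, 0x09BA)} | {"\u09DC", "\u09DD", "\u09DF"}

# Dependent vowel signs U+09BE..U+09CC and the hasanta (virama) U+09CD.
VOWEL_SIGNS = {chr(c) for c in range(0x09BE, 0x09CD)}
HASANTA = "\u09CD"


def schwa_candidates(word):
    """Return indices of consonants with no explicit vowel sign or hasanta."""
    positions = []
    for i, ch in enumerate(word):
        if ch not in CONSONANTS:
            continue
        nxt = word[i + 1] if i + 1 < len(word) else None
        if nxt in VOWEL_SIGNS or nxt == HASANTA:
            continue  # vowel is explicit, or explicitly suppressed
        positions.append(i)
    return positions


def context_features(word, i, window=2, pad="#"):
    """Return the graphemes around position i, keyed by relative offset."""
    padded = pad * window + word + pad * window
    center = i + window
    return {f"g[{k:+d}]": padded[center + k] for k in range(-window, window + 1)}


if __name__ == "__main__":
    word = "কলম"  # three bare consonant letters
    for pos in schwa_candidates(word):
        print(pos, context_features(word, pos, window=1))
```

Each feature dictionary could then be encoded (for example, one-hot per offset) and passed to a binary classifier that predicts whether the inherent vowel at that position is retained or deleted. Because the features are just neighbouring graphemes, the same extraction would carry over to another Brahmic script by swapping the code point ranges.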

Cite as: Johny, C., Jansche, M. (2018) Brahmic Schwa-Deletion with Neural Classifiers: Experiments with Bengali. Proc. 6th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2018), 264-268, doi: 10.21437/SLTU.2018-55

@inproceedings{johny18_sltu,
  author={Cibu Johny and Martin Jansche},
  title={{Brahmic Schwa-Deletion with Neural Classifiers: Experiments with Bengali}},
  year=2018,
  booktitle={Proc. 6th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2018)},
  pages={264--268},
  doi={10.21437/SLTU.2018-55}
}