The Brahmic family of writing systems is an alpha-syllabary, in which a consonant letter without an explicit vowel marker can be ambiguous: it can represent either a consonant phoneme or a CV syllable with an inherent vowel (“schwa”). The schwa-deletion ambiguity must be resolved when converting from text to an accurate phonemic representation, particularly for text-to-speech synthesis. We situate the problem of Bengali schwa-deletion in the larger context of grapheme-to-phoneme conversion for Brahmic scripts and solve it using neural network classifiers with graphemic features that are independent of the script and the language. Classifier training is implemented using TensorFlow and related tools. We analyze the impact of both training data size and trained model size, as these represent real-life data collection and system deployment constraints. Our method achieves high accuracy for Bengali and is applicable to other languages written with Brahmic scripts.
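To make the ambiguity concrete, the following is a minimal sketch of how schwa-deletion can be framed as a per-consonant classification problem. It is an illustration only, not the feature set or model described in the paper: the function names (`schwa_candidates`, `context_features`), the padding symbol, and the use of a raw character window are all assumptions made here. The code finds every Bengali consonant letter that is not followed by a vowel sign or virama (the positions where the inherent vowel may or may not be pronounced) and builds a small window of neighbouring graphemes that a classifier could consume.

```python
from typing import Dict, List

PAD = "#"  # hypothetical boundary symbol, chosen for this sketch


def is_consonant(ch: str) -> bool:
    """Bengali consonant letters KA..HA plus RRA, RHA, YYA (simplified)."""
    cp = ord(ch)
    return 0x0995 <= cp <= 0x09B9 or cp in (0x09DC, 0x09DD, 0x09DF)


def is_dependent_sign(ch: str) -> bool:
    """Vowel signs (U+09BE..U+09CC) and virama (U+09CD)."""
    return 0x09BE <= ord(ch) <= 0x09CD


def schwa_candidates(word: str) -> List[int]:
    """Indices of consonants carrying a possibly deleted inherent vowel.

    Simplification: nukta (U+09BC) and other modifiers are not handled.
    """
    out = []
    for i, ch in enumerate(word):
        if not is_consonant(ch):
            continue
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if nxt and is_dependent_sign(nxt):
            continue  # explicit vowel sign or virama: no ambiguity
        out.append(i)
    return out


def context_features(word: str, i: int, width: int = 2) -> Dict[str, str]:
    """Graphemes in a window of +/- width around position i."""
    padded = [PAD] * width + list(word) + [PAD] * width
    return {
        f"g[{off:+d}]": padded[i + width + off]
        for off in range(-width, width + 1)
    }


if __name__ == "__main__":
    word = "\u0995\u09b2\u09ae"  # KA LA MA, written with no vowel signs
    for i in schwa_candidates(word):
        print(i, context_features(word, i, width=1))
```

Each candidate position would then receive a binary label (inherent vowel pronounced or deleted) from a pronunciation dictionary, and a classifier would be trained on the resulting feature dictionaries. Mapping the characters to script-independent categories before windowing, rather than using raw code points as above, would be one way to approach the script independence the abstract mentions.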
Cite as: Johny, C., Jansche, M. (2018) Brahmic Schwa-Deletion with Neural Classifiers: Experiments with Bengali. Proc. 6th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2018), 264-268, doi: 10.21437/SLTU.2018-55
@inproceedings{johny18_sltu,
  author={Cibu Johny and Martin Jansche},
  title={{Brahmic Schwa-Deletion with Neural Classifiers: Experiments with Bengali}},
  year=2018,
  booktitle={Proc. 6th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2018)},
  pages={264--268},
  doi={10.21437/SLTU.2018-55}
}