Abstract
Amharic is the second most spoken Semitic language after Arabic. It has its own syllabary writing system, in which each character represents a consonant and a vowel. Automatic Speech Recognition (ASR) research for Amharic has relied on grapheme-based pronunciation lexicons, taking advantage of the nature of this writing system. However, the epenthetic vowel and the glottal stop consonant represented in the writing system are not pronounced in all of their occurrences. Moreover, the writing system does not differentiate between geminated and non-geminated consonants. The grapheme-based pronunciation lexicons used so far therefore cannot capture these features of the language. To handle these limitations, we have prepared word- and morpheme-based pronunciation lexicons using data-driven and knowledge-driven (expert) transcriptions. The data-driven transcription has been used to prepare the pronunciation lexicon for training, while the knowledge-driven transcription has been used to prepare the morpheme- and word-based pronunciation lexicons for decoding. With the morpheme-based knowledge-driven lexicons, ASR performance improved over the baseline system that uses a grapheme-based lexicon, even though the number of phones is much larger (60) than in the grapheme-based lexicon (37).
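The limitation of a grapheme-based lexicon can be illustrated with a minimal sketch. It assumes a simplified scheme (not necessarily the one behind the 37-phone baseline) in which each Ethiopic syllable character is split into a consonant and one of seven vowels, relying on the Unicode layout where each regular consonant row stores the seven vowel orders at consecutive code points. Only fifteen consonant rows are covered, labialized (eighth-order) forms are rejected, the IPA symbols are illustrative, and the names `BASE_ROWS`, `decompose` and `grapheme_pronunciation` are hypothetical.

```python
# Hypothetical sketch of a grapheme-based pronunciation for Amharic: every
# Ethiopic syllable character is split into a consonant phone and a vowel
# phone. Only a few regular consonant rows are covered; the IPA symbols are
# illustrative and are not the phone set used in the paper.

ETHIOPIC_START = 0x1200  # first code point of the Unicode Ethiopic block

# First-order code point of some regular consonant rows; the next six code
# points in each row spell the same consonant with the other vowel orders.
BASE_ROWS = {
    0x1200: "h", 0x1208: "l", 0x1218: "m", 0x1228: "r", 0x1230: "s",
    0x1260: "b", 0x1270: "t", 0x1290: "n", 0x12A0: "ʔ", 0x12A8: "k",
    0x12C8: "w", 0x12D8: "z", 0x12E8: "y", 0x12F0: "d", 0x1308: "g",
}

# Vowels of orders 1..7 (offsets 0..6 within a row). The 6th order (ɨ) is
# the epenthetic vowel that is often not pronounced.
VOWEL_ORDERS = ["ə", "u", "i", "a", "e", "ɨ", "o"]


def decompose(char: str) -> tuple[str, str]:
    """Split one Ethiopic syllable character into (consonant, vowel)."""
    offset = ord(char) - ETHIOPIC_START
    order = offset % 8
    base = ETHIOPIC_START + offset - order
    if offset < 0 or base not in BASE_ROWS or order > 6:
        raise ValueError(f"unsupported character: {char!r}")
    return BASE_ROWS[base], VOWEL_ORDERS[order]


def grapheme_pronunciation(word: str) -> list[str]:
    """Grapheme-based phone sequence. Gemination is invisible in the script,
    and every written 6th-order vowel and glottal stop is always emitted."""
    phones: list[str] = []
    for char in word:
        phones.extend(decompose(char))
    return phones


if __name__ == "__main__":
    # U+1308 U+1293 spells the word written ገና.
    print(grapheme_pronunciation("\u1308\u1293"))
```

The word ገና is commonly cited as ambiguous between gäna 'still' and gänna 'Christmas'; because the script writes both identically, the sketch returns the single sequence `['g', 'ə', 'n', 'a']`, and it emits ɨ and ʔ wherever they are written, whether or not they are pronounced.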
Notes
The number of phones used in the IARPA Babel Amharic lexicon is 61. However, this is not because consonant gemination is represented, as in our lexicons, but because labiovelars are represented differently: we represent labialization as variants of only 5 vowels, whereas the IARPA Babel Amharic lexicon represents it as variants of 26 consonants.
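The difference between the two encodings can be made concrete with a small sketch. It assumes only the eighth-order "-wa" characters of four consonants, illustrative IPA-style symbols, and hypothetical names (`LABIALIZED_WA`, `vowel_variant`, `consonant_variant`, `new_phones`); it does not reproduce the actual lexicon of this work or of the IARPA Babel pack, and it models only one of the five labialized vowels.

```python
# Hypothetical sketch: two encodings of labialized (8th-order, "-wa")
# Ethiopic syllables in a phone lexicon. Symbols are illustrative only.
from typing import Callable

# A few 8th-order characters and the plain consonant each one labializes.
LABIALIZED_WA = {
    "\u120F": "l",  # lwa
    "\u121F": "m",  # mwa
    "\u1237": "s",  # swa
    "\u1267": "b",  # bwa
}

# Phones assumed to exist already in a plain consonant+vowel inventory.
PLAIN_PHONES = {"l", "m", "s", "b", "a"}


def vowel_variant(char: str) -> list[str]:
    """Labialization carried by the vowel, e.g. l + ʷa."""
    return [LABIALIZED_WA[char], "ʷa"]


def consonant_variant(char: str) -> list[str]:
    """Labialization carried by the consonant, e.g. lʷ + a."""
    return [LABIALIZED_WA[char] + "ʷ", "a"]


def new_phones(scheme: Callable[[str], list[str]]) -> set[str]:
    """Phones a scheme adds on top of PLAIN_PHONES for the characters above."""
    used = {phone for char in LABIALIZED_WA for phone in scheme(char)}
    return used - PLAIN_PHONES


if __name__ == "__main__":
    print(sorted(new_phones(vowel_variant)))      # one shared vowel phone
    print(sorted(new_phones(consonant_variant)))  # one phone per consonant
```

The vowel-side encoding reuses the same extra phone for every labialized consonant, so its inventory growth is bounded by the number of labialized vowels, while the consonant-side encoding needs a new phone for each labialized consonant and therefore grows with the consonant inventory.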
References
Abate, S.T. (2006). Automatic speech recognition for Amharic. PhD thesis, University of Hamburg, Hamburg, Germany
Abate, S.T., Menzel, W. (2007a). Automatic speech recognition for an under-resourced language – Amharic. In: Proceedings of INTERSPEECH, pp. 1541–1544
Abate, S.T., Menzel, W. (2007b). Syllable-based speech recognition for Amharic. In: Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources, pp. 33–40
Abate, S.T., Menzel, W., Tafila, B. (2005). An Amharic speech corpus for large vocabulary continuous speech recognition. In: Proceedings of INTERSPEECH, pp. 1601–1604
Abate, S.T., Tachbelie, M.Y., Melese, M., et al. (2020a). Large vocabulary read speech corpora for four Ethiopian languages: Amharic, Tigrigna, Oromo and Wolaytta. In: LREC 2020
Abate, S.T., Tachbelie, M.Y., Schultz, T. (2020b). Deep neural networks based automatic speech recognition for four Ethiopian languages. In: ICASSP 2020
Abate, S.T., Tachbelie, M.Y., Schultz, T. (2020c). Multilingual acoustic and language modeling for Ethio-Semitic languages. In: Meng, H., Xu, B., Zheng, T.F. (eds) Interspeech 2020. ISCA, pp. 1047–1051
Abate, S.T., Tachbelie, M.Y., Schultz, T. (2021). End-to-end multilingual automatic speech recognition for less-resourced languages: The case of four Ethiopian languages. In: ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7013–7017, https://doi.org/10.1109/ICASSP39728.2021.9415020
Appen Butler Hill Pty Ltd (2012). Speech and language resources 2012. Appen Butler Hill Speech and Language Resources 2012-Product Catalogue
Appleyard, D. (1995). Colloquial Amharic: A complete course for beginners. London: Routledge.
Bender, M. L., Bowen, J. D., Cooper, R. L., et al. (1976). Languages in Ethiopia. London: Oxford University Press.
Berhanu, S. (2001). Isolated Amharic consonant-vowel syllable recognition: An experiment using the hidden Markov model. Master’s thesis, School of Information Studies for Africa, Addis Ababa University, Addis Ababa, Ethiopia
Besacier, L., Le, V.B., Boitet, C., et al. (2006). ASR and translation for under-resourced languages. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), pp. 1221–1224
Bills, A., Conners, T., David, A., et al. (2019). IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b. https://hdl.handle.net/11272.1/AB2/U1H3H7
Choueiter, G., Povey, D., Chen, S.F., et al. (2006). Morpheme-based language modeling for Arabic LVCSR. In: Proceedings of ICASSP 2006
Dribssa, A.E., Tachbelie, M.Y. (2015). Investigating the use of syllable acoustic units for Amharic speech recognition. In: Proceedings of IEEE AFRICON
Gelas, H., Abate, S.T., Besacier, L., et al. (2011). Quality assessment of crowdsourcing transcriptions for African languages. In: Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011), pp. 3065–3068
Girmaw, M. (2004). An automatic speech recognition system for Amharic. Master’s thesis, Department of Signal, Sensor and System, Royal Institute of Technology, Stockholm, Sweden
H/Mariam, S., Prahallad, K., Black, A.W., et al. (2004). Unit selection voice for Amharic using Festvox. In: Proceedings of the 5th ISCA Speech Synthesis Workshop
Hou, W., Dong, Y., Zhuang, B., et al. (2020). Large-scale end-to-end multilingual speech recognition and language identification with multi-task learning. In: Interspeech
Karafiát, M., Baskar, M.K., Matějka, P., et al. (2016). Multilingual BLSTM and speaker-specific vector adaptation in 2016 BUT Babel system. In: 2016 IEEE Spoken Language Technology Workshop (SLT), pp. 637–643, https://doi.org/10.1109/SLT.2016.7846330
Leslau, W. (1976). Concise Amharic dictionary. Wiesbaden: Otto Harrassowitz.
Leslau, W. (2000). Introductory grammar of Amharic. Wiesbaden: Harrassowitz Verlag.
Li, X., Dalmia, S., Black, A.W., et al. (2019). Multilingual speech recognition with corpus relatedness sampling. arXiv:1908.01060
Pellegrini, T., Lamel, L. (2006). Investigating automatic decomposition for ASR in less represented languages. In: Proceedings of INTERSPEECH
Pellegrini, T., & Lamel, L. (2009). Automatic word decompounding for ASR in a morphologically rich language: Application to Amharic. IEEE Transactions on Audio, Speech, and Language Processing, 17(5), 863–873.
Seid, H., Gambäck, B. (2005). A speaker independent continuous speech recognizer for Amharic. In: Proceedings of INTERSPEECH, pp. 3349–3352
Seifu, Z. (2003). HMM-based large vocabulary, speaker independent, continuous Amharic speech recognizer. Master’s thesis, School of Information Studies for Africa, Addis Ababa University, Addis Ababa, Ethiopia
Stolcke, A. (2002). SRILM – an extensible language modeling toolkit. In: Proceedings of the International Conference on Spoken Language Processing, pp. 901–904
Tachbelie, M.Y. (2010). Morphology-based language modeling for Amharic. PhD thesis, University of Hamburg, Hamburg, Germany
Tachbelie, M.Y., Abate, S.T. (2015). Effect of language resources on automatic speech recognition for Amharic. In: Proceedings of IEEE AFRICON
Tachbelie, M.Y., Abate, S.T., Menzel, W. (2009). Automatic speech recognition for an under-resourced language – Amharic. In: Proceedings of the 4th Language and Technology Conference (LTC-09), pp. 114–118
Tachbelie, M.Y., Abate, S.T., Menzel, W. (2010). Morpheme-based automatic speech recognition for a morphologically rich language – Amharic. In: Proceedings of Spoken Language Technology for Under-resourced Languages (SLTU 10), pp. 68–73
Tachbelie, M.Y., Abate, S.T., Besacier, L. (2011a). Part-of-speech tagging for under-resourced and morphologically rich languages – the case of Amharic. In: Proceedings of the Conference on Human Language Technology for Development, Alexandria, Egypt
Tachbelie, M.Y., Abate, S.T., Menzel, W. (2011b). Morpheme-based and factored language modeling for Amharic speech recognition. In: Human Language Technology: Challenges for Computer Science and Linguistics, pp. 82–93
Tachbelie, M.Y., Besacier, L., Rossato, S. (2011c). Comparison of syllable and triphone based speech recognition for Amharic. In: Proceedings of the Language Technology Conference (LTC 11), Poznan, Poland
Tachbelie, M.Y., Besacier, L., Rossato, S. (2012). Syllable-based and hybrid acoustic models for Amharic speech recognition. In: Proceedings of the Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 12), pp. 5–10
Tachbelie, M. Y., Abate, S. T., & Besacier, L. (2014). Using different acoustic, lexical and language modeling units for ASR of an under-resourced language – Amharic. Speech Communication, 56, 181–194.
Tachbelie, M. Y., Abulimiti, A., Abate, S. T., et al. (2020). DNN-based speech recognition for GlobalPhone languages. In: ICASSP 2020
Tadesse, K. (2002). Word-based Amharic speech recognizer: An experiment using hidden Markov model (HMM). Master’s thesis, School of Information Studies for Africa, Addis Ababa University, Addis Ababa, Ethiopia
Yifiru, M. (2003). Automatic Amharic speech recognition system to command and control computers. Master’s thesis, School of Information Studies for Africa, Addis Ababa University, Addis Ababa, Ethiopia
Yimam, B. (2007). yamarňa sewasew (2nd ed.). Addis Ababa: EMPDE.
Żelasko, P., Moro-Velázquez, L., Hasegawa-Johnson, M.A., et al. (2020). That sounds familiar: An analysis of phonetic representations transfer across languages. In: INTERSPEECH
Acknowledgements
We thank Google for the Faculty Research Award that enabled us to conduct this research. The results of preliminary experiments using only 25% of the linguistically transcribed data have been published in the proceedings of the IEEE AFRICON conference (Tachbelie and Abate 2015).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tachbelie, M.Y., Abate, S.T. Lexical modeling for the development of Amharic automatic speech recognition systems. Lang Resources & Evaluation 57, 963–984 (2023). https://doi.org/10.1007/s10579-023-09659-y
DOI: https://doi.org/10.1007/s10579-023-09659-y