Lexical modeling for the development of Amharic automatic speech recognition systems

  • Original Paper
  • Language Resources and Evaluation

Abstract

Amharic is the second most widely spoken Semitic language after Arabic. It has its own syllabary writing system, in which each character represents a consonant and a vowel. Automatic speech recognition (ASR) research for Amharic has so far relied on grapheme-based pronunciation lexicons, taking advantage of the nature of this writing system. However, the epenthetic vowel and the glottal stop consonant represented in the writing system may not be pronounced in every occurrence. Moreover, the writing system does not distinguish geminated from non-geminated consonants. The grapheme-based pronunciation lexicons used so far therefore cannot capture these features of the language. To address these limitations, we have prepared word- and morpheme-based pronunciation lexicons using data-driven transcription and knowledge-driven transcription by experts. The data-driven transcription was used to prepare the training pronunciation lexicon, while the knowledge-driven transcription was used to prepare the morpheme- and word-based pronunciation lexicons for decoding. With the morpheme-based knowledge-driven lexicons, ASR performance improved over the baseline system that used a grapheme-based lexicon, even though the number of phones is much larger (60) than in the grapheme-based lexicon (37).
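
The contrast between a grapheme-based lexicon and a knowledge-driven one can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the three-character table, the romanized phone symbols, the `:` suffix used to mark gemination, and the function names `grapheme_based` and `knowledge_based`. The annotations passed in the usage lines are likewise illustrative rather than the authors' transcriptions.

```python
# Toy table (hypothetical romanization): each syllabary character maps to
# one consonant and one vowel. Only the characters used below are covered.
GRAPHEME_TO_PHONES = {
    "ሰ": ("s", "e"),
    "ላ": ("l", "a"),
    "ም": ("m", "ɨ"),  # sixth-order form: consonant plus epenthetic vowel
    "አ": ("ʔ", "a"),  # glottal stop plus vowel
    "ለ": ("l", "e"),
}


def grapheme_based(word):
    """Concatenate the fixed consonant-vowel pair of every character."""
    return [phone for ch in word for phone in GRAPHEME_TO_PHONES[ch]]


def knowledge_based(word, silent_vowels=(), geminated=()):
    """Apply per-word annotations on top of the grapheme mapping.

    silent_vowels: character indices whose vowel is not pronounced.
    geminated: character indices whose consonant is geminated; these get
    a separate phone symbol (consonant + ":"), which enlarges the phone set.
    """
    phones = []
    for i, ch in enumerate(word):
        consonant, vowel = GRAPHEME_TO_PHONES[ch]
        phones.append(consonant + ":" if i in geminated else consonant)
        if i not in silent_vowels:
            phones.append(vowel)
    return phones


# The grapheme-based entry always emits the epenthetic vowel.
print(grapheme_based("ሰላም"))
# An annotated entry can drop it, or mark a geminated consonant.
print(knowledge_based("ሰላም", silent_vowels={2}))
print(knowledge_based("አለ", geminated={1}))
```

Giving geminated consonants their own symbols is one way a phone inventory can grow well beyond the grapheme-derived set, which is the kind of increase (37 to 60 phones) the abstract reports.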

Notes

  1. The IARPA Babel Amharic lexicon uses 61 phones. However, this is not because it represents consonant gemination, as we did, but because it represents labiovelars differently. We represent labialization as variants of only 5 vowels, whereas the IARPA Babel Amharic lexicon represents it as variants of 26 consonants.

References

  • Abate, S.T. (2006). Automatic speech recognition for Amharic. PhD thesis, University of Hamburg, Hamburg

  • Abate, S.T., Menzel, W. (2007a). Automatic speech recognition for an under-resourced language—Amharic. In: Proceeding of INTERSPEECH, pp. 1541–1544

  • Abate, S.T., Menzel, W. (2007b). Syllable-based speech recognition for Amharic. In: Proceeding of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources, pp. 33–40

  • Abate, S.T., Menzel, W., Tafila, B. (2005). An Amharic speech corpus for large vocabulary continuous speech recognition. In: Proceeding of INTERSPEECH, pp. 1601–1604

  • Abate, S.T., Tachbelie, M.Y., Melese, M., et al. (2020a). Large vocabulary read speech corpora for four Ethiopian languages: Amharic, Tigrigna, Oromo and Wolaytta. In: LREC 2020

  • Abate, S.T., Tachbelie, M.Y., Schultz, T. (2020b). Deep neural networks based automatic speech recognition for four Ethiopian languages. In: ICASSP 2020

  • Abate, S.T., Tachbelie, M.Y., Schultz, T. (2020c). Multilingual acoustic and language modeling for Ethio-Semitic languages. In: Meng, H., Xu, B., Zheng, T.F. (eds) Interspeech. ISCA, pp. 1047–1051

  • Abate, S.T., Tachbelie, M.Y., Schultz, T. (2021). End-to-end multilingual automatic speech recognition for less-resourced languages: The case of four Ethiopian languages. In: ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 7013–7017, https://doi.org/10.1109/ICASSP39728.2021.9415020

  • Appen Butler Hill Pty Ltd (2012). Speech and language resources 2012. Appen Butler Hill Speech and Language Resources 2012 Product Catalogue

  • Appleyard, D. (1995). Colloquial Amharic: A complete course for beginners. London: Routledge.

  • Bender, M. L., Bowen, J. D., Cooper, R. L., et al. (1976). Languages in Ethiopia. London: Oxford University Press.

  • Berhanu, S. (2001). Isolated Amharic consonant-vowel syllable recognition: An experiment using the hidden Markov model. Master’s thesis, School of Information Studies for Africa, Addis Ababa University, Addis Ababa Ethiopia

  • Besacier, L., Le, V.B., Boitet, C., et al. (2006). ASR and translation for under-resourced languages. In: Proceeding of the IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP 2006), pp. 1221–1224

  • Bills, A., Conners, T., David, A., et al. (2019). IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b. https://doi.org/11272.1/AB2/U1H3H7, https://hdl.handle.net/11272.1/AB2/U1H3H7

  • Choueiter, G., Povey, D., Chen, S.F., et al. (2006). Morpheme-based language modeling for Arabic LVCSR. In: Proceeding of ICASSP 2006

  • Dribssa, A.E., Tachbelie, M.Y. (2015). Investigating the use of syllable acoustic units for Amharic speech recognition. In: Proceeding of the IEEE AFRICON

  • Gelas, H., Abate, S.T., Besacier, L., et al. (2011). Quality assessment of crowdsourcing transcriptions for African languages. In: Proceeding of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011), pp. 3065–3068

  • Girmaw, M. (2004). An automatic speech recognition system for Amharic. Master’s thesis, Department of Signal, Sensor and System, Royal Institute of Technology, Stockholm Sweden

  • H/Mariam, S., Prahallad, K., Black, A.W., et al. (2004). Unit selection voice for Amharic using Festvox. In: Proceeding 5th ISCA Speech Synthesis Workshop

  • Hou, W., Dong, Y., Zhuang, B., et al. (2020). Large-scale end-to-end multilingual speech recognition and language identification with multi-task learning. In: Interspeech

  • Karafiát, M., Baskar, M.K., Matějka, P., et al. (2016). Multilingual BLSTM and speaker-specific vector adaptation in 2016 BUT Babel system. In: 2016 IEEE Spoken Language Technology Workshop (SLT), pp. 637–643, https://doi.org/10.1109/SLT.2016.7846330

  • Leslau, W. (1976). Concise Amharic dictionary. Wiesbaden: Otto Harrassowitz.

  • Leslau, W. (2000). Introductory grammar of Amharic. Wiesbaden: Harrassowitz Verlag.

  • Li, X., Dalmia, S., Black, A.W., et al. (2019). Multilingual speech recognition with corpus relatedness sampling. arXiv:1908.01060

  • Pellegrini, T., Lamel, L. (2006). Investigating automatic decomposition for ASR in less represented languages. In: Proceeding of INTERSPEECH

  • Pellegrini, T., & Lamel, L. (2009). Automatic word decompounding for ASR in a morphologically rich language: Application to Amharic. IEEE Transactions on Audio Speech and Language Processing, 17(5), 863–873.

  • Seid, H., Gambaeck, B. (2005). A speaker independent continuous speech recognizer for Amharic. In: Proceeding of INTERSPEECH, pp. 3349–3352

  • Seifu, Z. (2003). HMM based large vocabulary, speaker independent, continuous Amharic speech recognizer. Master’s thesis, School of Information Studies for Africa, Addis Ababa University, Addis Ababa Ethiopia

  • Stolcke, A. (2002). SRILM - an extensible language modeling toolkit. In: Proceeding of International Conference on Spoken Language Processing, pp. 257–286

  • Tachbelie, M.Y. (2010). Morphology based language modeling for Amharic. PhD thesis, University of Hamburg, Hamburg Germany

  • Tachbelie, M.Y., Abate, S.T. (2015). Effect of language resources on automatic speech recognition for Amharic. In: Proceeding of IEEE AFRICON

  • Tachbelie, M.Y., Abate, S.T., Menzel, W. (2009). Automatic speech recognition for an under-resourced language-Amharic. In: Proceeding of the 4th Language and Technology Conference (LTC-09), pp. 114–118

  • Tachbelie, M.Y., Abate, S.T., Menzel, W. (2010). Morpheme-based automatic speech recognition for a morphologically rich language-Amharic. In: Proceeding of Spoken Language Technology for Under-resourced Languages (SLTU 10), pp. 68–73

  • Tachbelie, M.Y., Abate, S.T., Besacier, L. (2011a). Part-of-speech tagging for under-resourced and morphologically rich languages-the case of Amharic. In: Proceeding of Conference on Human Language Technology for Development, Alexandria Egypt

  • Tachbelie, M.Y., Abate, S.T., Menzel, W. (2011b). Morpheme-based and factored language modeling for Amharic speech recognition. In: Human Language Technology: Challenges for Computer Science and Linguists, pp. 82–93

  • Tachbelie, M.Y., Besacier, L., Rossato, S. (2011c). Comparison of syllable and triphone based speech recognition for Amharic. In: Proceeding of Language Technology Conference (LTC 11), Poznan Poland

  • Tachbelie, M.Y., Besacier, L., Rossato, S. (2012). Syllable-based and hybrid acoustic models for Amharic speech recognition. In: Proceeding of Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 12), pp. 5–10

  • Tachbelie, M. Y., Abate, S. T., & Besacier, L. (2014). Using different acoustic lexical and language modeling units for ASR of an under-resourced language-Amharic. Speech Communication, 56, 181–194.

  • Tachbelie, M. Y., Abulimiti, A., Abate, S. T., et al. (2020). DNN-based speech recognition for GlobalPhone languages. In: ICASSP 2020

  • Tadesse, K. (2002). Word based Amharic speech recognizer: An experiment using hidden Markov model (HMM). Master’s thesis, School of Information Studies for Africa, Addis Ababa University, Addis Ababa Ethiopia

  • Yifiru, M. (2003). Automatic Amharic speech recognition system to command and control computers. Master’s thesis, School of Information Studies for Africa, Addis Ababa University, Addis Ababa Ethiopia

  • Yimam, B. (2007). yamarňa sewasew (2nd ed.). Addis Ababa: EMPDE.

  • Żelasko, P., Moro-Velázquez, L., Hasegawa-Johnson, M.A., et al. (2020). That sounds familiar: An analysis of phonetic representations transfer across languages. In: INTERSPEECH

Acknowledgements

We are thankful to Google for the Faculty Research Award that enabled us to conduct this research. The results of the preliminary experiments, which used only 25% of the linguistically transcribed data, were published in the proceedings of the AFRICON conference (Tachbelie and Abate 2015).

Author information

Corresponding author

Correspondence to Solomon Teferra Abate.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tachbelie, M.Y., Abate, S.T. Lexical modeling for the development of Amharic automatic speech recognition systems. Lang Resources & Evaluation 57, 963–984 (2023). https://doi.org/10.1007/s10579-023-09659-y

  • DOI: https://doi.org/10.1007/s10579-023-09659-y
