Subword analysis of small vocabulary and large vocabulary ASR for Punjabi language

International Journal of Speech Technology

Abstract

Words must be modeled as phones with care, because these phones, or sound units, are used to build the acoustic model. Various acoustic units have been proposed for this purpose, such as the phone, character, syllable, and subword. A problem arises when too many unique subwords or phones are generated in the dictionary, which makes automatic speech recognition (ASR) difficult. Researchers have formulated diverse techniques to deal with this. In this paper, a subword-based dictionary is explored for the Punjabi language. For a large vocabulary, the number of subwords generated considerably exceeds the number permissible for computation. To reduce the number of subwords to be modeled, an algorithm is proposed that replaces the least frequently occurring subword with a subword that has a similar sound. Acoustic models have been developed using small-vocabulary and large-vocabulary data, and their word error rate (WER) and size are compared. The results show that the large-vocabulary models give a high recognition rate, with a WER of only 6%.
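
The subword-reduction idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the authors' algorithm, so everything below is an assumption made for illustration: the function name `reduce_subwords`, the `max_units` limit, the hand-written `similar` map of similar-sounding substitutes, and the toy romanized units (which are not data from the paper). The sketch repeatedly takes the rarest subword that has a known substitute and rewrites it in every dictionary entry until the inventory fits the limit.

```python
from collections import Counter


def reduce_subwords(lexicon, similar, max_units):
    """Shrink a subword inventory by merging rare units into similar ones.

    lexicon:   dict mapping a word to its list of subword units
    similar:   dict mapping a subword to a similar-sounding substitute
               (hypothetical, hand-made map; not taken from the paper)
    max_units: largest number of distinct subwords allowed (assumed limit)
    """
    # Work on a copy so the caller's dictionary is left untouched.
    lexicon = {word: list(units) for word, units in lexicon.items()}
    while True:
        counts = Counter(u for units in lexicon.values() for u in units)
        if len(counts) <= max_units:
            break
        # Only units whose substitute is still in the inventory can be merged.
        candidates = [
            u for u in counts
            if similar.get(u) in counts and similar[u] != u
        ]
        if not candidates:
            break  # nothing left to merge; the limit cannot be reached
        # Least frequent first; ties are broken alphabetically.
        rare = min(candidates, key=lambda u: (counts[u], u))
        target = similar[rare]
        for units in lexicon.values():
            units[:] = [target if u == rare else u for u in units]
    return lexicon


# Toy example with invented units.
lexicon = {
    "w1": ["ka", "ra"],
    "w2": ["ka", "la"],
    "w3": ["kha", "ra"],
    "w4": ["ga", "la"],
}
similar = {"kha": "ka", "ga": "ka"}

reduced = reduce_subwords(lexicon, similar, max_units=3)
inventory = {u for units in reduced.values() for u in units}
print(sorted(inventory))
```

Each pass removes exactly one unit from the inventory, so the loop always terminates. A real system would need a phonetically motivated similarity measure in place of the hand-written map.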


Author information

Corresponding author

Correspondence to Puneet Mittal.

About this article

Cite this article

Mittal, P., Singh, N. Subword analysis of small vocabulary and large vocabulary ASR for Punjabi language. Int J Speech Technol 23, 71–78 (2020). https://doi.org/10.1007/s10772-020-09673-3

  • DOI: https://doi.org/10.1007/s10772-020-09673-3
