Low Complexity On-Line Adaptation Techniques in Context of Assamese Spoken Query System

Shahnawazuddin, S.; Deepak, K. T.; Sarma, B. D.; Deka, A.; Prasanna, S. R. M.; Sinha, Rohit

doi:10.1007/s11265-014-0906-z

Published: 22 May 2014

Journal of Signal Processing Systems, Volume 81, pages 83–97 (2015)

Abstract

In this work, we present the development of an Assamese spoken query (SQ) system for accessing the price of agricultural commodities. The developed system is intended to make cultivators aware of recent market trends. The SQ system enables the user to access the latest price of a commodity by calling the system from a landline or mobile phone. The spoken query input by the user is processed, and the current price of the desired commodity in the given district is then played back by the system. Features that make the system user friendly were incorporated into the design after taking feedback from local farmers. In other words, the system is tuned to the needs of its users. Furthermore, the issues of adapting such query systems to the end user are also explored in this work. In the developed SQ system, typical user responses are of extremely short duration (1–2 seconds, since the user responds with isolated words). Moreover, the employed adaptation approach must keep the system latency low, since these systems are meant for real-time applications. Consequently, adapting such systems to the end user becomes an extremely challenging task. In this regard, acoustic model interpolation based adaptation techniques are proposed that employ interpolation weights derived in an approximate fashion. The proposed approaches try to minimize the latency in the system response by avoiding the iterative weight estimation procedure used in earlier reported works. Even with an extremely small amount of adaptation data, the proposed approaches are found to result in a relative improvement of 12% over the baseline ASR system.
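
The general idea of acoustic model interpolation with non-iterative weights can be illustrated with a minimal sketch. The abstract does not state how the approximate weights are derived, so everything below is an assumption made for illustration: the basis mean vectors, the per-basis log-likelihood scores, and the choice of a single softmax pass to obtain the weights are all hypothetical and are not the authors' method. The sketch only shows how an adapted mean vector could be formed as a weighted sum of basis models without an iterative estimation loop.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into non-negative weights that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def interpolate_means(basis_means, weights):
    """Return the weighted sum of the basis mean vectors."""
    dim = len(basis_means[0])
    return [
        sum(w * mean[d] for w, mean in zip(weights, basis_means))
        for d in range(dim)
    ]


def adapt(basis_means, loglik_scores):
    """One-shot adaptation: a single pass, no iterative re-estimation.

    ASSUMPTION: the weights come from a softmax over hypothetical
    log-likelihood scores of the adaptation utterance under each basis
    model. The paper's actual approximation is not given in the abstract.
    """
    weights = softmax(loglik_scores)
    return weights, interpolate_means(basis_means, weights)


# Hypothetical toy data: two basis models with 2-dimensional means.
basis_means = [[0.0, 0.0], [2.0, 4.0]]
loglik_scores = [-10.0, -10.0]  # equal scores give equal weights

weights, adapted_mean = adapt(basis_means, loglik_scores)
print(weights)
print(adapted_mean)
```

Because the weights are computed in a single pass, the cost per query is one scoring step per basis model plus one weighted sum, which is the kind of property a low-latency system would want.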

Notes

An initial version of this work was presented at the National Conference on Communication in February 2013 [11].

References

1. Gauvain, J. L., & Lee, C. H. (1994). Maximum a-posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing, 2, 291–298.
2. Leggetter, C. J., & Woodland, P. C. (1995). Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language, 9, 171–185.
3. Hazen, T. J., Glass, J. R. (1997). A comparison of novel techniques for instantaneous speaker adaptation. In Proc. of European Conference on Speech Communication and Technology (pp. 2047–2050).
4. Kuhn, R., Junqua, J. C., Nguyen, P., & Niedzielski, N. (2000). Rapid speaker adaptation in eigenvoice space. IEEE Transactions on Speech and Audio Processing, 8(6), 695–707.
5. Gales, M. J. F. (1999). Cluster adaptive training of hidden Markov models. IEEE Transactions on Speech and Audio Processing, 8(4), 417–428.
6. Mak, B., Lai, T., Hsiao, R. (2006). Improving reference speaker weighting adaptation by the use of maximum-likelihood reference speakers. In Proc. ICASSP (vol. 1).
7. Cai, T., Zhu, J. (2005). A novel method for rapid speaker adaptation based on support speaker weighting. In Proc. ICASSP (pp. 993–996).
8. Rabiner, L. (1994). Applications of voice processing to telecommunications. In Proc. IEEE (vol. 82, pp. 199–228).
9. Trihandoyo, A., Belloum, A., Hou, K. M. (1995). A real-time speech recognition architecture for a multi-channel interactive voice response system. In Proc. ICASSP (vol. 4, pp. 2687–2690).
10. Glass, J. R. (1999). Challenges for spoken dialogue systems. In Proc. IEEE ASRU workshop.
11. Shahnawazuddin, S., Thotappa, D., Sarma, B. D., Deka, A., Prasanna, S. R. M., Sinha, R. (2013). Assamese spoken query system to access the price of agricultural commodities. In Proc. 19th National Conference on Communication. New Delhi.
12. Assam Small Farmers Agri-Business Consortium. http://www.assamagribusiness.nic.in.
13. AGMARKNET–Ministry of Agriculture, Government of India, Agricultural Marketing Information Network website: http://agmarknet.nic.in.
14. India Telecom Online. http://www.indiatelecomonline.com.
15. Kotkar, P., Thies, W., Amarsinghe, S. (2008). An audio wiki for publishing user-generated content in the developing world. In HCI for Community and International Development. Florence, Italy.
16. Goel, S., Bhattacharya, M. (2010). Speech based dialog query system over Asterisk PBX server. In Proc. ICSPS (vol. 3, pp. 752–756).
17. Plauche, M., Prabhaker, M. (2005). Tamil market: a spoken dialog system for rural India. In ACM CHI Conference (pp. 1619–1624).
18. Bohus, D. Error awareness and recovery in task-oriented spoken dialogue systems. PhD Thesis Proposal, Carnegie Mellon University, PA.
19. Hasan, M. M., Hassan, F., Islam, G. M. M., Banik, M., Kotwal, M. R. A., Rahman, S. M. M., Muhammad, G., Mohammad, N. H. (2010). Bangla triphone HMM based word recognition. In Proc. IEEE APCCAS (pp. 883–886).
20. The HTK Toolkit: http://htk.eng.cam.ac.uk.
21. Woodland, P. C. (2001). Speaker adaptation for continuous density HMMs: a review. In ISCA ITRW on Adaptation Methods for Speech Recognition (pp. 11–19).
22. Milner, B., & Vaseghi, S. (1996). Bayesian channel equalisation and robust features for speech recognition. Vision, Image and Signal Processing, IEE Proceedings, 143(4), 223–231.
23. Acero, A., Deng, L., Kristjansson, T. T., Zhang, J. (2000). HMM adaptation using vector Taylor series for noisy speech recognition. In INTERSPEECH (pp. 869–872). ISCA.
24. He, Y., & Han, J. (2011). Gaussian specific compensation for channel distortion in speech recognition. IEEE Signal Processing Letters, 18(10), 599–602.
25. Campbell, W. M., Sturim, D. E., & Reynolds, D. A. (2006). Support vector machines using GMM supervectors for speaker verification. IEEE Signal Processing Letters, 13, 308–311.
26. Duchateau, J., Leroy, T., Demuynck, K., Van Hamme, H. (2008). Fast speaker adaptation using non-negative matrix factorization. In Proc. ICASSP (pp. 4269–4272).
27. Zhang, X., Demuynck, K., Van Hamme, H. (2012). Latent variable speaker adaptation of Gaussian mixture weights and means. In Proc. ICASSP (pp. 4349–4352).
28. Shahnawazuddin, S., & Sinha, R. (2014). Improved bases selection in acoustic model interpolation for fast on-line adaptation. IEEE Signal Processing Letters, 21(4), 493–497.
29. Elad, M. (2010). Sparse and redundant representations: from theory to applications in signal and image processing. New York: Springer.
30. Tibshirani, R. (1994). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267–288.
31. OMP-Box v10: http://www.cs.technion.ac.il/~ronrubin/software.html.

Acknowledgments

This work was in part supported by the project grant no. 11(12)/2009-HCC(TDIL) from the Department of Information Technology, Government of India.

Author information

Authors and Affiliations

Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, 781039, India
S. Shahnawazuddin, K. T. Deepak, B. D. Sarma, A. Deka, S. R. M. Prasanna & Rohit Sinha

Corresponding author

Correspondence to S. Shahnawazuddin.

About this article

Cite this article

Shahnawazuddin, S., Deepak, K.T., Sarma, B.D. et al. Low Complexity On-Line Adaptation Techniques in Context of Assamese Spoken Query System. J Sign Process Syst 81, 83–97 (2015). https://doi.org/10.1007/s11265-014-0906-z

Received: 11 March 2014
Revised: 28 April 2014
Accepted: 08 May 2014
Published: 22 May 2014
Issue Date: October 2015
DOI: https://doi.org/10.1007/s11265-014-0906-z
