Abstract
In this work, we present the development of an Assamese spoken query (SQ) system for accessing the price of agricultural commodities. The system is intended to make cultivators aware of recent market trends. It enables a user to access the latest price of a commodity by calling the system from a landline or mobile phone. The spoken query is processed, and the current price of the desired commodity in the given district is then played back by the system. Features that make the system user friendly were incorporated into the design after taking feedback from local farmers; in other words, the system is tuned to the needs of its users. Furthermore, the issues involved in adapting such query systems to the end user are also explored in this work. In the developed SQ system, typical user responses are extremely short (1–2 seconds, since users respond with isolated words). Moreover, the adaptation approach must keep the system latency low, since these systems are meant for real-time applications. Consequently, adapting such systems to the end user is an extremely challenging task. To address this, acoustic model interpolation based adaptation techniques are proposed that employ interpolation weights derived in an approximate fashion. The proposed approaches try to minimize the latency in the system response by avoiding the iterative weight estimation procedure used in earlier reported works. Even with an extremely small amount of adaptation data, the proposed approaches are found to yield a relative improvement of 12% over the baseline automatic speech recognition (ASR) system.
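The idea of acoustic model interpolation with non-iterative weights can be illustrated with a minimal sketch. The abstract does not say how the approximate weights are derived, so the one-shot least-squares fit below is purely an assumption for illustration, as are the names `interpolate_model`, `bases`, and `target` and the synthetic data. The sketch treats each basis model as a mean supervector and forms the adapted model as a weighted combination of the bases.

```python
import numpy as np

def interpolate_model(bases, target):
    """Combine basis models into an adapted model (illustrative only).

    bases  : (K, D) array, one mean supervector per basis model (hypothetical).
    target : (D,) rough supervector estimated from the adaptation data (hypothetical).
    """
    # Assumption of this sketch: weights come from a single closed-form
    # least-squares solve, so no iterative re-estimation loop is needed.
    weights, *_ = np.linalg.lstsq(bases.T, target, rcond=None)
    # The adapted supervector is the weighted sum of the basis supervectors.
    adapted = weights @ bases
    return weights, adapted

# Synthetic example: 4 basis models with 50-dimensional supervectors.
rng = np.random.default_rng(0)
bases = rng.normal(size=(4, 50))
true_w = np.array([0.6, 0.3, 0.1, 0.0])
target = true_w @ bases  # noiseless target lying in the span of the bases

weights, adapted = interpolate_model(bases, target)
```

With a noiseless target the solve recovers the generating weights. With only 1–2 seconds of real adaptation data the target estimate would be noisy, which is one reason a practical system might constrain or regularize the weights.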
Notes
An initial version of this work was presented at the National Conference on Communication in February 2013 [11].
References
Gauvain, J. L., & Lee, C. H. (1994). Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing, 2, 291–298.
Leggetter, C. J., & Woodland, P. C. (1995). Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language, 9, 171–185.
Hazen, T.J., Glass, J.R. (1997). A comparison of novel techniques for instantaneous speaker adaptation. In Proc. of European Conference on Speech Communication and Technology (pp. 2047–2050).
Kuhn, R., Junqua, J. C., Nguyen, P., & Niedzielski, N. (2000). Rapid speaker adaptation in eigenvoice space. IEEE Transactions on Speech and Audio Processing, 8(6), 695–707.
Gales, M. J. F. (2000). Cluster adaptive training of hidden Markov models. IEEE Transactions on Speech and Audio Processing, 8(4), 417–428.
Mak, B., Lai, T., Hsiao, R. (2006). Improving reference speaker weighting adaptation by the use of maximum-likelihood reference speakers. In Proc. ICASSP (vol. 1).
Cai, T., Zhu, J. (2005). A novel method for rapid speaker adaptation based on support speaker weighting. In Proc. ICASSP (pp. 993–996).
Rabiner, L. (1994). Applications of voice processing to telecommunications. In Proc. IEEE (vol. 82, pp. 199–228).
Trihandoyo, A., Belloum, A., Hou, K.M. (1995). A real-time speech recognition architecture for a multi-channel interactive voice response system. In Proc. ICASSP (vol. 4, pp. 2687–2690).
Glass, J.R. (1999). Challenges for spoken dialogue systems. In Proc. IEEE ASRU Workshop.
Shahnawazuddin, S., Thotappa, D., Sarma, B.D., Deka, A., Prasanna, S.R.M., Sinha, R. (2013). Assamese spoken query system to access the price of agricultural commodities. In Proc. 19th National Conference on Communication. New Delhi.
Assam Small Farmers Agri-Business Consortium. http://www.assamagribusiness.nic.in.
AGMARKNET–Ministry of Agriculture, Government of India, Agricultural Marketing Information Network website: http://agmarknet.nic.in.
India Telecom Online. http://www.indiatelecomonline.com.
Kotkar, P., Thies, W., Amarasinghe, S. (2008). An audio wiki for publishing user-generated content in the developing world. In HCI for Community and International Development. Florence, Italy.
Goel, S., Bhattacharya, M. (2010). Speech based dialog query system over Asterisk PBX server. In Proc. ICSPS (vol. 3, pp. 752–756).
Plauche, M., Prabhaker, M. (2005). Tamil market: a spoken dialog system for rural India. In ACM CHI Conference (pp. 1619–1624).
Bohus, D. Error awareness and recovery in task-oriented spoken dialogue systems. PhD Thesis Proposal, Carnegie Mellon University, PA.
Hasan, M.M., Hassan, F., Islam, G.M.M., Banik, M., Kotwal, M.R.A., Rahman, S.M.M., Muhammad, G., Mohammad, N.H. (2010). Bangla triphone HMM based word recognition. In Proc. IEEE APCCAS (pp. 883–886).
The HTK Toolkit: http://htk.eng.cam.ac.uk.
Woodland, P.C. (2001). Speaker adaptation for continuous density HMMs: a review. In ISCA ITRW on Adaptation Methods for Speech Recognition (pp. 11–19).
Milner, B., & Vaseghi, S. (1996). Bayesian channel equalisation and robust features for speech recognition. Vision, Image and Signal Processing, IEE Proceedings, 143(4), 223–231.
Acero, A., Deng, L., Kristjansson, T.T., Zhang, J. (2000). HMM adaptation using vector Taylor series for noisy speech recognition. In INTERSPEECH (pp. 869–872). ISCA.
He, Y., & Han, J. (2011). Gaussian specific compensation for channel distortion in speech recognition. IEEE Signal Processing Letters, 18(10), 599–602.
Campbell, W. M., Sturim, D. E., & Reynolds, D. A. (2006). Support vector machines using GMM supervectors for speaker verification. IEEE Signal Processing Letters, 13, 308–311.
Duchateau, J., Leroy, T., Demuynck, K., Van Hamme, H. (2008). Fast speaker adaptation using non-negative matrix factorization. In Proc. ICASSP (pp. 4269–4272).
Zhang, X., Demuynck, K., Van Hamme, H. (2012). Latent variable speaker adaptation of Gaussian mixture weights and means. In Proc. ICASSP (pp. 4349–4352).
Shahnawazuddin, S., & Sinha, R. (2014). Improved bases selection in acoustic model interpolation for fast on-line adaptation. IEEE Signal Processing Letters, 21(4), 493–497.
Elad, M. (2010). Sparse and redundant representations: from theory to applications in signal and image processing. New York: Springer.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267–288.
OMP-Box v10: http://www.cs.technion.ac.il/~ronrubin/software.html.
Acknowledgments
This work was supported in part by the project grant no. 11(12)/2009-HCC(TDIL) from the Department of Information Technology, Government of India.
Cite this article
Shahnawazuddin, S., Deepak, K.T., Sarma, B.D. et al. Low Complexity On-Line Adaptation Techniques in Context of Assamese Spoken Query System. J Sign Process Syst 81, 83–97 (2015). https://doi.org/10.1007/s11265-014-0906-z
DOI: https://doi.org/10.1007/s11265-014-0906-z