Low Complexity On-Line Adaptation Techniques in Context of Assamese Spoken Query System

Journal of Signal Processing Systems

Abstract

In this work, we present the development of an Assamese spoken query (SQ) system for accessing the prices of agricultural commodities. The system is intended to make cultivators aware of recent market trends. A user accesses the latest price of a commodity by calling the system from a landline or mobile phone. The spoken query is processed, and the current price of the desired commodity in the given district is then played back. Features that make the system user friendly were incorporated into the design after taking feedback from local farmers; in other words, the system is tuned to the needs of its users. Furthermore, the issues of adapting such query systems to the end user are also explored. In the developed SQ system, typical user responses are extremely short (1–2 seconds, because the user responds with isolated words). Moreover, the adaptation approach must keep system latency low, since these systems are meant for real-time applications. Consequently, adapting such systems to the end user is an extremely challenging task. In this regard, we propose acoustic model interpolation based adaptation techniques that employ interpolation weights derived in an approximate fashion. The proposed approaches aim to minimize the latency of the system response by avoiding the iterative weight estimation procedure used in earlier reported work. Even with an extremely small amount of adaptation data, the proposed approaches yield a relative improvement of 12% over the baseline ASR system.
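
As a rough illustration of the idea described in the abstract, the following Python sketch interpolates the mean supervectors of a few pre-trained acoustic models using weights obtained in a single pass, with no iterative re-estimation. The softmax over per-model average log-likelihoods is an assumption made purely for illustration and is not the weight estimate proposed in the paper; the function names `approximate_weights` and `interpolate_means` and all numeric values are hypothetical.

```python
import numpy as np


def approximate_weights(avg_loglik):
    """One-pass weights: softmax over per-model average log-likelihoods.

    Assumed scheme for illustration only; not the paper's estimator.
    """
    scores = np.asarray(avg_loglik, dtype=float)
    shifted = scores - scores.max()  # subtract the max for numerical stability
    unnormalised = np.exp(shifted)
    return unnormalised / unnormalised.sum()


def interpolate_means(basis_means, weights):
    """Adapted mean supervector as a weighted sum of basis supervectors."""
    basis_means = np.asarray(basis_means, dtype=float)  # shape (K, D)
    return weights @ basis_means  # shape (D,)


# Toy data: K = 3 basis models, each with a D = 3 mean supervector.
basis_means = np.array([
    [0.0, 1.0, 2.0],
    [1.0, 1.0, 1.0],
    [2.0, 0.0, -1.0],
])
# Hypothetical average log-likelihood of a short utterance under each model.
avg_loglik = [-52.0, -50.0, -55.0]

weights = approximate_weights(avg_loglik)
adapted = interpolate_means(basis_means, weights)
print(weights, adapted)
```

Because the weights come from one scoring pass over the adaptation utterance, the cost grows linearly with the number of basis models, which is the kind of latency trade-off the abstract motivates.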

Notes

  1. An initial version of this work was presented at the National Conference on Communication in February 2013 [11].

References

  1. Gauvain, J. L., & Lee, C. H. (1994). Maximum a-posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing, 2, 291–298.

  2. Leggetter, C. J., & Woodland, P. C. (1995). Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language, 9, 171–185.

  3. Hazen, T.J., Glass, J.R. (1997). A comparison of novel techniques for instantaneous speaker adaptation. In Proc. of European Conference on Speech Communication and Technology (pp. 2047–2050).

  4. Kuhn, R., Junqua, J. C., Nguyen, P., & Niedzielski, N. (2000). Rapid speaker adaptation in eigenvoice space. IEEE Transactions on Speech and Audio Processing, 8(6), 695–707.

  5. Gales, M. J. F. (2000). Cluster adaptive training of hidden Markov models. IEEE Transactions on Speech and Audio Processing, 8(4), 417–428.

  6. Mak, B., Lai, T., Hsiao, R. (2006). Improving reference speaker weighting adaptation by the use of maximum-likelihood reference speakers. In Proc. ICASSP (vol. 1).

  7. Cai, T., Zhu, J. (2005). A novel method for rapid speaker adaptation based on support speaker weighting. In Proc. ICASSP (pp. 993–996).

  8. Rabiner, L. (1994). Applications of voice processing to telecommunications. In Proc. IEEE (vol. 82, pp. 199–228).

  9. Trihandoyo, A., Belloum, A., Hou, K.M. (1995). A real-time speech recognition architecture for a multi-channel interactive voice response system. In Proc. ICASSP (vol. 4, pp. 2687–2690).

  10. Glass, J.R. (1999). Challenges for spoken dialogue systems. In Proc. IEEE ASRU Workshop.

  11. Shahnawazuddin, S., Thotappa, D., Sarma, B.D., Deka, A., Prasanna, S.R.M., Sinha, R. (2013). Assamese spoken query system to access the price of agricultural commodities. In Proc. 19th National Conference on Communication. New Delhi.

  12. Assam Small Farmers Agri-Business Consortium. http://www.assamagribusiness.nic.in.

  13. AGMARKNET–Ministry of Agriculture, Government of India, Agricultural Marketing Information Network website: http://agmarknet.nic.in.

  14. India Telecom Online. http://www.indiatelecomonline.com.

  15. Kotkar, P., Thies, W., Amarasinghe, S. (2008). An audio wiki for publishing user-generated content in the developing world. In HCI for Community and International Development. Florence, Italy.

  16. Goel, S., Bhattacharya, M. (2010). Speech based dialog query system over Asterisk PBX server. In Proc. ICSPS (vol. 3, pp. 752–756).

  17. Plauche, M., Prabhaker, M. (2005). Tamil market: a spoken dialog system for rural India. In ACM CHI Conference (pp. 1619–1624).

  18. Bohus, D. Error awareness and recovery in task-oriented spoken dialogue systems. PhD thesis proposal, Carnegie Mellon University, PA.

  19. Hasan, M.M., Hassan, F., Islam, G.M.M., Banik, M., Kotwal, M.R.A., Rahman, S.M.M., Muhammad, G., Mohammad, N.H. (2010). Bangla triphone HMM based word recognition. In Proc. IEEE APCCAS (pp. 883–886).

  20. The HTK Toolkit: http://htk.eng.cam.ac.uk.

  21. Woodland, P.C. (2001). Speaker adaptation for continuous density HMMs: a review. In ISCA ITRW on Adaptation Methods for Speech Recognition (pp. 11–19).

  22. Milner, B., & Vaseghi, S. (1996). Bayesian channel equalisation and robust features for speech recognition. Vision, Image and Signal Processing, IEE Proceedings, 143(4), 223–231.

  23. Acero, A., Deng, L., Kristjansson, T.T., Zhang, J. (2000). HMM adaptation using vector Taylor series for noisy speech recognition. In INTERSPEECH (pp. 869–872). ISCA.

  24. He, Y., & Han, J. (2011). Gaussian specific compensation for channel distortion in speech recognition. IEEE Signal Processing Letters, 18(10), 599–602.

  25. Campbell, W. M., Sturim, D. E., & Reynolds, D. A. (2006). Support vector machines using GMM supervectors for speaker verification. IEEE Signal Processing Letters, 13, 308–311.

  26. Duchateau, J., Leroy, T., Demuynck, K., Van Hamme, H. (2008). Fast speaker adaptation using non-negative matrix factorization. In Proc. ICASSP (pp. 4269–4272).

  27. Zhang, X., Demuynck, K., Van Hamme, H. (2012). Latent variable speaker adaptation of Gaussian mixture weights and means. In Proc. ICASSP (pp. 4349–4352).

  28. Shahnawazuddin, S., & Sinha, R. (2014). Improved bases selection in acoustic model interpolation for fast on-line adaptation. IEEE Signal Processing Letters, 21(4), 493–497.

  29. Elad, M. (2010). Sparse and redundant representations: from theory to applications in signal and image processing. New York: Springer.

  30. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267–288.

  31. OMP-Box v10: http://www.cs.technion.ac.il/~ronrubin/software.html.

Acknowledgments

This work was supported in part by project grant no. 11(12)/2009-HCC(TDIL) from the Department of Information Technology, Government of India.

Author information

Corresponding author

Correspondence to S. Shahnawazuddin.

Cite this article

Shahnawazuddin, S., Deepak, K.T., Sarma, B.D. et al. Low Complexity On-Line Adaptation Techniques in Context of Assamese Spoken Query System. J Sign Process Syst 81, 83–97 (2015). https://doi.org/10.1007/s11265-014-0906-z

  • DOI: https://doi.org/10.1007/s11265-014-0906-z
