We develop two improvements over our previously proposed enhancement technique, spectral subtraction with voice activity detection combined with a minimum mean square error spectrum power estimator based on zero crossing (SS-VAD + MMSE-SPZC), for a real-time spoken query system (SQS). First, we introduce a time delay neural network (TDNN) based modeling technique. Second, to properly train the models, we increase the size of the database by collecting Kannada speech data from an additional 500 farmers under real-time conditions. The proposed combined enhancement technique effectively removes background noise and improves speech quality. When evaluated on the updated degraded speech corpus, the proposed automatic speech recognition (ASR) system outperforms the previous framework. Experimental results show improvements of 1.32% and 1.48% in speech recognition accuracy for noisy and enhanced speech data, respectively, compared to our earlier work.

This work was part of a consortium project on “Speech-based Access of Agricultural Commodity Prices and Weather Information in 11 Indian Languages/Dialects”, funded by the Technology Development for Indian Languages (TDIL) programme initiated by the Department of Electronics & Information Technology (DeitY), Ministry of Communication & Information Technology (MC&IT), Govt. of India (Grant number: 11(18)/2012-HCC(TDIL)).
The authors have no conflict of interest.
Nagaraja B G, Jayanna H S and Shivakumar B R contributed equally to this work.
Appendix A: Considerations and challenges of the research approach
The following limitations should be considered when interpreting these research findings and applying them to real-world SQS and ASR applications.
This research focuses on developing improvements to the ASR system specifically for the Kannada language/dialects. As a result, the findings and conclusions may not be directly applicable to other languages or dialects, limiting the generalizability of the approach.
The challenges of real-time data collection, such as background noise variations, environmental conditions, and other contextual factors, may impact the quality and diversity of the collected data.
Appendix B: Speech database description
Table 6 presents the speech data collected for this study, covering male and female Kannada speakers across diverse dialect regions of Karnataka state.
Appendix C: Comparison of ASR toolkits
Appendix D: List of Acronyms
G, T.Y., G, N.B., S, J.H. et al. A spoken query system to access the real time agricultural commodity prices and weather information in Kannada language/dialects. Multimed Tools Appl 83, 28675–28688 (2024). https://doi.org/10.1007/s11042-023-16554-9