Multilingual query-by-example spoken term detection in Indian languages

Popli, Abhimanyu; Kumar, Arun

doi:10.1007/s10772-018-09585-3

Multilingual query-by-example spoken term detection in Indian languages

Published: 10 January 2019

Volume 22, pages 131–141, (2019)
Cite this article

International Journal of Speech Technology Aims and scope Submit manuscript

205 Accesses
3 Citations
Explore all metrics

Abstract

Spoken language processing poses to be a challenging task in multilingual and mixlingual scenario in linguistically diverse regions like Indian subcontinent. Common articulatory based framework is explored for the representation of phonemes of different languages. This framework is designed to handle typical features like aspirated plosives, nasalized vowels, combined letters, unvoiced retroflex plosive which are found in majority of Indian languages. It is trained with two languages. Different strategies like transfer training and joint training are studied to adapt English trained neural networks with smaller amount of Bangla data. It is observed that such training not only improves Query-by-Example Spoken Term Detection (QbE-STD) in the language of same language family like Hindi but also other Indian languages like Tamil and Telugu. While cross lingual adaptation of neural networks with a language specific softmax layer has been studied earlier in context of speech recognition, this work presents an architecture which is language independent uptil softmax layer. It is observed that this architecture has higher accuracy for unseen languages, is more compact and can be adapted more easily for new languages in comparison to the classic phoneme posteriorgrams based architecture.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Comparison of ALBAYZIN query-by-example spoken term detection 2012 and 2014 evaluations

Article Open access 13 January 2016

Multilingual spoken term detection: a review

Article 22 July 2020

ALBAYZIN Query-by-example Spoken Term Detection 2016 evaluation

Article Open access 13 April 2018

References

Das, B., Mandal, S., & Mitra, P. (2011). Bengali speech corpus for continuous auutomatic speech recognition system. In 2011 International conference on speech database and assessments (Oriental COCOSDA). (pp 51–55). https://doi.org/10.1109/ICSDA.2011.6085979.
Ellis, D. (2000). Quicknet-icsi. http://www.icsi.berkeley.edu/Speech/qn.html.
Garofolo, J. (1993). Csr-i (wsj0) complete ldc93s6a.
Garofolo, J. (1994). Csr-ii (wsj1) complete ldc94s13a.
Gupta, V., Ajmera, J., Kumar, A., & Verma, A. (2011). A language independent approach to audio search. In Proceedings of INTERSPEECH, ISCA (pp. 1125–1128).
Hazen, T., Shen, W., & White, C. (2009). Query-by-example spoken term detection using phonetic posteriorgram templates. In Proceedings of ASRU (pp. 421–426). https://doi.org/10.1109/ASRU.2009.5372889.
Heigold, G., Vanhoucke, V., Senior, A. W., Nguyen, P., Ranzato, M., Devin, M., et al. (2013). Multilingual acoustic models using distributed deep neural networks. In Proceedings of ICASSP (pp. 8619–8623).
Huang, J. T., Li, J., Yu, D., Deng, L., & Gong, Y. (2013). Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers. In Proceedings of ICASSP (pp. 7304–7308).
Lopez-Otero, P., Docio-Fernandez, L., & Garcia-Mateo, C. (2015). Phonetic unit selection for cross-lingual query-by-example spoken term detection. In Proceedings of ASRU (pp. 223–229).
Mantena, G., Achanta, S., & Prahallad, K. (2014). Query-by-example spoken term detection using frequency domain linear prediction and non-segmental dynamic time warping. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(5), 946–955. https://doi.org/10.1109/TASLP.2014.2311322.
Article Google Scholar
Mantena, G., & Prahallad, K. (2014). Use of articulatory bottle-neck features for query-by-example spoken term detection in low resource scenarios. In Proceedings of ICASSP (pp. 7128–7132). https://doi.org/10.1109/ICASSP.2014.6854983.
Popli, A., & Kumar, A. (2015). Query-by-example spoken term detection using low dimensional posteriorgrams motivated by articulatory classes. In 2015 IEEE 17th international workshop on multimedia signal processing (MMSP) (pp. 1–6). https://doi.org/10.1109/MMSP.2015.7340826.
Popli, A., & Kumar, A. (2017a). Capturing Indian phonemic diversity with multiple posteriorgrams for multilingual Query-by-Example spoken term detection. In 2017 Twenty-third national conference on communications (NCC) (NCC-2017) (pp. 453–458). India: Chennai.
Popli, A., & Kumar, A. (2017b). Multimodal keyword search for multilingual and mixlingual speech corpus. In A. Karpov, R. Potapova, & I. Mporas (Eds.), Proceedings, springer, lecture notes in computer science, speech and computer—19th international conference, SPECOM 2017 Hatfield, UK, September 12–16, 201710458 (pp. 535–545). https://doi.org/10.1007/978-3-319-66429-3_53.
Proenga, J., Veiga, A., & Perdigao, F. (2015). Query by example search with segmented dynamic time warping for non-exact spoken queries. In Proceedings of EUSIPCO (pp. 1661–1665).
Saxena, A., & Yegnanarayana, B. (2015). Distinctive feature based representation of speech for query-by-example spoken term detection. In Proceedings of INTERSPEECH.
Thomas, S., Ganapathy, S., & Hermansky, H. (2012). Multilingual mlp features for low-resource lvcsr systems. In Proceedings of ICASSP (pp. 4269–4272). https://doi.org/10.1109/ICASSP.2012.6288862.
van Hout, J., Ferrer, L., Vergyri, D., Scheffer, N., Lei, Y., Mitra, V., & Wegmann, S. (2014). Calibration and multiple system fusion for spoken term detection using linear logistic regression. In Proceedings of ICASSP (pp. 7138–7142).
Vertanen, K. (2005). HTK wall street journal training recipe. https://www.keithv.com/software/htk/.
Vu, N. T., Breiter, W., Metze, F., & Schultz, T. (2012). Initialization schemes for multilayer perceptron training and their impact on ASR performance using multilingual data. In Proceedings of INTERSPEECH, 2012 (pp. 2586–2589).
Wang, H., Lee, T., Leung, C. C., Ma, B., & Li, H. (2013). Using parallel tokenizers with dtw matrix combination for low-resource spoken term detection. In Proceedings of ICASSP (pp. 8545–8549). https://doi.org/10.1109/ICASSP.2013.6639333.
Wikipedia (2016). Languages of India. Accessed December 27, 2016, from https://en.wikipedia.org/wiki/Languages_of_India.
Yuan, Y., Leung, C. C., Xie, L., Chen, H., Ma, B., & Li, H. (2017). Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection. In Proceedings of ICASSP (pp. 5645–5649). https://doi.org/10.1109/ICASSP.2017.7953237.
Zhang, Y., & Glass, J. (2009). Unsupervised spoken keyword spotting via segmental DTW on gaussian posteriorgrams. In Proceedings of ASRU (pp. 398–403). https://doi.org/10.1109/ASRU.2009.5372931.

Download references

Acknowledgements

The authors would like to thank Dr. K. Samudravijaya (TIFR) and Dr. S. Lata (DEITY) for providing Hindi data and Dr. Suryakanth V. Gangashetty (IIIT, Hyderabad) for providing Telugu data. The first author would like to thank his manager Mr. Biren Karmakar and Mr. Vipin Tyagi, Executive Director, CDOT, New Delhi for their permission to carry out this research.

Author information

Authors and Affiliations

CDOT, New Delhi, 110030, India
Abhimanyu Popli
CARE, IIT Delhi, New Delhi, 110016, India
Arun Kumar

Authors

Abhimanyu Popli
View author publications
You can also search for this author in PubMed Google Scholar
Arun Kumar
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Abhimanyu Popli.

Appendix

See Tables 12 and 13.

Table 12 Vowels

Full size table

Table 13 Consonants

Full size table

Rights and permissions

Reprints and permissions

About this article

Cite this article

Popli, A., Kumar, A. Multilingual query-by-example spoken term detection in Indian languages. Int J Speech Technol 22, 131–141 (2019). https://doi.org/10.1007/s10772-018-09585-3

Download citation

Received: 09 June 2018
Accepted: 17 December 2018
Published: 10 January 2019
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10772-018-09585-3

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Multilingual query-by-example spoken term detection in Indian languages

Abstract

Access this article

Similar content being viewed by others

Comparison of ALBAYZIN query-by-example spoken term detection 2012 and 2014 evaluations

Multilingual spoken term detection: a review

ALBAYZIN Query-by-example Spoken Term Detection 2016 evaluation

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Appendix

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Multilingual query-by-example spoken term detection in Indian languages

Abstract

Access this article

Similar content being viewed by others

Comparison of ALBAYZIN query-by-example spoken term detection 2012 and 2014 evaluations

Multilingual spoken term detection: a review

ALBAYZIN Query-by-example Spoken Term Detection 2016 evaluation

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Appendix

Appendix

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation