Multimodal Keyword Search for Multilingual and Mixlingual Speech Corpus

Popli, Abhimanyu; Kumar, Arun

doi:10.1007/978-3-319-66429-3_53


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

A novel framework for searching keywords in multilingual and mixlingual speech corpora is proposed. The framework accepts both spoken and text queries. Spoken queries allow it to search for out-of-vocabulary (OOV) words, while text queries allow it to perform semantic search. An advanced application, searching for translations of a keyword in a mixlingual speech corpus, is also possible within the posteriorgram framework of this system. It is shown that text queries perform comparably to or better than spoken queries when the language of the keyword is among the training languages. A technique for combining information from text and spoken queries is also proposed, which further enhances search performance. The system is built on multiple posteriorgrams derived from articulatory classes and trained on multiple languages.
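
The abstract does not spell out the matching procedure, so the following is only a minimal sketch of how a posteriorgram-based search with both query types might be organised. It assumes subsequence dynamic time warping with a negative-log inner-product frame distance, a text query modelled as smoothed one-hot frames over class labels, and a weighted average to combine the two costs. All function names, the `peak` smoothing value, and the fusion weight are hypothetical and are not taken from the paper.

```python
import math

def one_hot_posterior(label, num_classes, peak=0.9):
    """Smoothed one-hot posterior vector for one class label (hypothetical helper)."""
    rest = (1.0 - peak) / (num_classes - 1)
    return [peak if k == label else rest for k in range(num_classes)]

def text_to_posteriorgram(labels, num_classes, frames_per_label=2):
    """Assumed text-query model: repeat a smoothed one-hot frame for each label."""
    return [one_hot_posterior(lab, num_classes)
            for lab in labels for _ in range(frames_per_label)]

def frame_distance(p, q, eps=1e-10):
    """Negative log of the inner product of two posterior vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))

def subsequence_dtw(query, utterance):
    """Lowest alignment cost of `query` against any region of `utterance`,
    divided by the number of query frames."""
    n, m = len(query), len(utterance)
    # First query frame may start at any utterance frame at no extra cost.
    prev = [frame_distance(query[0], utterance[j]) for j in range(m)]
    for i in range(1, n):
        cur = [float("inf")] * m
        for j in range(m):
            best = prev[j]  # query advances, utterance frame repeats
            if j > 0:
                # diagonal step, or utterance advances while query frame repeats
                best = min(best, prev[j - 1], cur[j - 1])
            cur[j] = frame_distance(query[i], utterance[j]) + best
        prev = cur
    # The alignment may end at any utterance frame.
    return min(prev) / n

def fuse_costs(spoken_cost, text_cost, weight=0.5):
    """Hypothetical late fusion: weighted average of the two alignment costs."""
    return weight * spoken_cost + (1.0 - weight) * text_cost

# Toy usage with three classes; the utterance contains the label sequence 1, 2.
utterance = text_to_posteriorgram([0, 0, 1, 1, 2, 2, 0, 0], 3, frames_per_label=1)
text_query = text_to_posteriorgram([1, 2], 3)
# A noisier posteriorgram stands in for a spoken example of the same keyword.
spoken_query = [one_hot_posterior(lab, 3, peak=0.7) for lab in [1, 1, 2, 2]]
fused = fuse_costs(subsequence_dtw(spoken_query, utterance),
                   subsequence_dtw(text_query, utterance))
print("fused cost:", fused)
```

A lower fused cost indicates a better match. In practice the class posteriors would come from trained classifiers rather than synthetic one-hot frames, and the fusion weight would need tuning on held-out data.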

Acknowledgment

The authors would like to thank Dr. K. Samudravijaya (TIFR) and Dr. S. Lata (MEITY) for providing Hindi data and Dr. Suryakanth V. Gangashetty (IIIT, Hyderabad) for providing Telugu data. The first author would like to thank his managers Mr. Shiv Narayan, Mr. Biren Karmakar and Mr. Vipin Tyagi, Executive Director, CDOT, New Delhi for their permission to carry out this research.

Author information

Authors and Affiliations

Centre for Applied Research in Electronics, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
Abhimanyu Popli & Arun Kumar
Centre for Development of Telematics, Mandi Road, Mehrauli, New Delhi, India
Abhimanyu Popli

Corresponding author

Correspondence to Abhimanyu Popli.

Editor information

Editors and Affiliations

SPIIRAS, Saint Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Hertfordshire, Hatfield, United Kingdom
Iosif Mporas

Cite this paper

Popli, A., Kumar, A. (2017). Multimodal Keyword Search for Multilingual and Mixlingual Speech Corpus. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_53

DOI: https://doi.org/10.1007/978-3-319-66429-3_53
Published: 13 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)