Abstract
A novel framework for keyword search in multilingual and mixlingual speech corpora is proposed. The framework accepts both spoken and text queries. Spoken queries allow it to find out-of-vocabulary (OOV) words, while text queries allow it to perform semantic search. Within the same posteriorgram framework, the system also supports an advanced application: searching for translations of a keyword in a mixlingual speech corpus. It is shown that text queries perform comparably to or better than spoken queries when the language of the keyword is among the training languages. In addition, a technique for combining information from text and spoken queries is proposed, which further improves search performance. The system is built on multiple posteriorgrams based on articulatory classes and trained on multiple languages.
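The abstract does not say how queries are matched against the corpus, so the following is only a minimal sketch of one common way to search within a posteriorgram framework: subsequence dynamic time warping (DTW) over frame-wise posterior vectors. Every name here (`subsequence_dtw_cost`, `text_query_posteriorgram`, `combined_cost`), the negative-log inner-product distance, the synthetic text-query posteriorgram, and the weighted-sum combination are assumptions made for illustration and should not be read as the authors' method.

```python
import numpy as np


def frame_distance(query, utterance, eps=1e-10):
    """Negative-log inner product between every query/utterance frame pair."""
    return -np.log(np.clip(query @ utterance.T, eps, None))


def subsequence_dtw_cost(query, utterance):
    """Lowest cost of aligning the whole query to any segment of the utterance.

    Both inputs are (frames, classes) arrays whose rows sum to 1.
    The cost is divided by the number of query frames.
    """
    dist = frame_distance(query, utterance)
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    acc[0, :] = dist[0, :]  # the match may start at any utterance frame
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, m):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i, j] + best_prev
    return acc[-1, :].min() / n  # the match may end at any utterance frame


def text_query_posteriorgram(class_ids, num_classes, frames_per_class=3,
                             smoothing=0.05):
    """Hypothetical helper: turn a class-index sequence into a posteriorgram.

    Each class is held for a fixed number of frames, with a small amount of
    probability mass spread over the other classes.
    """
    rows = []
    for c in class_ids:
        row = np.full(num_classes, smoothing / (num_classes - 1))
        row[c] = 1.0 - smoothing
        rows.extend([row] * frames_per_class)
    return np.array(rows)


def combined_cost(spoken_query, text_query, utterance, weight=0.5):
    """Assumed combination rule: weighted sum of the two DTW costs."""
    spoken = subsequence_dtw_cost(spoken_query, utterance)
    text = subsequence_dtw_cost(text_query, utterance)
    return weight * spoken + (1.0 - weight) * text


# Toy data: 4 classes, keyword made of classes 0 -> 2 -> 1.
keyword = [0, 2, 1]
text_query = text_query_posteriorgram(keyword, 4)
spoken_query = text_query_posteriorgram(keyword, 4, frames_per_class=4,
                                        smoothing=0.2)
utterance_hit = text_query_posteriorgram([3, 0, 2, 1, 3], 4)
utterance_miss = text_query_posteriorgram([3, 3, 1, 3, 3], 4)

hit_cost = combined_cost(spoken_query, text_query, utterance_hit)
miss_cost = combined_cost(spoken_query, text_query, utterance_miss)
```

In this toy setup the utterance that contains the keyword's class sequence receives a lower combined cost than the one that does not. A real system would derive the posteriorgrams from trained acoustic models rather than from hand-built class sequences.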
Acknowledgment
The authors would like to thank Dr. K. Samudravijaya (TIFR) and Dr. S. Lata (MEITY) for providing Hindi data and Dr. Suryakanth V. Gangashetty (IIIT, Hyderabad) for providing Telugu data. The first author would like to thank his managers Mr. Shiv Narayan, Mr. Biren Karmakar and Mr. Vipin Tyagi, Executive Director, CDOT, New Delhi for their permission to carry out this research.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Popli, A., Kumar, A. (2017). Multimodal Keyword Search for Multilingual and Mixlingual Speech Corpus. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science(), vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_53
DOI: https://doi.org/10.1007/978-3-319-66429-3_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)