Multimodal Keyword Search for Multilingual and Mixlingual Speech Corpus

  • Conference paper
Speech and Computer (SPECOM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Abstract

A novel framework for keyword search in multilingual and mixlingual speech corpora is proposed. The framework accepts both spoken and text queries. Spoken queries allow it to search for out-of-vocabulary (OOV) words, while text queries allow it to perform semantic search. An advanced application, searching for translations of a keyword in a mixlingual speech corpus, is also possible within the posteriorgram framework. It is shown that text queries perform comparably to or better than spoken queries when the language of the keyword is among the training languages. A technique for combining information from text and spoken queries is also proposed, which further improves search performance. The system is based on multiple posteriorgrams over articulatory classes, trained on multiple languages.
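The abstract does not say how a query is matched against the corpus. As a rough illustration of how posteriorgram-based keyword search is commonly set up, the sketch below assumes subsequence dynamic time warping (DTW) with a negative-log inner-product frame distance, and assumes a text query can be represented as a sequence of one-hot class posteriors. The function names, the distance measure, and the one-hot text representation are illustrative assumptions, not the authors' method.

```python
import numpy as np


def text_query_to_posteriorgram(class_ids, n_classes, frames_per_class=1):
    """Hypothetical helper: turn a sequence of class indices (e.g. derived
    from a text query) into a one-hot posteriorgram, repeating each class
    for a fixed number of frames."""
    frame_ids = np.repeat(np.asarray(class_ids), frames_per_class)
    return np.eye(n_classes)[frame_ids]


def local_distance(query, utterance, eps=1e-10):
    """Frame-pair distance -log(q . u) between two posteriorgrams
    (assumed measure; rows are frames, columns are class posteriors)."""
    similarity = query @ utterance.T
    return -np.log(np.maximum(similarity, eps))


def subsequence_dtw(query, utterance):
    """Find the best match of `query` anywhere inside `utterance`.

    Returns (cost normalised by query length, index of the utterance
    frame where the best match ends)."""
    dist = local_distance(query, utterance)
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    acc[0, :] = dist[0, :]  # the match may start at any utterance frame
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, m):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i, j] + best_prev
    end = int(np.argmin(acc[-1]))
    return acc[-1, end] / n, end


if __name__ == "__main__":
    # Toy utterance over 4 classes; the query classes 1, 2, 3 occur mid-way.
    utterance = np.eye(4)[[0, 0, 1, 2, 3, 0, 0]]
    query = text_query_to_posteriorgram([1, 2, 3], n_classes=4)
    cost, end_frame = subsequence_dtw(query, utterance)
    print(cost, end_frame)
```

A spoken query would enter the same `subsequence_dtw` call with a posteriorgram computed from audio rather than from class indices, which is why a shared posteriorgram representation makes both query types searchable in one framework.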

Acknowledgment

The authors would like to thank Dr. K. Samudravijaya (TIFR) and Dr. S. Lata (MEITY) for providing Hindi data and Dr. Suryakanth V. Gangashetty (IIIT, Hyderabad) for providing Telugu data. The first author would like to thank his managers Mr. Shiv Narayan, Mr. Biren Karmakar and Mr. Vipin Tyagi, Executive Director, CDOT, New Delhi for their permission to carry out this research.

Corresponding author

Correspondence to Abhimanyu Popli.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Popli, A., Kumar, A. (2017). Multimodal Keyword Search for Multilingual and Mixlingual Speech Corpus. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_53

  • DOI: https://doi.org/10.1007/978-3-319-66429-3_53

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66428-6

  • Online ISBN: 978-3-319-66429-3

  • eBook Packages: Computer Science, Computer Science (R0)
