Phonetic Spoken Term Detection in Large Audio Archive Using the WFST Framework

Vavruška, Jan; Švec, Jan; Ircing, Pavel

doi:10.1007/978-3-642-40585-3_51

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

The paper presents a technique for phonetic spoken term detection in large audio archives. It is designed within the framework of weighted finite-state transducers (WFSTs) and builds on the relatively recent notion of factor automata, which we have enhanced with score normalization and a technique for systematic query expansion. The expansion allows for phone deletions and substitutions and thereby compensates for frequent pronunciation imperfections and for systematic phoneme interchanges that occur during ASR decoding. The experiments presented in the paper show that the new WFST-based method outperforms the baseline system in both search performance and speed. Finally, the paper discusses the issues with the proposed techniques that need to be addressed before they can be applied in real-life tasks.
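
To make the idea of query expansion over an index of factors more concrete, the following is a minimal sketch and not the authors' implementation. It replaces the WFST machinery with plain Python dictionaries: the "factor index" simply enumerates every contiguous phone subsequence of each utterance, and the query is expanded into variants with phone substitutions and deletions under a cost budget. The confusion table, the cost values, the toy utterances, and all function names are assumptions made for illustration; score normalization is not shown.

```python
# Hypothetical confusion table: phone -> {substitute phone: cost}.
# The phones and costs are invented for illustration only.
CONFUSIONS = {"s": {"z": 0.5}, "t": {"d": 0.5}}
DELETION_COST = 1.0  # assumed cost of dropping one phone from the query


def expand_query(phones, max_cost=1.0):
    """Return {variant: cost} for all variants of the query reachable by
    phone substitutions and deletions whose total cost is <= max_cost."""
    variants = {(): 0.0}
    for p in phones:
        # Keep the phone, delete it, or substitute a confusable phone.
        options = [((p,), 0.0), ((), DELETION_COST)]
        options += [((q,), c) for q, c in CONFUSIONS.get(p, {}).items()]
        extended = {}
        for prefix, cost in variants.items():
            for out, c in options:
                total = cost + c
                key = prefix + out
                if total <= max_cost and total < extended.get(key, float("inf")):
                    extended[key] = total
        variants = extended
    variants.pop((), None)  # an empty query would match everything
    return variants


def build_factor_index(utterances):
    """Map every contiguous phone subsequence (factor) to the ids of the
    utterances that contain it. Quadratic in utterance length."""
    index = {}
    for utt_id, phones in utterances.items():
        for i in range(len(phones)):
            for j in range(i + 1, len(phones) + 1):
                index.setdefault(tuple(phones[i:j]), set()).add(utt_id)
    return index


def search(index, phones, max_cost=1.0):
    """Return {utterance id: lowest cost of any matching query variant}."""
    hits = {}
    for variant, cost in expand_query(phones, max_cost).items():
        for utt_id in index.get(variant, ()):
            if cost < hits.get(utt_id, float("inf")):
                hits[utt_id] = cost
    return hits


if __name__ == "__main__":
    # Toy phone transcripts standing in for ASR output.
    utterances = {
        "u1": ["k", "a", "t", "s"],
        "u2": ["k", "a", "d"],
        "u3": ["d", "o", "g"],
    }
    index = build_factor_index(utterances)
    print(search(index, ["k", "a", "t"]))
```

In this toy setting the query `k a t` matches `u1` exactly and `u2` through the assumed `t` to `d` substitution. The explicit enumeration of factors grows quadratically with utterance length, which is one reason a compact automaton representation of the same set of factors is attractive for a large archive.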

This research was supported by the Ministry of Culture of the Czech Republic, project No. DF12P01OVV022.

Author information

Authors and Affiliations

Department of Cybernetics, University of West Bohemia, Plzeň, Czech Republic
Jan Vavruška, Jan Švec & Pavel Ircing

Editor information

Editors and Affiliations

University of West Bohemia, 306 14, Pilsen, Czech Republic
Ivan Habernal & Václav Matoušek

About this paper

Cite this paper

Vavruška, J., Švec, J., Ircing, P. (2013). Phonetic Spoken Term Detection in Large Audio Archive Using the WFST Framework. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_51

DOI: https://doi.org/10.1007/978-3-642-40585-3_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science, Computer Science (R0)
