Acoustic Similarity Scores for Keyword Spotting

Veiga, Arlindo; Lopes, Carla; Sá, Luís; Perdigão, Fernando

doi:10.1007/978-3-319-09761-9_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8775)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

This paper presents a study of keyword spotting systems based on the acoustic similarity between a filler model and a keyword model. The classifier uses the ratio between the keyword model likelihood and the generic (filler) model likelihood to detect relevant peak values that indicate keyword occurrences. We have changed the standard keyword spotting scheme to allow keyword detection in a single forward step. We propose a new log-likelihood ratio normalization to minimize the effect of word length on classifier performance. Tests show the effectiveness of our normalization method against two other methods. Experiments were performed on continuous speech utterances (read sentences) from the Portuguese TECNOVOZ database, with keywords of several lengths.
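
The scoring idea in the abstract can be illustrated with a minimal Python sketch. It assumes the keyword and filler log-likelihoods for a candidate segment are already available. The abstract does not give the proposed normalization, so the sketch substitutes plain division by segment length in frames, a simple baseline and not the authors' method. The function names, the threshold, and the example values are all hypothetical.

```python
from typing import List


def normalized_llr(keyword_loglik: float, filler_loglik: float, n_frames: int) -> float:
    """Log-likelihood ratio of keyword vs. filler model for one segment.

    ASSUMPTION: the ratio is divided by the segment length in frames.
    This is a stand-in baseline, not the normalization proposed in the paper.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return (keyword_loglik - filler_loglik) / n_frames


def detect_peaks(scores: List[float], threshold: float) -> List[int]:
    """Return indices of local maxima whose score reaches the threshold.

    On a plateau of equal values only the first index is reported.
    """
    peaks = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        if s >= threshold and s > left and s >= right:
            peaks.append(i)
    return peaks


# Hypothetical values for illustration only.
score = normalized_llr(keyword_loglik=-120.0, filler_loglik=-150.0, n_frames=60)
score_track = [0.1, 0.6, 0.2, 0.3, 0.9, 0.9, 0.1]
hits = detect_peaks(score_track, threshold=0.5)
```

Dividing by the frame count keeps long keywords from accumulating larger raw ratios than short ones, which is the word-length effect the abstract says the proposed normalization is meant to reduce.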

Author information

Authors and Affiliations

Instituto de Telecomunicações, 3030-290, Coimbra, Portugal
Arlindo Veiga, Carla Lopes, Luís Sá & Fernando Perdigão
Universidade de Coimbra – DEEC, Polo II, 3030-290, Coimbra, Portugal
Arlindo Veiga, Luís Sá & Fernando Perdigão
Instituto Politécnico de Leiria – ESTG, 2411-901, Leiria, Portugal
Carla Lopes

Editor information

Editors and Affiliations

FCHS, Universidade do Algarve, Campus de Gambelas, 8005-139, Faro, Portugal
Jorge Baptista
INESC-ID Lisboa, Lisbon, Portugal
Nuno Mamede
IT-University of Coimbra, Coimbra, Portugal
Sara Candeias
USP-EACH, São Paulo-SP, Brazil
Ivandré Paraboni
USP-ICMC, Universidade de São Paulo, São Carlos, SP, Brazil
Thiago A. S. Pardo
SCC-ICMC, University of São Paulo, São Carlos, SP, Brazil
Maria das Graças Volpe Nunes

Cite this paper

Veiga, A., Lopes, C., Sá, L., Perdigão, F. (2014). Acoustic Similarity Scores for Keyword Spotting. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_5

DOI: https://doi.org/10.1007/978-3-319-09761-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09760-2
Online ISBN: 978-3-319-09761-9
eBook Packages: Computer Science, Computer Science (R0)
