Improved dynamic match phone lattice search for Persian spoken term detection system in online and offline applications

Tabibian, Shima; Akbari, Ahmad; Nasersharif, Babak

doi:10.1007/s10772-019-09594-w

Published: 25 January 2019

Volume 22, pages 205–217, (2019)

International Journal of Speech Technology

Shima Tabibian¹,
Ahmad Akbari² &
Babak Nasersharif²,³

Abstract

Spoken term detection (STD) is the task of finding all occurrences of a given term in a set of speech utterances. One well-known approach to STD is phone lattice search (PLS), which produces a phone-based lattice of the speech utterances. Because the accuracy of the phone recognizer affects the accuracy of the STD system, the PLS approach uses the minimum edit distance (MED) measure to compensate for phone recognizer errors. While this measure increases the detection rate, it also raises the false alarm rate. In this paper, we take the PLS approach as the baseline and add Viterbi scoring and the Jaro-Winkler similarity measure to decrease the false alarm rate. Because the proposed approach applies more techniques than the baseline, the search speed may decrease. To overcome this problem, we use lattice pruning and indexing techniques, such as the depth-first search algorithm, to increase the search speed in online and offline applications, respectively. We report experimental results for monophone-based and triphone-based STD systems. The results indicate that the triphone-based STD system improves performance by about 2% compared with the monophone-based system. Moreover, with triphone-based models, the proposed approach, which combines the MED measure, Viterbi scores and the Jaro-Winkler similarity measure, improves accuracy by about 17% over the method that uses the MED measure alone.
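
The two string measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes unit insertion, deletion and substitution costs for the edit distance (the paper's MED may weight phone confusions differently) and the commonly used Jaro-Winkler defaults (prefix scale 0.1, prefix length capped at 4). The function names and the example phone sequences are hypothetical, and how the paper combines these measures with Viterbi scores is not shown here.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs assumed)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def jaro(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaro similarity in [0, 1]; 1 means identical sequences."""
    if not a or not b:
        return 1.0 if not a and not b else 0.0
    # Symbols match only if they are equal and lie within this window.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, pa in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == pa:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched symbols that appear in a different order.
    a_seq = [p for p, m in zip(a, a_matched) if m]
    b_seq = [p for p, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(a: Sequence[str], b: Sequence[str],
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for sequences sharing a common prefix."""
    j = jaro(a, b)
    prefix = 0
    for x, y in zip(a, b):
        if x != y or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1.0 - j)


# Hypothetical query and recognized phone sequences with one substitution.
query = ["s", "a", "l", "a", "m"]
hypothesis = ["s", "a", "l", "o", "m"]
print(edit_distance(query, hypothesis))
print(round(jaro_winkler(query, hypothesis), 3))
```

The edit distance counts how many phone errors must be tolerated to accept a lattice path as a match, while the Jaro-Winkler score favours candidates that agree with the query at the start of the sequence. Under the assumptions above, a second similarity score of this kind gives one more criterion for rejecting candidates that the edit distance alone would accept.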

Acknowledgements

We thank the Iran Telecommunication Research Center for its support during this work.

Author information

Authors and Affiliations

Cyberspace Research Institute, Shahid Beheshti University, Tehran, Iran
Shima Tabibian
Audio and Speech Processing Lab, Computer Engineering Department, Iran University of Science and Technology, Tehran, Iran
Ahmad Akbari & Babak Nasersharif
Computer Engineering Department, K.N. Toosi University of Technology, Tehran, Iran
Babak Nasersharif

Corresponding author

Correspondence to Shima Tabibian.

About this article

Cite this article

Tabibian, S., Akbari, A. & Nasersharif, B. Improved dynamic match phone lattice search for Persian spoken term detection system in online and offline applications. Int J Speech Technol 22, 205–217 (2019). https://doi.org/10.1007/s10772-019-09594-w

Received: 16 June 2018
Accepted: 14 January 2019
Published: 25 January 2019
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10772-019-09594-w
