Improved dynamic match phone lattice search for Persian spoken term detection system in online and offline applications

International Journal of Speech Technology

Abstract

Spoken term detection (STD) is the task of finding all occurrences of a given term in a set of speech utterances. One well-known approach to STD is phone lattice search (PLS), which produces a phone-based lattice of the speech utterances. Because the accuracy of the phone recognizer affects the accuracy of the STD system, the PLS approach uses the minimum edit distance (MED) measure to compensate for phone recognizer errors. While this measure increases the detection rate, it also raises the false alarm rate. In this paper, we take the PLS approach as the baseline and use Viterbi scoring and the Jaro-Winkler similarity measure to decrease the false alarm rate. Since the proposed approach applies more techniques than the baseline, the search speed may decrease. To overcome this problem, we use lattice pruning and indexing techniques, such as the depth-first search algorithm, to increase the search speed in online and offline applications, respectively. We report experimental results for monophone-based and triphone-based STD systems. The results indicate that the triphone-based STD system improves performance by about 2% over the monophone-based system. Moreover, with triphone-based models, the proposed approach combining the MED measure, Viterbi scores, and the Jaro-Winkler similarity measure improves accuracy by about 17% over the method that uses only the MED measure.
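
To make the two string measures named in the abstract concrete, the following is a minimal Python sketch of a unit-cost minimum edit distance and the textbook Jaro-Winkler similarity, both applied to phone sequences. The function names, the unit costs, the prefix scale `p = 0.1`, and the prefix cap of 4 are assumptions chosen for illustration. The abstract does not state the paper's actual cost settings or how these measures are combined with Viterbi scores, so this should not be read as the authors' implementation.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum edit distance with unit insert/delete/substitute costs (assumed)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # delete x
                curr[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),   # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def jaro(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical sequences."""
    if not a or not b:
        return 1.0 if len(a) == len(b) else 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, x in enumerate(a):
        # Look for an unused equal symbol within the matching window.
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == x:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [x for x, hit in zip(a, a_hit) if hit]
    b_seq = [y for y, hit in zip(b, b_hit) if hit]
    # Matched symbols that appear in a different order, counted in pairs.
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: Sequence[str], b: Sequence[str],
                 p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    base = jaro(a, b)
    prefix = 0
    for x, y in zip(a, b):
        if x != y or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1.0 - base)


# Hypothetical phone sequences: a query term and a lattice path with one
# phone deleted by the recognizer.
query = ["s", "a", "l", "a", "m"]
hypothesis = ["s", "a", "l", "m"]
distance = edit_distance(query, hypothesis)
similarity = jaro_winkler(query, hypothesis)
```

A small edit distance admits hypotheses that differ from the query by a few recognizer errors, which raises the detection rate but also admits unrelated sequences. A similarity score such as Jaro-Winkler, which rewards a shared prefix, gives a second signal for rejecting those. Some formulations apply the prefix boost only above a Jaro threshold; that refinement is omitted here.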

Acknowledgements

We thank the Iran Telecommunication Research Center for its support during this work.

Author information

Corresponding author

Correspondence to Shima Tabibian.

Cite this article

Tabibian, S., Akbari, A. & Nasersharif, B. Improved dynamic match phone lattice search for Persian spoken term detection system in online and offline applications. Int J Speech Technol 22, 205–217 (2019). https://doi.org/10.1007/s10772-019-09594-w

  • DOI: https://doi.org/10.1007/s10772-019-09594-w
