Term-Dependent Confidence Normalisation for Out-of-Vocabulary Spoken Term Detection

  • Regular Paper
Journal of Computer Science and Technology

Abstract

An important component of a spoken term detection (STD) system is the estimation of confidence measures for hypothesised detections. A potential problem with the widely used lattice-based confidence estimation, however, is that confidence scores are treated uniformly across all search terms, regardless of how much the terms differ in their phonetic or linguistic properties. This problem is particularly evident for out-of-vocabulary (OOV) terms, which tend to exhibit high intra-term diversity. To address the impact of term diversity on confidence measures, we propose a term-dependent normalisation technique that compensates for this diversity in confidence estimation. We first derive an evaluation-metric-oriented normalisation that optimises the evaluation metric by compensating for the diverse occurrence rates among terms. We then propose a linear bias compensation and a discriminative compensation to deal with the bias that is inherent in lattice-based confidence measurement and from which the Term Specific Threshold (TST) approach suffers. We tested the proposed technique on speech data from the multi-party meeting domain with two state-of-the-art STD systems, one based on phonemes and one on words. The experimental results demonstrate that the confidence normalisation approach leads to a significant performance improvement in STD, particularly for OOV terms with phoneme-based systems.
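
To make the idea of a term-dependent decision rule concrete, the following Python sketch computes a term-specific threshold of the kind the abstract refers to as TST. It is not taken from this paper, whose formulas are not part of this preview. It assumes the commonly used derivation in which a detection is accepted when its expected contribution to the term-weighted value is positive, with the expected number of true occurrences estimated by summing the lattice posteriors of the term's detections. The weight beta = 999.9 is assumed from the NIST STD 2006 evaluation setup, and the function names and example numbers are hypothetical.

```python
from typing import List

# Assumption: false-alarm weight used in the NIST STD 2006 term-weighted value.
BETA = 999.9


def term_specific_threshold(posteriors: List[float],
                            speech_seconds: float,
                            beta: float = BETA) -> float:
    """Return the posterior above which accepting a detection is expected to help.

    posteriors: lattice posterior of every hypothesised detection of ONE term.
    speech_seconds: total duration of the searched audio, in seconds.
    """
    # Expected number of true occurrences, estimated from the posteriors.
    n_true = sum(posteriors)
    # Accept when p / n_true > beta * (1 - p) / (speech_seconds - n_true),
    # which rearranges to p > beta * n_true / (speech_seconds + (beta - 1) * n_true).
    return beta * n_true / (speech_seconds + (beta - 1.0) * n_true)


def decide(posteriors: List[float],
           speech_seconds: float,
           beta: float = BETA) -> List[bool]:
    """Apply the per-term threshold to each detection of that term."""
    threshold = term_specific_threshold(posteriors, speech_seconds, beta)
    return [p >= threshold for p in posteriors]


if __name__ == "__main__":
    ten_hours = 36000.0
    rare_term = [0.3]                  # one weak detection
    frequent_term = [0.9] * 50 + [0.3]  # many strong detections, one weak
    print(term_specific_threshold(rare_term, ten_hours))
    print(term_specific_threshold(frequent_term, ten_hours))
    print(decide(frequent_term, ten_hours)[-1])
```

In this sketch a term with a single weak detection receives a much lower threshold than a frequently detected term, so the same posterior of 0.3 is accepted for the rare term and rejected for the frequent one. This illustrates only the occurrence-rate compensation; the linear and discriminative bias compensation proposed in the paper are not sketched here.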

Author information

Corresponding author

Correspondence to Dong Wang.

About this article

Cite this article

Wang, D., Tejedor, J., King, S. et al. Term-Dependent Confidence Normalisation for Out-of-Vocabulary Spoken Term Detection. J. Comput. Sci. Technol. 27, 358–375 (2012). https://doi.org/10.1007/s11390-012-1228-x

  • DOI: https://doi.org/10.1007/s11390-012-1228-x
