Statistical lattice-based spoken document retrieval

Authors:

Hwee Tou Ng

ACM Transactions on Information Systems (TOIS), Volume 28, Issue 1

Article No.: 2, Pages 1 - 30

https://doi.org/10.1145/1658377.1658379

Published: 29 January 2010

Abstract

Recent research efforts on spoken document retrieval have tried to overcome the low quality of 1-best automatic speech recognition transcripts, especially in the case of conversational speech, by using statistics derived from speech lattices containing multiple transcription hypotheses as output by a speech recognizer. We present a method for lattice-based spoken document retrieval based on a statistical n-gram modeling approach to information retrieval. In this statistical lattice-based retrieval (SLBR) method, a smoothed statistical model is estimated for each document from the expected counts of words given the information in a lattice, and the relevance of each document to a query is measured as a probability under such a model. We investigate the efficacy of our method under various parameter settings of the speech recognition and lattice processing engines, using the Fisher English Corpus of conversational telephone speech. Experimental results show that our method consistently achieves better retrieval performance than using only the 1-best transcripts in statistical retrieval, outperforms a recently proposed lattice-based vector space retrieval method, and also compares favorably with a lattice-based retrieval method based on the Okapi BM25 model.
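The estimation and scoring steps described in the abstract can be sketched roughly as follows. This is a minimal illustration of the unigram case only, not the paper's implementation: the toy lattices, the function names (`expected_counts`, `collection_model`, `query_log_likelihood`), the choice of Dirichlet smoothing, and the values of `mu` and `floor` are all assumptions made for the example.

```python
import math
from collections import defaultdict


def expected_counts(edges, start, end):
    """Expected word counts from a lattice via forward-backward.

    edges: (src, dst, word, prob) tuples. Node ids are assumed to be
    topologically ordered (src < dst); prob is a hypothetical combined
    edge probability.
    """
    # Forward pass: probability mass reaching each node from the start.
    fwd = defaultdict(float)
    fwd[start] = 1.0
    for src, dst, _, p in sorted(edges, key=lambda e: e[0]):
        fwd[dst] += fwd[src] * p

    # Backward pass: probability mass from each node to the end.
    bwd = defaultdict(float)
    bwd[end] = 1.0
    for src, dst, _, p in sorted(edges, key=lambda e: e[1], reverse=True):
        bwd[src] += p * bwd[dst]

    # Edge posterior = mass of all paths through the edge / total mass.
    total = fwd[end]
    counts = defaultdict(float)
    for src, dst, word, p in edges:
        counts[word] += fwd[src] * p * bwd[dst] / total
    return dict(counts)


def collection_model(all_counts):
    """Background unigram model pooled over all documents' expected counts."""
    totals = defaultdict(float)
    for counts in all_counts:
        for word, c in counts.items():
            totals[word] += c
    n = sum(totals.values())
    return {word: c / n for word, c in totals.items()}


def query_log_likelihood(query, counts, background, mu=1.0, floor=1e-9):
    """Log-probability of the query under a Dirichlet-smoothed document model.

    mu and floor are illustrative values, not tuned settings.
    """
    doc_len = sum(counts.values())
    score = 0.0
    for word in query:
        p_bg = background.get(word, floor)  # floor for unseen query words
        score += math.log((counts.get(word, 0.0) + mu * p_bg) / (doc_len + mu))
    return score


# Toy lattices: document A has two competing hypotheses for its first word.
lattice_a = [(0, 1, "spoken", 0.7), (0, 1, "broken", 0.3), (1, 2, "document", 1.0)]
lattice_b = [(0, 1, "weather", 1.0), (1, 2, "report", 1.0)]

counts_a = expected_counts(lattice_a, start=0, end=2)
counts_b = expected_counts(lattice_b, start=0, end=2)
background = collection_model([counts_a, counts_b])

score_a = query_log_likelihood(["spoken"], counts_a, background)
score_b = query_log_likelihood(["spoken"], counts_b, background)
```

In this toy setup, document A keeps fractional counts for both "spoken" and "broken", so a query on either word can still match it. That is the intuition behind using lattice statistics rather than only the 1-best transcript.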

Index Terms

Statistical lattice-based spoken document retrieval
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

ACM Transactions on Information Systems Volume 28, Issue 1

January 2010

157 pages

ISSN:1046-8188

EISSN:1558-2868

DOI:10.1145/1658377

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 January 2010

Accepted: 01 February 2009

Revised: 01 October 2008

Received: 01 March 2008

Published in TOIS Volume 28, Issue 1

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Total Citations: 25
Total Downloads: 664
Downloads (Last 12 months): 8
Downloads (Last 6 weeks): 0

Reflects downloads up to 15 Feb 2025

Cited By

Vidal E, Toselli A, Puigcerver J (2023). Lexicon-based probabilistic indexing of handwritten text images. Neural Computing and Applications 35:24 (17501-17520). DOI: 10.1007/s00521-023-08620-y. Online publication date: 10-May-2023.
https://dl.acm.org/doi/10.1007/s00521-023-08620-y

Veisi H, Ghoreishi S, Bastanfard A (2021). Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting. Signal and Data Processing 17:4 (67-88). DOI: 10.29252/jsdp.17.4.67. Online publication date: 1-Feb-2021.
https://doi.org/10.29252/jsdp.17.4.67

Lee H, Chung P, Wu Y, Lin T, Wen T (2018). Interactive Spoken Content Retrieval by Deep Reinforcement Learning. IEEE/ACM Transactions on Audio, Speech and Language Processing 26:12 (2447-2459). DOI: 10.1109/TASLP.2018.2852739. Online publication date: 1-Dec-2018.
https://dl.acm.org/doi/10.1109/TASLP.2018.2852739

Chen K, Liu S, Chen B, Wang H, Chen H (2016). Exploring the use of unsupervised query modeling techniques for speech recognition and summarization. Speech Communication 80:C (49-59). DOI: 10.1016/j.specom.2016.03.006. Online publication date: 1-Jun-2016.
https://dl.acm.org/doi/10.1016/j.specom.2016.03.006

Toselli A, Vidal E, Romero V, Frinken V (2016). HMM word graph based keyword spotting in handwritten document images. Information Sciences: an International Journal 370:C (497-518). DOI: 10.1016/j.ins.2016.07.063. Online publication date: 20-Nov-2016.
https://dl.acm.org/doi/10.1016/j.ins.2016.07.063

Lee L, Glass J, Lee H, Chan C (2015). Spoken content retrieval. IEEE/ACM Transactions on Audio, Speech and Language Processing 23:9 (1389-1420). DOI: 10.1109/TASLP.2015.2438543. Online publication date: 1-Sep-2015.
https://dl.acm.org/doi/10.1109/TASLP.2015.2438543

Hung-Yi Lee, Lin-Shan Lee (2014). Improved Semantic Retrieval of Spoken Content by Document/Query Expansion with Random Walk Over Acoustic Similarity Graphs. IEEE/ACM Transactions on Audio, Speech and Language Processing 22:1 (80-94). DOI: 10.1109/TASLP.2013.2285469. Online publication date: 1-Jan-2014.
https://dl.acm.org/doi/10.1109/TASLP.2013.2285469

Pham V, Xu H, Chen N, Sivadas S, Lim B, Chng E, Li H (2014). Discriminative score normalization for keyword search decision. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (7078-7082). DOI: 10.1109/ICASSP.2014.6854973. Online publication date: May-2014.
https://doi.org/10.1109/ICASSP.2014.6854973

Eskevich M, Jones G (2014). Exploring speech retrieval from meetings using the AMI corpus. Computer Speech & Language 28:5 (1021-1044). DOI: 10.1016/j.csl.2013.12.005. Online publication date: Sep-2014.
https://doi.org/10.1016/j.csl.2013.12.005

Chen Y, Chen K, Wang H, Chen B (2013). Effective pseudo-relevance feedback for spoken document retrieval. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (8535-8539). DOI: 10.1109/ICASSP.2013.6639331. Online publication date: May-2013.
https://doi.org/10.1109/ICASSP.2013.6639331
