research-article

Statistical lattice-based spoken document retrieval

Published: 29 January 2010

Abstract

Recent research efforts on spoken document retrieval have tried to overcome the low quality of 1-best automatic speech recognition transcripts, especially in the case of conversational speech, by using statistics derived from speech lattices containing multiple transcription hypotheses as output by a speech recognizer. We present a method for lattice-based spoken document retrieval based on a statistical n-gram modeling approach to information retrieval. In this statistical lattice-based retrieval (SLBR) method, a smoothed statistical model is estimated for each document from the expected counts of words given the information in a lattice, and the relevance of each document to a query is measured as a probability under such a model. We investigate the efficacy of our method under various parameter settings of the speech recognition and lattice processing engines, using the Fisher English Corpus of conversational telephone speech. Experimental results show that our method consistently achieves better retrieval performance than using only the 1-best transcripts in statistical retrieval, outperforms a recently proposed lattice-based vector space retrieval method, and also compares favorably with a lattice-based retrieval method based on the Okapi BM25 model.
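The scoring idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a unigram model rather than a general n-gram model, Jelinek-Mercer interpolation with a collection model as the smoothing scheme, and toy lattices whose nodes are numbered in topological order. The names `expected_counts`, `query_log_likelihood`, `lam`, and `floor` are hypothetical and chosen only for this illustration.

```python
import math
from collections import defaultdict


def expected_counts(edges, n_nodes):
    """Expected word counts from a toy lattice via forward-backward.

    edges: list of (src, dst, word, weight) with src < dst, so node ids are
    already in topological order. Node 0 is the start node and node
    n_nodes - 1 is the end node. Weights are unnormalized edge scores.
    """
    fwd = [0.0] * n_nodes
    bwd = [0.0] * n_nodes
    fwd[0] = 1.0
    bwd[-1] = 1.0
    # Forward pass: total weight of all partial paths reaching each node.
    for src, dst, _, w in sorted(edges, key=lambda e: e[0]):
        fwd[dst] += fwd[src] * w
    # Backward pass: total weight of all partial paths from each node to the end.
    for src, dst, _, w in sorted(edges, key=lambda e: -e[1]):
        bwd[src] += w * bwd[dst]
    total = fwd[-1]  # weight of all complete paths
    counts = defaultdict(float)
    for src, dst, word, w in edges:
        # Edge posterior: share of total path weight passing through this edge.
        counts[word] += fwd[src] * w * bwd[dst] / total
    return dict(counts)


def query_log_likelihood(query, doc_counts, coll_counts, lam=0.5, floor=1e-12):
    """Log-probability of the query under a smoothed unigram document model.

    Assumption: Jelinek-Mercer interpolation between the document model and a
    collection model; `floor` only guards against log(0) for unseen words.
    """
    doc_len = sum(doc_counts.values())
    coll_len = sum(coll_counts.values())
    score = 0.0
    for word in query:
        p_doc = doc_counts.get(word, 0.0) / doc_len
        p_coll = coll_counts.get(word, 0.0) / coll_len
        score += math.log(max((1 - lam) * p_doc + lam * p_coll, floor))
    return score


# Two toy "documents", each a three-node lattice with competing hypotheses.
lattices = {
    "A": ([(0, 1, "speech", 3.0), (0, 1, "peach", 1.0),
           (1, 2, "retrieval", 4.0), (1, 2, "retrieve", 1.0)], 3),
    "B": ([(0, 1, "beach", 1.0), (1, 2, "holiday", 1.0)], 3),
}
doc_counts = {name: expected_counts(e, n) for name, (e, n) in lattices.items()}

# Collection model: expected counts summed over all documents.
coll_counts = defaultdict(float)
for counts in doc_counts.values():
    for word, c in counts.items():
        coll_counts[word] += c

query = ["speech", "retrieval"]
scores = {name: query_log_likelihood(query, c, coll_counts)
          for name, c in doc_counts.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this sketch a word that appears only as a lower-weight alternative in the lattice still contributes a fractional count, which is the property that a 1-best transcript discards.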

    Published In

    ACM Transactions on Information Systems, Volume 28, Issue 1
    January 2010
    157 pages
    ISSN:1046-8188
    EISSN:1558-2868
    DOI:10.1145/1658377
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 29 January 2010
    Accepted: 01 February 2009
    Revised: 01 October 2008
    Received: 01 March 2008
    Published in TOIS Volume 28, Issue 1

    Author Tags

    1. Lattice-based spoken document retrieval
    2. probabilistic retrieval approach
    3. retrieval of conversational speech

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Cited By

    • (2023) Lexicon-based probabilistic indexing of handwritten text images. Neural Computing and Applications 35:24 (17501-17520). DOI: 10.1007/s00521-023-08620-y. Online publication date: 10-May-2023
    • (2021) Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting. Signal and Data Processing 17:4 (67-88). DOI: 10.29252/jsdp.17.4.67. Online publication date: 1-Feb-2021
    • (2018) Interactive Spoken Content Retrieval by Deep Reinforcement Learning. IEEE/ACM Transactions on Audio, Speech and Language Processing 26:12 (2447-2459). DOI: 10.1109/TASLP.2018.2852739. Online publication date: 1-Dec-2018
    • (2016) Exploring the use of unsupervised query modeling techniques for speech recognition and summarization. Speech Communication 80:C (49-59). DOI: 10.1016/j.specom.2016.03.006. Online publication date: 1-Jun-2016
    • (2016) HMM word graph based keyword spotting in handwritten document images. Information Sciences: an International Journal 370:C (497-518). DOI: 10.1016/j.ins.2016.07.063. Online publication date: 20-Nov-2016
    • (2015) Spoken content retrieval. IEEE/ACM Transactions on Audio, Speech and Language Processing 23:9 (1389-1420). DOI: 10.1109/TASLP.2015.2438543. Online publication date: 1-Sep-2015
    • (2014) Improved Semantic Retrieval of Spoken Content by Document/Query Expansion with Random Walk Over Acoustic Similarity Graphs. IEEE/ACM Transactions on Audio, Speech and Language Processing 22:1 (80-94). DOI: 10.1109/TASLP.2013.2285469. Online publication date: 1-Jan-2014
    • (2014) Discriminative score normalization for keyword search decision. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (7078-7082). DOI: 10.1109/ICASSP.2014.6854973. Online publication date: May-2014
    • (2014) Exploring speech retrieval from meetings using the AMI corpus. Computer Speech & Language 28:5 (1021-1044). DOI: 10.1016/j.csl.2013.12.005. Online publication date: Sep-2014
    • (2013) Effective pseudo-relevance feedback for spoken document retrieval. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (8535-8539). DOI: 10.1109/ICASSP.2013.6639331. Online publication date: May-2013
