Short paper · Public Access · DOI: 10.1145/3477495.3531921

Expanded Lattice Embeddings for Spoken Document Retrieval on Informal Meetings

Published: 07 July 2022

Abstract

In this paper, we evaluate different alternatives for processing richer forms of Automatic Speech Recognition (ASR) output, based on lattice expansion algorithms, for Spoken Document Retrieval (SDR). Typically, SDR systems use ASR transcripts to index and retrieve relevant documents. However, ASR errors degrade retrieval performance. To compensate for an erroneous one-best hypothesis, multiple alternative hypotheses can be used to augment the input to document retrieval. In ASR systems based on Weighted Finite State Transducers (WFSTs), it is common to use the n-best output (i.e., the top n scoring hypotheses) for retrieval, since it can easily be fed to a traditional Information Retrieval (IR) pipeline. However, n-best hypotheses are highly redundant and do not sufficiently capture the richness of the ASR output, which is represented as a directed acyclic graph called the lattice. To address this, we use the lattice's constrained minimum path cover to generate a minimum set of hypotheses that serves as input to the reranking phase of IR. The novelty of our approach is the use of the lattice as input to neural reranking, through a set of hypotheses that together cover every arc in the lattice. The hypotheses are encoded as sentence embeddings using BERT-based models, namely SBERT and RoBERTa, and the final ranking of the retrieved segments is obtained by max-pooling the scores computed between the input query and each hypothesis in the set. We evaluate on the publicly available AMI meeting corpus. Our results indicate that using hypotheses from the expanded lattice significantly improves SDR performance over the n-best ASR output.
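The abstract describes the path-cover step only at a high level. As a rough illustration, the sketch below computes a minimum set of start-to-final paths that together traverse every lattice arc, by solving a minimum-flow problem with a lower bound of 1 on each arc. This is one standard way to obtain such a cover, not necessarily the authors' implementation. Assumptions: "constrained" is read as "every path runs from the lattice's start state to its final state"; the lattice is a plain list of `(src, dst, word)` arcs forming a DAG with a single start and a single final state; every arc lies on some start-to-final path; arc weights are ignored; and `<eps>` marks epsilon arcs. `min_arc_path_cover` and `to_hypotheses` are hypothetical names.

```python
from collections import defaultdict, deque


def min_arc_path_cover(arcs, start, final):
    """Minimum set of start->final paths that together traverse every arc.

    `arcs` is a list of (src_state, dst_state, word) tuples forming a DAG.
    Assumes a trimmed lattice: every arc lies on some start->final path,
    `start` has no incoming arcs and `final` has no outgoing arcs.
    Returns each path as a list of arc indices.
    """
    out_arcs, in_arcs = defaultdict(list), defaultdict(list)
    for i, (u, v, _) in enumerate(arcs):
        out_arcs[u].append(i)
        in_arcs[v].append(i)

    def bfs_tree(root, adj, next_state):
        # Maps each reachable state to the arc that first reached it.
        via, queue = {root: None}, deque([root])
        while queue:
            x = queue.popleft()
            for a in adj[x]:
                y = next_state(a)
                if y not in via:
                    via[y] = a
                    queue.append(y)
        return via

    from_start = bfs_tree(start, out_arcs, lambda a: arcs[a][1])
    to_final = bfs_tree(final, in_arcs, lambda a: arcs[a][0])

    # Step 1: feasible flow with >= 1 unit on every arc. For each arc, route
    # one unit along a BFS prefix from `start` and a BFS suffix to `final`.
    flow = [0] * len(arcs)
    for i, (u, v, _) in enumerate(arcs):
        flow[i] += 1
        x = u
        while from_start[x] is not None:
            a = from_start[x]
            flow[a] += 1
            x = arcs[a][0]
        x = v
        while to_final[x] is not None:
            a = to_final[x]
            flow[a] += 1
            x = arcs[a][1]

    # Step 2: minimum flow with lower bound 1 per arc. Repeatedly push flow
    # from `final` back to `start`: moving against an arc cancels flow on it
    # (allowed while it stays >= 1); moving along an arc adds flow (unbounded).
    def reducing_path():
        parent, queue = {final: None}, deque([final])
        while queue and start not in parent:
            x = queue.popleft()
            for a in out_arcs[x]:
                if arcs[a][1] not in parent:
                    parent[arcs[a][1]] = (a, +1)
                    queue.append(arcs[a][1])
            for a in in_arcs[x]:
                if arcs[a][0] not in parent and flow[a] > 1:
                    parent[arcs[a][0]] = (a, -1)
                    queue.append(arcs[a][0])
        return parent if start in parent else None

    while (parent := reducing_path()) is not None:
        steps, x = [], start
        while x != final:
            a, d = parent[x]
            steps.append((a, d))
            x = arcs[a][0] if d == +1 else arcs[a][1]
        # In a DAG every final->start path contains at least one backward step.
        push = min(flow[a] - 1 for a, d in steps if d == -1)
        for a, d in steps:
            flow[a] += d * push

    # Step 3: decompose the minimum flow into start->final paths.
    paths = []
    while any(flow[a] > 0 for a in out_arcs[start]):
        path, x = [], start
        while x != final:
            a = next(a for a in out_arcs[x] if flow[a] > 0)
            flow[a] -= 1
            path.append(a)
            x = arcs[a][1]
        paths.append(path)
    return paths


def to_hypotheses(arcs, paths, eps="<eps>"):
    # Word string for each path; `eps` is the assumed epsilon label.
    return [" ".join(arcs[a][2] for a in p if arcs[a][2] != eps) for p in paths]
```

On the toy lattice `[(0, 1, "the"), (0, 1, "a"), (1, 2, "cat"), (1, 2, "cap"), (1, 2, "hat"), (2, 3, "sat")]` with start 0 and final 3, `to_hypotheses(arcs, min_arc_path_cover(arcs, 0, 3))` returns `["the cat sat", "the cap sat", "a hat sat"]`: three hypotheses cover all six arcs, whereas listing every path would produce six largely redundant ones.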

Supplementary Material

MP4 File (SIGIR22-sp2160.mp4)
We present an extensive analysis of different alternatives for processing richer forms of ASR output for SDR on the AMI corpus. Our work has three salient features: (1) a neural reranking approach based on an expanded lattice embedding space, for which we evaluate and report IR results using two different ways of exploiting the information in the lattice; (2) a new baseline on the AMI corpus, for which we prepare the corpus for SDR and establish a baseline with an ASR model trained on out-of-domain data; and (3) no part of the target dataset is used to train the IR reranker, which makes it possible for the method to be domain-agnostic.
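Feature (1) relies on the scoring step described in the abstract: each retrieved segment is represented by its set of hypotheses, each hypothesis is scored against the query, and the segment's score is the maximum of those scores. The sketch below shows only that max-pooling logic, under stated assumptions: scores are cosine similarities between embedding vectors (the abstract does not name the scoring function), and `toy_embed`, a bag-of-words counter, is a hypothetical stand-in for the SBERT or RoBERTa encoders used in the paper, so the example runs without downloading a model. `rerank` and `cosine` are likewise illustrative names.

```python
import math
from collections import Counter


def toy_embed(text):
    # Hypothetical stand-in for a sentence encoder: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(query, segments, embed=toy_embed):
    """Rank segments by the best query-hypothesis score (max-pooling).

    `segments` maps a segment id to a non-empty list of hypothesis strings,
    e.g. the lattice path-cover hypotheses for that segment.
    """
    q = embed(query)
    scores = {
        seg_id: max(cosine(q, embed(h)) for h in hyps)  # max-pool over hypotheses
        for seg_id, hyps in segments.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `rerank("remote control design", {"seg1": ["the cat sat", "a hat sat"], "seg2": ["we discussed the remote control", "we discuss the remote"]})` ranks `seg2` first. Max-pooling lets a single well-matching hypothesis carry the segment, so a correct word that appears only on a lower-scoring lattice path can still make the segment match the query.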

Cited By

  • (2024) Whisper-based spoken term detection systems for search on speech ALBAYZIN evaluation challenge. EURASIP Journal on Audio, Speech, and Music Processing 2024:1. DOI: 10.1186/s13636-024-00334-w. Online publication date: 29-Feb-2024.
  • (2024) Probability-Aware Word-Confusion-Network-To-Text Alignment Approach for Intent Classification. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 12617-12621. DOI: 10.1109/ICASSP48485.2024.10445934. Online publication date: 14-Apr-2024.
  • (2023) Effectiveness of Text, Acoustic, and Lattice-Based Representations in Spoken Language Understanding Tasks. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1-5. DOI: 10.1109/ICASSP49357.2023.10095168. Online publication date: 4-Jun-2023.

Information

Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022
3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. informal spoken content search
  2. lattice embeddings
  3. lattice expansion
  4. lattice rescoring
  5. neural language models
  6. neural reranker
  7. speech retrieval

Qualifiers

  • Short-paper

Funding Sources

  • IARPA

Conference

SIGIR '22

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions (20%)

Article Metrics

  • Downloads (last 12 months): 80
  • Downloads (last 6 weeks): 17
Reflects downloads up to 28 Feb 2025.
