short-paper

Expanded Lattice Embeddings for Spoken Document Retrieval on Informal Meetings

Authors: Esaú Villatoro-Tello, Srikanth Madikeri, Petr Motlicek, Aravind Ganapathiraju, Alexei V. Ivanov

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 2669 - 2674

https://doi.org/10.1145/3477495.3531921

Published: 07 July 2022

Abstract

In this paper, we evaluate different alternatives for processing richer forms of Automatic Speech Recognition (ASR) output, based on lattice expansion algorithms, for Spoken Document Retrieval (SDR). Typically, SDR systems use ASR transcripts to index and retrieve relevant documents; however, ASR errors degrade retrieval performance. Multiple alternative hypotheses can be used to augment the input to document retrieval and compensate for errors in the one-best hypothesis. In Weighted Finite State Transducer (WFST)-based ASR systems, it is common to use the n-best output (i.e., the top-n scoring hypotheses) for the retrieval task, since these hypotheses can easily be fed to a traditional Information Retrieval (IR) pipeline. However, n-best hypotheses are highly redundant and do not sufficiently capture the richness of the ASR output, which is represented as a directed acyclic graph called the lattice. To address this, we use the lattice's constrained minimum path cover to generate a minimum set of hypotheses that serves as input to the reranking phase of IR. The novelty of our approach is the use of the lattice as input to neural reranking, through a set of hypotheses that together cover every arc in the lattice. These hypotheses are encoded as sentence embeddings with BERT-based models, namely SBERT and RoBERTa, and the final ranking of the retrieved segments is obtained by max-pooling over the scores computed between the input query and each hypothesis in the set. We evaluate our approach on the publicly available AMI meeting corpus. Our results indicate that using hypotheses from the expanded lattice significantly improves SDR performance over the n-best ASR output.
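The abstract does not spell out how the minimum path cover is computed. One way to obtain a minimum set of hypotheses that together traverse every lattice arc, assuming that "constrained" means each hypothesis must run from the lattice's single start state to its single final state, is to treat the problem as a minimum flow with a lower bound of 1 on every arc. The sketch below illustrates that reading only; it is not the authors' code, `min_arc_cover_paths` is a hypothetical name, and the toy lattice ignores the acoustic and language-model scores, epsilon arcs and multiple final states that real WFST lattices carry.

```python
from collections import deque


def min_arc_cover_paths(num_states, arcs, start, final):
    """Return a minimum set of start->final word sequences that traverse every arc.

    Hypothetical helper, not the paper's implementation. `arcs` is a list of
    (src_state, dst_state, word) tuples forming a DAG in which every arc lies on
    some start->final path. Solved as a minimum flow with lower bound 1 per arc.
    """
    out_arcs = [[] for _ in range(num_states)]
    in_arcs = [[] for _ in range(num_states)]
    for i, (u, v, _) in enumerate(arcs):
        out_arcs[u].append(i)
        in_arcs[v].append(i)

    def bfs(root, adjacent, endpoint):
        # BFS tree: for every reached state, the arc used to reach it.
        tree, seen, queue = {}, {root}, deque([root])
        while queue:
            x = queue.popleft()
            for i in adjacent[x]:
                y = arcs[i][endpoint]
                if y not in seen:
                    seen.add(y)
                    tree[y] = i
                    queue.append(y)
        return tree

    parent = bfs(start, out_arcs, 1)  # arc entering x on a fixed start->x route
    nxt = bfs(final, in_arcs, 0)      # arc leaving x on a fixed x->final route

    # Feasible flow: push one unit start -> u -> v -> final through every arc.
    flow = [0] * len(arcs)
    for i, (u, v, _) in enumerate(arcs):
        flow[i] += 1
        x = u
        while x != start:
            flow[parent[x]] += 1
            x = arcs[parent[x]][0]
        x = v
        while x != final:
            flow[nxt[x]] += 1
            x = arcs[nxt[x]][1]

    # Shrink the flow by augmenting from final back to start in the residual
    # graph: following an arc forward adds flow, following it backward removes
    # flow, but never below the lower bound of 1.
    while True:
        how = {final: None}  # state -> (arc index, +1 add / -1 remove)
        queue = deque([final])
        while queue and start not in how:
            x = queue.popleft()
            steps = [(i, arcs[i][1], +1) for i in out_arcs[x]]
            steps += [(i, arcs[i][0], -1) for i in in_arcs[x] if flow[i] > 1]
            for i, y, sign in steps:
                if y not in how:
                    how[y] = (i, sign)
                    queue.append(y)
        if start not in how:
            break
        path, x = [], start
        while x != final:
            i, sign = how[x]
            path.append((i, sign))
            x = arcs[i][0] if sign > 0 else arcs[i][1]
        delta = min(flow[i] - 1 for i, sign in path if sign < 0)
        for i, sign in path:
            flow[i] += sign * delta

    # Decompose the minimum flow into start->final paths (one per unit of flow).
    hypotheses = []
    while any(flow[i] > 0 for i in out_arcs[start]):
        x, words = start, []
        while x != final:
            i = next(j for j in out_arcs[x] if flow[j] > 0)
            flow[i] -= 1
            words.append(arcs[i][2])
            x = arcs[i][1]
        hypotheses.append(" ".join(words))
    return hypotheses
```

For example, on the toy lattice `[(0, 1, "the"), (0, 1, "a"), (1, 2, "cat"), (1, 3, "hat"), (2, 4, "sat"), (3, 4, "sat")]` with start state 0 and final state 4, the function returns the two hypotheses "the cat sat" and "a hat sat", which together traverse all six arcs. Because every arc must carry at least one unit of flow, no hypothesis set smaller than the minimum flow can cover the lattice; a simpler greedy cover would also be valid but would not guarantee minimality.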

Supplementary Material

MP4 File (SIGIR22-sp2160.mp4)

We present an extensive analysis of different alternatives for processing richer forms of ASR output for SDR on the AMI corpus. Our work has three salient features: (1) a neural reranking approach based on an expanded lattice embedding space, for which we evaluate and report IR results using two different ways of exploiting the information in the lattice; (2) a new baseline on the AMI corpus: we prepare the AMI corpus for SDR and establish a baseline for this dataset using an ASR model trained on out-of-domain data; and (3) no part of the target dataset is used to train the IR reranker, which suggests that our method may be domain-agnostic.
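The reranking step can be sketched as follows, assuming each retrieved segment already comes with its set of lattice hypotheses: the query and every hypothesis are embedded, a similarity is computed between the query and each hypothesis, and the segment's score is the maximum of those similarities. The names `toy_embed`, `cosine` and `rerank` are hypothetical, and `toy_embed` is a bag-of-words stand-in so that the sketch runs without model weights; in practice one would replace `toy_embed` and `cosine` with a sentence encoder (for example, the `encode` method of a sentence-transformers SBERT model) and a dense-vector cosine similarity. Using cosine similarity as the query-hypothesis score is itself an assumption.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a sentence encoder such as SBERT: word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(query, segments, embed=toy_embed, similarity=cosine):
    """Rank segments by max-pooling query-hypothesis similarities.

    `segments` maps a segment id to its list of hypothesis strings, e.g. the
    paths of a lattice arc cover. Returns (segment_id, score) pairs, best first.
    """
    q = embed(query)
    scores = {
        seg_id: max(similarity(q, embed(h)) for h in hyps)
        for seg_id, hyps in segments.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, `rerank("hat", {"seg1": ["the cat sat", "a hat sat"], "seg2": ["we meet today", "we met today"]})` ranks seg1 first with a score of 1/sqrt(3) (about 0.577), because its second hypothesis matches the query even though its first one does not. Max-pooling means that a segment is scored by its best-matching hypothesis, so a query term that survives only on a lower-scoring path through the lattice can still retrieve the segment.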
