Abstract:
This paper investigates an approach for adapting an RNN-Transducer (RNN-T) based automatic speech recognition (ASR) model to improve the recognition of words unseen during training. Prior work has shown that it is possible to incrementally fine-tune the ASR model to recognize multiple sets of new words. However, this creates a dependency between the updates, which is not ideal for the hot-fixing use case, where each update should be applicable independently of the others. We propose to train residual adapters on the RNN-T model and combine them on the fly through adapter fusion. We investigate several approaches to combining the adapters so that they maintain the ability to recognize new words with only minimal degradation on the usual user requests. In particular, sum-fusion, which sums the outputs of adapters inserted in parallel, achieves over 90% recall on the new words with less than 1% relative WER degradation on the usual data compared to the original RNN-T model.
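The following is a minimal sketch of the sum-fusion idea described in the abstract: several independently trained residual adapters are inserted in parallel at the same point of the network, and their outputs are summed onto the shared hidden state. The bottleneck adapter design, layer sizes, and the insertion point inside the RNN-T are assumptions for illustration (the abstract does not specify them), and the code is not the authors' implementation.

```python
# Illustrative sketch only: parallel residual adapters combined by sum-fusion.
# Assumes PyTorch and a standard bottleneck adapter (down-project, ReLU,
# up-project); the real adapter architecture and insertion point may differ.
import torch
import torch.nn as nn


class ResidualAdapter(nn.Module):
    """Bottleneck adapter branch: up(relu(down(h))) (design assumed)."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Return only the residual branch so several adapters can be summed.
        return self.up(torch.relu(self.down(h)))


class SumFusion(nn.Module):
    """Combine independently trained adapters by summing their outputs."""

    def __init__(self, adapters: list[nn.Module]):
        super().__init__()
        self.adapters = nn.ModuleList(adapters)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Adapters sit in parallel: each sees the same hidden state, and
        # their residual outputs are added back onto it on the fly.
        return h + sum(adapter(h) for adapter in self.adapters)


# Usage: fuse two adapters (e.g. one per hot-fix update) on frame-level
# hidden states of shape (batch, time, hidden_dim).
hidden_dim = 512
fusion = SumFusion([ResidualAdapter(hidden_dim), ResidualAdapter(hidden_dim)])
h = torch.randn(2, 100, hidden_dim)
out = fusion(h)  # same shape as h
```

Because each adapter only contributes an additive residual on the shared hidden state, adapters trained for different sets of new words can be deployed or removed independently, which is the property the hot-fixing use case requires.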
Published in: 2022 IEEE Spoken Language Technology Workshop (SLT)
Date of Conference: 09-12 January 2023
Date Added to IEEE Xplore: 27 January 2023