Abstract:
End-to-end spoken language understanding requires speech data annotated with semantic information and can suffer from a shortage of such annotated data. Recent advances leverage unlabelled speech data to pre-train a speech encoder. However, it remains a challenge for the pre-trained speech encoder to encode semantic information. Existing work explores transferring knowledge from a pre-trained text model using different alignment losses at a fixed granularity. In this paper, we address variable granularity in transferring knowledge from text to speech representations via APLY, an auxiliary pooling layer that fuses global information with adaptively encoded local context. We demonstrate the effectiveness of APLY on three spoken language understanding benchmarks.
Published in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023