Towards a Unified End-to-End Language Understanding System for Speech and Text Inputs | IEEE Conference Publication | IEEE Xplore

Abstract:

End-to-end (E2E) spoken language understanding (SLU) systems map speech inputs directly to semantic outputs, eliminating the need to handle the speech-to-text and text-to-semantics sub-tasks with separate models. However, they are currently limited to speech inputs and cannot process plain text. In this paper, we propose an E2E spoken and natural language understanding (SNLU) system that handles both speech and text within a unified architecture. The system follows the Mask-CTC non-autoregressive approach, and gains its input flexibility by partially sharing the decoder between the SLU and NLU tasks. Experiments on the SLURP dataset show that the proposed architecture performs similarly to separate E2E SLU and NLU modules while using 43.7% fewer model parameters in relative terms. We also explore integrating pre-trained speech and language models into the SNLU system, and show that they further improve performance.
Date of Conference: 16-20 December 2023
Date Added to IEEE Xplore: 19 January 2024
Conference Location: Taipei, Taiwan
