HiTNet: Byte-to-BPE Hierarchical Transcription Network for End-to-End Speech Recognition


Abstract:

In this paper, we propose a new byte to byte-pair-encoding (BPE) Hierarchical Transcription Network (HiTNet) architecture for end-to-end (e2e) automatic speech recognition (ASR). The proposed HiTNet architecture simultaneously encodes and decodes information hierarchically at different levels of linguistic granularity, such as bytes and BPE. In general, this idea can be extended to any levels of granularity, from phonemes, graphemes, or bytes (character to sub-character in some languages), to sub-words or byte-pair encodings (BPE), to words, and so on. Existing hierarchical e2e ASR models primarily encode the acoustic information in a hierarchical manner governed by weaker linguistic constraints at each level. The language information at each level is neither embedded nor used explicitly, nor is the information decoded at each level passed on to the next stage. The proposed architecture primarily decodes information in a hierarchical manner, utilizing the linguistic information at each level explicitly while also utilizing the hierarchically encoded acoustic information at each level. Experiments with a two-level byte-to-BPE (b2B) hierarchical transcription show that the proposed architecture significantly reduces the word error rates of both the byte and BPE decoders compared to baseline byte-based and BPE-based attention encoder-decoder models.
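
To make the two levels of granularity concrete, the following is a minimal sketch of how one transcript can be viewed as a byte sequence and as a coarser sequence produced by a single BPE-style merge. It is an illustration only, not the tokenizer or the model from the paper; the helper names `to_byte_units` and `merge_pair` and the example strings are assumptions introduced here.

```python
def to_byte_units(text: str) -> list[bytes]:
    """Split a transcript into single-byte units of its UTF-8 encoding.

    Hypothetical helper for illustration; not taken from the paper.
    """
    return [bytes([b]) for b in text.encode("utf-8")]


def merge_pair(units: list[bytes], pair: tuple[bytes, bytes]) -> list[bytes]:
    """Apply one BPE-style merge: join each adjacent occurrence of `pair`."""
    merged = []
    i = 0
    while i < len(units):
        if i + 1 < len(units) and (units[i], units[i + 1]) == pair:
            merged.append(units[i] + units[i + 1])  # two units become one
            i += 2
        else:
            merged.append(units[i])
            i += 1
    return merged


# Byte level: a non-ASCII character spans several units (sub-character level).
byte_units = to_byte_units("é")

# Coarser level: merging the pair (b"l", b"o") shortens the target sequence.
coarse_units = merge_pair(to_byte_units("lower"), (b"l", b"o"))
```

Here `byte_units` holds two units for the single character "é", and `coarse_units` is one unit shorter than the byte sequence for "lower". A real BPE vocabulary would be learned from pair frequencies over a training corpus and would apply many merges in order.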
Date of Conference: 13-17 December 2021
Date Added to IEEE Xplore: 03 February 2022
Conference Location: Cartagena, Colombia

