Abstract:
Inspired by contrastive predictive coding (CPC), we propose a feature representation scheme for automatic speech recognition (ASR) that encodes sequential dependency information from raw audio signals. Following the original CPC, for a given frame, a lower bound on the mutual information (MI) between the historical context and the future prediction is maximized. Building on the original CPC, we develop sequential CPC (SEQ-CPC), which takes the sequential information between frames into account when computing the MI lower bound. Since speech frames are not independent events, incorporating sequential information leads to better recognition performance. Experimental results on the WSJ corpus show that SEQ-CPC outperforms both CPC and NCE, the contrastive objective used in wav2vec.
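
For orientation, the MI lower bound of the original CPC (the InfoNCE bound) can be sketched as below. This is a minimal illustration of the baseline objective only, not of SEQ-CPC, whose formulation is not given in the abstract. The function names and the score-matrix layout are assumptions made for this example: `scores[i][j]` is taken to be the similarity between the prediction made from context `i` and candidate frame `j`, with candidate `i` as the positive sample and the remaining candidates as negatives.

```python
import math


def info_nce_loss(scores):
    """Average InfoNCE loss over a square score matrix.

    scores[i][j] is the (hypothetical) similarity between the prediction
    from context i and candidate frame j; candidate i is the positive.
    """
    total = 0.0
    for i, row in enumerate(scores):
        # Log-sum-exp over all candidates, shifted by the max for stability.
        m = max(row)
        log_norm = m + math.log(sum(math.exp(s - m) for s in row))
        # Negative log-probability of picking the positive candidate.
        total += log_norm - row[i]
    return total / len(scores)


def mi_lower_bound(loss, n_candidates):
    """InfoNCE bound: MI >= log(N) - loss, with N candidates per context."""
    return math.log(n_candidates) - loss
```

With uninformative scores (all equal) the loss equals log N and the bound is zero; as the positive scores dominate, the loss approaches zero and the bound approaches log N, which is why minimizing this loss maximizes the MI lower bound.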
Published in: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 June 2021
Date Added to IEEE Xplore: 13 May 2021