ISCA Archive Interspeech 2018

Acoustic Modeling from Frequency Domain Representations of Speech

Pegah Ghahremani, Hossein Hadian, Hang Lv, Daniel Povey, Sanjeev Khudanpur

In recent years, several studies have proposed methods for DNN-based feature extraction and for joint acoustic model training and feature learning from the raw waveform for large-vocabulary speech recognition. However, conventional pre-processed features such as MFCC and PLP are still preferred in state-of-the-art speech recognition systems because they are perceived to be more robust. Moreover, the raw-waveform methods, most of which operate on the time-domain signal, do not significantly outperform the conventional methods. In this paper, we propose a frequency-domain feature-learning layer that allows acoustic model training directly from the waveform. The main distinctions from previous work are a new normalization block and a short-range constraint on the filter weights. The proposed setup achieves consistent performance improvements over the baseline MFCC and log-Mel features, as well as over other proposed time-domain and frequency-domain setups, on different LVCSR tasks. Finally, based on the filters learned in our feature-learning layer, we propose a new set of analytic filters using polynomial approximation, which outperforms log-Mel filters significantly while being equally fast.
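For orientation, the log-Mel baseline mentioned in the abstract can be sketched as below. This is a minimal illustration of a standard log-Mel pipeline, not the paper's feature-learning layer. The function names, the 25 ms / 10 ms framing at 16 kHz, the Hamming window, the 512-point FFT, and the 40 filters are all assumptions chosen for the example. A learned frequency-domain layer would, roughly speaking, replace the fixed `filters` matrix with trainable weights; the paper's normalization block and short-range constraint are not reproduced here.

```python
import numpy as np


def hz_to_mel(hz):
    # Common HTK-style mel mapping (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_filters, fft_size, sample_rate):
    """Triangular mel filters, shape (num_filters, fft_size // 2 + 1)."""
    num_bins = fft_size // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, num_bins)
    # num_filters + 2 edge frequencies, equally spaced on the mel scale.
    edges = mel_to_hz(
        np.linspace(0.0, hz_to_mel(sample_rate / 2.0), num_filters + 2)
    )
    filters = np.zeros((num_filters, num_bins))
    for i in range(num_filters):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        # Triangle: minimum of the two slopes, clipped at zero.
        filters[i] = np.maximum(0.0, np.minimum(rising, falling))
    return filters


def log_filterbank_features(waveform, filters, frame_length=400,
                            frame_shift=160, fft_size=512):
    """Frame the waveform, take the power spectrum, apply filters, take log."""
    num_frames = 1 + (len(waveform) - frame_length) // frame_shift
    window = np.hamming(frame_length)
    frames = np.stack([
        waveform[t * frame_shift: t * frame_shift + frame_length]
        for t in range(num_frames)
    ])
    spectrum = np.fft.rfft(frames * window, n=fft_size, axis=1)
    power = np.abs(spectrum) ** 2
    # Small floor avoids log(0) on silent frames.
    return np.log(power @ filters.T + 1e-10)


# Usage on one second of synthetic noise at 16 kHz (placeholder input).
rng = np.random.default_rng(0)
waveform = rng.standard_normal(16000)
filters = mel_filterbank(num_filters=40, fft_size=512, sample_rate=16000)
features = log_filterbank_features(waveform, filters)
print(features.shape)
```

Keeping the filterbank as an explicit matrix makes the contrast with a learned layer easy to see: only the contents of `filters` would change, while the framing, FFT, and log steps stay the same.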


doi: 10.21437/Interspeech.2018-1453

Cite as: Ghahremani, P., Hadian, H., Lv, H., Povey, D., Khudanpur, S. (2018) Acoustic Modeling from Frequency Domain Representations of Speech. Proc. Interspeech 2018, 1596-1600, doi: 10.21437/Interspeech.2018-1453

@inproceedings{ghahremani18_interspeech,
  author={Pegah Ghahremani and Hossein Hadian and Hang Lv and Daniel Povey and Sanjeev Khudanpur},
  title={{Acoustic Modeling from Frequency Domain Representations of Speech}},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1596--1600},
  doi={10.21437/Interspeech.2018-1453}
}