State-of-the-art automatic speech recognition (ASR) systems typically rely on pre-processed features. This paper studies the time-frequency duality in ASR feature extraction methods and proposes extending the standard acoustic model with a complex-valued linear projection layer, so that the features are learned jointly with the model by minimizing standard cost functions such as cross-entropy. The proposed Complex Linear Projection (CLP) features outperform pre-processed log-Mel features.
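As a rough illustration of what a complex-valued linear projection layer might compute, the following minimal sketch projects the complex spectrum of one audio frame through a complex weight matrix and compresses the result to real-valued features. The abstract does not give the layer's exact form, so everything here is an assumption: the function name `clp_features`, the log-magnitude compression, the frame length of 512 samples, the 128 output filters, and the random weights standing in for parameters that would be trained jointly with the acoustic model.

```python
import numpy as np


def clp_features(frame, w_real, w_imag, eps=1e-6):
    """Hypothetical CLP-style front end (assumed form, not taken from the paper)."""
    # Complex spectrum of a real-valued frame, shape (n_bins,)
    spectrum = np.fft.rfft(frame)
    # Complex projection matrix built from two real parameter matrices,
    # shape (n_filters, n_bins); in training these would be learned
    weights = w_real + 1j * w_imag
    # Complex-valued linear projection, shape (n_filters,)
    projected = weights @ spectrum
    # Assumed compression: log of the magnitude, giving real-valued features
    return np.log(np.abs(projected) + eps)


# Illustrative sizes only (assumptions, not values from the paper)
frame_len, n_filters = 512, 128
n_bins = frame_len // 2 + 1

rng = np.random.default_rng(0)
frame = rng.standard_normal(frame_len)
w_real = 0.01 * rng.standard_normal((n_filters, n_bins))
w_imag = 0.01 * rng.standard_normal((n_filters, n_bins))

feats = clp_features(frame, w_real, w_imag)
```

In a discriminative setup of the kind the abstract describes, `w_real` and `w_imag` would be updated by backpropagating the cross-entropy loss of the acoustic model, rather than being fixed in advance as a Mel filterbank is.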
Cite as: Variani, E., Sainath, T.N., Shafran, I., Bacchiani, M. (2016) Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling. Proc. Interspeech 2016, 808-812, doi: 10.21437/Interspeech.2016-1459
@inproceedings{variani16_interspeech,
  author={Ehsan Variani and Tara N. Sainath and Izhak Shafran and Michiel Bacchiani},
  title={{Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={808--812},
  doi={10.21437/Interspeech.2016-1459}
}