Loading [a11y]/accessibility-menu.js
Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition | IEEE Conference Publication | IEEE Xplore

Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition


Abstract:

Convolutional neural networks have proved very successful in image recognition, thanks to their tolerance to small translations. They have recently been applied to speech...Show More

Abstract:

Convolutional neural networks have proved very successful in image recognition, thanks to their tolerance to small translations. They have recently been applied to speech recognition as well, using a spectral representation as input. However, in this case the translations along the two axes - time and frequency - should be handled quite differently. So far, most authors have focused on convolution along the frequency axis, which offers invariance to speaker and speaking style variations. Other researchers have developed a different network architecture that applies time-domain convolution in order to process a longer time-span of input in a hierarchical manner. These two approaches have different background motivations, and both offer significant gains over a standard fully connected network. Here we show that the two network architectures can be readily combined, like their advantages. With the combined model we report an error rate of 16.7% on the TIMIT phone recognition task, a new record on this dataset.
Date of Conference: 04-09 May 2014
Date Added to IEEE Xplore: 14 July 2014
Electronic ISBN:978-1-4799-2893-4

ISSN Information:

Conference Location: Florence, Italy

References

References is not available for this document.