Abstract:
Complex-valued neural networks (CVNNs) are well suited to speech signal processing because they can naturally represent amplitude and phase. In this paper, we explore applying an acoustic model with multiple complex-valued layers (multiple-CVNN-AM) and spliced features to speech recognition. First, we focus on multiple-CVNN-AM with unspliced input features and investigate an appropriate architecture from the viewpoint of the activation function, bias, and number of complex-valued layers. We also propose batch amplitude mean normalization for faster and more stable training of complex-valued layers. We then investigate an appropriate architecture for multiple-CVNN-AM with spliced input features and compare it with a real-valued neural network acoustic model without complex-valued layers (RVNN-AM) and complex linear projection (CLP) models, which can be considered acoustic models with a single complex-valued layer. We show that under noisy conditions, multiple-CVNN-AM outperforms the RVNN-AM and CLP models by up to 7.45% and 11.90%, respectively.
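The abstract does not give the formula for batch amplitude mean normalization, but a plausible reading is a batch-norm-like rescaling that divides each complex activation's amplitude by its per-feature batch mean while leaving the phase untouched. The sketch below illustrates that interpretation; the function name, the eps parameter, and the exact normalization scheme are assumptions, not the authors' definition.

```python
import numpy as np

def batch_amplitude_mean_normalization(z, eps=1e-8):
    """Hypothetical sketch: rescale complex activations so that the
    per-feature batch-mean amplitude is 1, preserving phase.

    z : complex array of shape (batch, features)
    """
    amp = np.abs(z)                              # amplitudes |z|
    phase = z / np.maximum(amp, eps)             # unit-modulus phase terms
    mean_amp = amp.mean(axis=0, keepdims=True)   # per-feature batch mean amplitude
    return (amp / (mean_amp + eps)) * phase      # normalized amplitude, same phase

# Example: after normalization, each feature's mean amplitude is ~1
# and the phase of every element is unchanged.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4)) + 1j * rng.normal(size=(8, 4))
z_norm = batch_amplitude_mean_normalization(z)
```

Normalizing only the amplitude (rather than real and imaginary parts separately) keeps the phase information intact, which is the property the abstract highlights as the motivation for using complex-valued layers.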
Published in: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Date of Conference: 12-15 November 2018
Date Added to IEEE Xplore: 07 March 2019