ISCA Archive Odyssey 2018

End-to-end automatic speaker verification with evolving recurrent neural networks

Giacomo Valenti, Adrien Daniel, Nicholas Evans

The state-of-the-art in automatic speaker verification (ASV) is undergoing a shift from a reliance on hand-crafted features and sequentially optimized toolchains towards end-to-end approaches. Many of the latest algorithms still rely on frame-blocking and stacked, hand-crafted features and on fixed model topologies such as layered, deep neural networks. This paper reports a fundamentally different exploratory approach which operates on raw audio and which evolves both the weights and the topology of a neural network solution. The paper reports what is, to the authors’ best knowledge, the first investigation of evolving recurrent neural networks for truly end-to-end ASV. The algorithm avoids a reliance upon hand-crafted features and fixed topologies and also learns to discard unreliable output samples. Resulting networks are of low complexity and small memory footprint. The approach is thus well suited to embedded systems. With computational complexity making experimentation with standard datasets impracticable, the paper reports modest proof-of-concept experiments designed to evaluate potential. Results are equivalent to those obtained using a traditional GMM baseline system and suggest that the proposed end-to-end approach merits further investigation; avenues for future research are described and have potential to deliver significant improvements in performance.
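The abstract's core idea, evolving both the weights and the topology of a recurrent network rather than training a fixed architecture, can be illustrated with a toy sketch. The implementation below is not the authors' system: it is a minimal NEAT-style neuroevolution loop, with an invented genome encoding and a made-up stand-in verification task (separating "target" sequences from "impostor" sequences fed sample-by-sample to a tiny recurrent net).

```python
import math
import random

random.seed(0)

# A genome encodes a recurrent network: a node count "n" and a dict of
# connections {(src, dst): weight}. Node 0 is the input, node 1 the output.
# This encoding is an illustrative assumption, not the paper's.

def make_genome():
    # Minimal net: a single input -> output connection with a random weight.
    return {"n": 2, "conn": {(0, 1): random.uniform(-1, 1)}}

def mutate(g):
    child = {"n": g["n"], "conn": dict(g["conn"])}
    r = random.random()
    if r < 0.6:
        # Perturb an existing weight (weight evolution).
        k = random.choice(list(child["conn"]))
        child["conn"][k] += random.gauss(0, 0.5)
    elif r < 0.85:
        # Add a new connection between two random nodes (topology evolution).
        a, b = random.randrange(child["n"]), random.randrange(child["n"])
        child["conn"].setdefault((a, b), random.uniform(-1, 1))
    else:
        # Add a node by splitting an existing connection (topology evolution).
        a, b = random.choice(list(child["conn"]))
        w = child["conn"].pop((a, b))
        n = child["n"]
        child["n"] += 1
        child["conn"][(a, n)] = 1.0
        child["conn"][(n, b)] = w
    return child

def run(g, seq):
    # Feed raw samples through the recurrent net; return the final output.
    state = [0.0] * g["n"]
    for x in seq:
        new = [0.0] * g["n"]
        new[0] = x  # raw sample drives the input node
        for (a, b), w in g["conn"].items():
            new[b] += w * state[a]
        state = [math.tanh(s) for s in new]
    return state[1]

def fitness(g, target_seqs, impostor_seqs):
    # Reward separation between target and impostor scores.
    t = sum(run(g, s) for s in target_seqs) / len(target_seqs)
    i = sum(run(g, s) for s in impostor_seqs) / len(impostor_seqs)
    return t - i

# Toy data standing in for speech: target sequences trend positive,
# impostor sequences trend negative.
targets = [[0.5 + random.gauss(0, 0.1) for _ in range(20)] for _ in range(5)]
impostors = [[-0.5 + random.gauss(0, 0.1) for _ in range(20)] for _ in range(5)]

# Evolution loop: keep the 10 fittest genomes, refill with mutated copies.
pop = [make_genome() for _ in range(30)]
for gen in range(40):
    pop.sort(key=lambda g: fitness(g, targets, impostors), reverse=True)
    pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(20)]

best = max(pop, key=lambda g: fitness(g, targets, impostors))
```

Because selection acts on whole genomes, networks can stay very small, which is consistent with the abstract's claim of low complexity and memory footprint; the actual paper additionally learns to discard unreliable output samples, which this sketch omits.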


doi: 10.21437/Odyssey.2018-47

Cite as: Valenti, G., Daniel, A., Evans, N. (2018) End-to-end automatic speaker verification with evolving recurrent neural networks. Proc. The Speaker and Language Recognition Workshop (Odyssey 2018), 335-341, doi: 10.21437/Odyssey.2018-47

@inproceedings{valenti18_odyssey,
  author={Giacomo Valenti and Adrien Daniel and Nicholas Evans},
  title={{End-to-end automatic speaker verification with evolving recurrent neural networks}},
  year=2018,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2018)},
  pages={335--341},
  doi={10.21437/Odyssey.2018-47}
}