ISCA Archive Interspeech 2018

End-to-end Deep Neural Network Age Estimation

Pegah Ghahremani, Phani Sankar Nidadavolu, Nanxin Chen, Jesús Villalba, Daniel Povey, Sanjeev Khudanpur, Najim Dehak

In this paper, we apply the recently proposed x-vector neural network architecture to the task of age estimation. This architecture maps a variable-length utterance into a fixed-dimensional embedding that retains the relevant sequence-level information. This is achieved by a temporal pooling layer. From the embedding, a series of layers is applied to make predictions. The full network is trained end-to-end in a discriminative fashion. This kind of network is starting to outperform the state-of-the-art i-vector embeddings in tasks like speaker and language recognition. Motivated by this, we investigated the optimum way to train x-vectors for the age estimation task. Although a regression objective is typical for this task, we found that optimizing a mixture of classification and regression losses provides better results. We trained our models on the NIST SRE08 dataset and evaluated them on SRE10. The proposed approach improved mean absolute error (MAE) by 12% relative to the i-vector baseline.
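The two ideas in the abstract, temporal pooling to a fixed-size embedding and a mixed classification/regression objective, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract only says "a temporal pooling layer", so mean-and-standard-deviation pooling is an assumption here, as are the function names, the choice of one class per integer year of age starting at `min_age`, and the `weight` that balances the two loss terms.

```python
import math


def stats_pooling(frames):
    """Pool a variable-length list of frame vectors into one fixed-size vector.

    Assumption: pooling concatenates the per-dimension mean and (population)
    standard deviation over time, so the output has 2 * dim entries
    regardless of how many frames the utterance has.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


def combined_loss(class_logits, predicted_age, true_age, min_age, weight=0.5):
    """Cross-entropy over age classes plus a weighted squared-error term.

    Assumptions: class i stands for age min_age + i, and `weight` is a
    hypothetical knob for the regression term.
    """
    # Numerically stable log-sum-exp for the softmax normaliser.
    m = max(class_logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in class_logits))
    target = int(round(true_age)) - min_age
    cross_entropy = log_z - class_logits[target]
    squared_error = (predicted_age - true_age) ** 2
    return cross_entropy + weight * squared_error
```

In this sketch, utterances with 2 frames and with 500 frames both yield an embedding of the same length, which is what allows the later layers to operate on a fixed-size input.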


doi: 10.21437/Interspeech.2018-2015

Cite as: Ghahremani, P., Nidadavolu, P.S., Chen, N., Villalba, J., Povey, D., Khudanpur, S., Dehak, N. (2018) End-to-end Deep Neural Network Age Estimation. Proc. Interspeech 2018, 277-281, doi: 10.21437/Interspeech.2018-2015

@inproceedings{ghahremani18b_interspeech,
  author={Pegah Ghahremani and Phani Sankar Nidadavolu and Nanxin Chen and Jesús Villalba and Daniel Povey and Sanjeev Khudanpur and Najim Dehak},
  title={{End-to-end Deep Neural Network Age Estimation}},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={277--281},
  doi={10.21437/Interspeech.2018-2015}
}