ISCA Archive Interspeech 2022

Exploring Multi-task Learning Based Gender Recognition and Age Estimation for Class-imbalanced Data

Weiqiao Zheng, Ping Yang, Rongfeng Lai, Kongyang Zhu, Tao Zhang, Junpeng Zhang, Hongcheng Fu

Automatic gender recognition and age estimation from a speaker's audio is useful for applications such as music recommendation and speaker profiling. However, performance degrades greatly when the data distribution is class-imbalanced. This paper explores a novel multi-task learning based gender recognition and age estimation system using speaker embeddings. We apply label distribution smoothing (LDS) and investigate a weighted mean squared error focal loss (w-MSE-FL) to reshape the weight assigned to the centralized-distribution samples during training. Because the target dataset is limited, we pretrain a deep convolutional neural network stacked with an attentive statistics pooling layer on a speaker recognition task, using a speaker speech dataset, to extract robust speaker embedding features. We then fine-tune the multi-task learning network on a dataset with gender and age labels, performing gender recognition with a classifier and age estimation with a regressor simultaneously. Experimental results show that the proposed system achieves better results on the TIMIT dataset, with an RMSE of 7.17 and 7.25 years on age estimation for male and female speakers, respectively, and an overall gender recognition accuracy of 99.30%.
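
To make the reweighting idea concrete, the following is a minimal sketch of label distribution smoothing for integer age labels: the empirical age histogram is smoothed with a Gaussian kernel, and each sample is weighted by the inverse of the smoothed density at its age. This is one common formulation and not necessarily the one used in the paper. The function names (`gaussian_kernel`, `lds_weights`, `weighted_mse`), the kernel half-width, and `sigma` are illustrative assumptions, and the focal term of w-MSE-FL is not sketched because the abstract does not give its form.

```python
import math


def gaussian_kernel(half_width, sigma):
    """Symmetric Gaussian kernel over offsets -half_width..half_width, summing to 1."""
    raw = [math.exp(-(k * k) / (2.0 * sigma * sigma))
           for k in range(-half_width, half_width + 1)]
    total = sum(raw)
    return [v / total for v in raw]


def lds_weights(ages, min_age, max_age, half_width=2, sigma=2.0):
    """Per-sample weights from a kernel-smoothed age histogram (assumed LDS variant)."""
    n_bins = max_age - min_age + 1

    # Empirical label histogram, one bin per year of age.
    counts = [0.0] * n_bins
    for age in ages:
        counts[age - min_age] += 1.0

    # Smooth the histogram so neighbouring ages share density.
    kernel = gaussian_kernel(half_width, sigma)
    smoothed = []
    for i in range(n_bins):
        density = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half_width
            if 0 <= j < n_bins:
                density += w * counts[j]
        smoothed.append(density)

    # Inverse-density weights, rescaled so their mean is 1.
    raw = [1.0 / smoothed[age - min_age] for age in ages]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def weighted_mse(predictions, targets, weights):
    """Mean of per-sample squared errors, each scaled by its weight."""
    total = sum(w * (p - t) ** 2
                for p, t, w in zip(predictions, targets, weights))
    return total / len(targets)
```

With such weights, ages that are rare in the training set contribute more to the regression loss than ages near the densely populated part of the distribution.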


doi: 10.21437/Interspeech.2022-682

Cite as: Zheng, W., Yang, P., Lai, R., Zhu, K., Zhang, T., Zhang, J., Fu, H. (2022) Exploring Multi-task Learning Based Gender Recognition and Age Estimation for Class-imbalanced Data. Proc. Interspeech 2022, 1983-1987, doi: 10.21437/Interspeech.2022-682

@inproceedings{zheng22b_interspeech,
  author={Weiqiao Zheng and Ping Yang and Rongfeng Lai and Kongyang Zhu and Tao Zhang and Junpeng Zhang and Hongcheng Fu},
  title={{Exploring Multi-task Learning Based Gender Recognition and Age Estimation for Class-imbalanced Data}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={1983--1987},
  doi={10.21437/Interspeech.2022-682}
}