The success of modern deep learning systems is built on two cornerstones: massive amounts of annotated training data and advanced computational infrastructure to support large-scale computation. In recent years, the model size of state-of-the-art deep learning systems has rapidly increased, sometimes reaching billions of parameters. Herein we take a close look at this phenomenon and present an empirical study on the scaling effect of model size for self-supervised speech models. In particular, we investigate the quantitative relationship between model size and loss/accuracy performance on speech tasks. First, the power-law scaling property between the number of parameters and the L1 self-supervised loss is verified for speech models. Then the advantage of large speech models in learning effective speech representations is demonstrated on two downstream tasks: i) speaker recognition and ii) phoneme classification. Moreover, it is shown that the model size of self-supervised speech networks can compensate for the lack of annotation when training data are insufficient.
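For orientation, a power-law scaling relationship of the kind referenced above is conventionally written in the following form; this is a generic sketch of the standard parameterization used in neural scaling-law studies, and the symbols (L for the self-supervised loss, N for the number of parameters, and the fitted constants N_c and \alpha) are placeholders rather than values reported in this abstract:

L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha}

Under this form the loss falls off as a power of the parameter count, so the relationship appears as a straight line on a log-log plot of loss versus model size.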
Cite as: Pu, J., Yang, Y., Li, R., Elibol, O., Droppo, J. (2021) Scaling Effect of Self-Supervised Speech Models. Proc. Interspeech 2021, 1084-1088, doi: 10.21437/Interspeech.2021-1935
@inproceedings{pu21_interspeech, author={Jie Pu and Yuguang Yang and Ruirui Li and Oguz Elibol and Jasha Droppo}, title={{Scaling Effect of Self-Supervised Speech Models}}, year=2021, booktitle={Proc. Interspeech 2021}, pages={1084--1088}, doi={10.21437/Interspeech.2021-1935} }