Adaptive Rectangle Loss for Speaker Verification

Ruida, Li; Shuo, Fang; Chenguang, Ma; Liang, Li

doi:10.21437/Interspeech.2022-486

From the perspective of pair similarity optimization, speaker verification is expected to satisfy the criterion that each intra-class similarity is higher than the maximal inter-class similarity. However, we find that most softmax-based losses are suboptimal: they encourage each sample's target similarity score to exceed only that sample's own non-target similarity scores, not all the non-target scores. To this end, we propose a batch-wise maximum softmax loss, in which the non-target logits are replaced by the ones derived from the whole batch. To further emphasize the minority of hard non-target pairs, an adaptive margin mechanism is introduced at the same time. The proposed loss is named Adaptive Rectangle loss because of its rectangular decision boundary. In addition, an annealing strategy is introduced to improve the stability of the training process and speed up convergence. Experimentally, we demonstrate the superiority of Adaptive Rectangle loss on speaker verification tasks. Results on VoxCeleb show that our proposed loss outperforms the state of the art by 10.11% in EER.
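
The difference between a per-sample softmax loss and a batch-wise maximum variant can be illustrated with a minimal NumPy sketch. This is one possible reading of the abstract, not the authors' implementation: it assumes that each non-target logit is replaced by the largest non-target similarity for that class found anywhere in the batch. The function names, the `scale` value, and the input layout (a matrix of cosine similarities between embeddings and class centers) are all hypothetical, and the adaptive margin and annealing strategy are left out.

```python
import numpy as np


def _mean_target_nll(logits, labels):
    """Mean negative log-likelihood of the target class (stable log-softmax)."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()


def softmax_loss(cos, labels, scale=30.0):
    """Plain scaled softmax loss: each sample sees only its own non-target scores."""
    return _mean_target_nll(scale * cos, labels)


def batch_max_softmax_loss(cos, labels, scale=30.0):
    """Assumed batch-wise maximum variant (illustrative only).

    cos:    (B, C) cosine similarities between embeddings and class centers
    labels: (B,)   integer class index of each sample
    """
    batch_size, _ = cos.shape
    is_target = np.zeros_like(cos, dtype=bool)
    is_target[np.arange(batch_size), labels] = True

    # Hardest non-target similarity per class, taken over the whole batch.
    batch_max = np.where(is_target, -np.inf, cos).max(axis=0)  # shape (C,)

    # Keep each sample's own target logit; swap every non-target logit
    # for the batch-wide maximum of the corresponding class.
    logits = np.where(is_target, cos, batch_max[None, :])
    return _mean_target_nll(scale * logits, labels)
```

Under this assumption every non-target logit is at least as large as the sample's own, so the batch-wise loss is never smaller than the plain softmax loss on the same batch; the target score is pushed above the hardest non-target score in the batch rather than only above the sample's own.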

Cite as: Ruida, L., Shuo, F., Chenguang, M., Liang, L. (2022) Adaptive Rectangle Loss for Speaker Verification. Proc. Interspeech 2022, 301-305, doi: 10.21437/Interspeech.2022-486

@inproceedings{ruida22_interspeech,
  author={Li Ruida and Fang Shuo and Ma Chenguang and Li Liang},
  title={{Adaptive Rectangle Loss for Speaker Verification}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={301--305},
  doi={10.21437/Interspeech.2022-486}
}