SI-Net: Multi-Scale Context-Aware Convolutional Block for Speaker Verification | IEEE Conference Publication | IEEE Xplore

Abstract:

Utilizing multi-scale information adequately is essential for building a high-performance speaker verification (SV) system. Biological research shows that the human auditory system employs a multi-timescale processing mode to extract information and integrates multi-scale information to encode sound. Inspired by this, we propose a novel block, named Split-Integration (SI), to explore multi-scale context-aware feature learning at a granular level for speaker verification. Our model involves a pair of operations: (i) multi-scale split, which imitates the multi-timescale processing mode by grouping and stacking filters of different sizes to extract multi-scale features, and (ii) dynamic integration, which mirrors the fusion mechanism by using KL divergence to measure the complementarity between multi-scale features, so that the model fully integrates them and produces a better speaker-discriminative representation. Experiments are conducted on the VoxCeleb and Speakers in the Wild (SITW) datasets. Results demonstrate that our approach achieves a 10%–20% relative improvement in equal error rate (EER) over a strong baseline in the SV task.
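
The two operations named in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' SI block: the abstract gives no layer details, so everything below is an assumption made for illustration. Fixed moving-average filters stand in for learned convolutions, and the helper names (multi_scale_split, dynamic_integration), the kernel sizes, and the rule that turns KL divergence into fusion weights are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions over the last axis.
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def multi_scale_split(x, kernel_sizes=(1, 3, 5)):
    # ASSUMPTION: x has shape (channels, time), with channels divisible by
    # the number of kernel sizes. Each channel group is filtered with a
    # moving average of a different width (a stand-in for learned filters).
    groups = np.array_split(x, len(kernel_sizes), axis=0)
    branches = []
    for group, k in zip(groups, kernel_sizes):
        kernel = np.ones(k) / k
        rows = [np.convolve(row, kernel, mode="same") for row in group]
        branches.append(np.stack(rows))
    return branches

def dynamic_integration(branches):
    # ASSUMPTION: a branch's "complementarity" is scored as its mean KL
    # divergence from the other branches, computed on time-pooled,
    # softmax-normalised channel vectors. Requires at least two branches.
    dists = [softmax(b.mean(axis=1)) for b in branches]
    n = len(branches)
    scores = np.array([
        np.mean([kl_divergence(dists[i], dists[j]) for j in range(n) if j != i])
        for i in range(n)
    ])
    weights = softmax(scores)  # hypothetical weighting rule
    fused = sum(w * b for w, b in zip(weights, branches))
    return fused, weights

rng = np.random.default_rng(0)
features = rng.standard_normal((6, 50))   # 6 channels, 50 frames (toy input)
branches = multi_scale_split(features)
fused, weights = dynamic_integration(branches)
print(fused.shape, weights)
```

In this sketch a branch that differs more from the others receives a larger weight; the actual integration rule is described in the full paper and may differ.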
Date of Conference: 13-17 December 2021
Date Added to IEEE Xplore: 03 February 2022
Conference Location: Cartagena, Colombia
