SI-Net: Multi-Scale Context-Aware Convolutional Block for Speaker Verification

Abstract:

Making adequate use of multi-scale information is essential for building a high-performance speaker verification (SV) system. Biological research shows that the human auditory system processes sound at multiple timescales and integrates the resulting multi-scale information to encode it. Inspired by this, we propose a novel block, named Split-Integration (SI), to explore multi-scale context-aware feature learning at a granular level for speaker verification. Our model involves a pair of operations: (i) multi-scale split, which imitates the multi-timescale processing mode by grouping and stacking filters of different sizes to extract multi-scale features, and (ii) dynamic integration, which mirrors the fusion mechanism by using KL divergence to measure the complementarity between multi-scale features, so that the model fully integrates them and produces a more speaker-discriminative representation. Experiments are conducted on the VoxCeleb and Speakers in the Wild (SITW) datasets. Results demonstrate that our approach achieves a relative 10%–20% improvement in equal error rate (EER) over a strong baseline on the SV task.
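
The two operations named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the block's equations, so everything below is an assumption made for illustration. Moving-average filters stand in for learned convolutions, a single 1-D channel stands in for grouped feature maps, and the fusion rule (softmax over each branch's mean KL divergence to the other branches) is one plausible reading of "using KL divergence to measure complementarity". All function names are hypothetical.

```python
import math

def smooth(signal, kernel_size):
    """Stand-in for a learned 1-D filter: moving average with zero padding.
    Assumes an odd kernel_size so the window is centred on each frame."""
    half = kernel_size // 2
    out = []
    for t in range(len(signal)):
        window = [signal[i] for i in range(t - half, t + half + 1)
                  if 0 <= i < len(signal)]
        out.append(sum(window) / kernel_size)
    return out

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def multi_scale_split(signal, kernel_sizes=(1, 3, 5)):
    """Assumed 'split' step: one branch per filter size (context width)."""
    return [smooth(signal, k) for k in kernel_sizes]

def dynamic_integration(branches):
    """Assumed 'integration' step: weight each branch by how much its
    normalised response differs (mean KL) from the other branches."""
    dists = [softmax(b) for b in branches]
    scores = []
    for i, p in enumerate(dists):
        others = [kl_divergence(p, q) for j, q in enumerate(dists) if j != i]
        scores.append(sum(others) / len(others))
    weights = softmax(scores)
    fused = [sum(w * b[t] for w, b in zip(weights, branches))
             for t in range(len(branches[0]))]
    return fused, weights

# Toy input: eight frames of a single feature channel (made-up values).
signal = [0.0, 1.0, 0.0, 2.0, 0.0, 1.0, 0.0, 3.0]
branches = multi_scale_split(signal)
fused, weights = dynamic_integration(branches)
```

In this sketch a branch that differs more from the others receives a larger fusion weight; whether the SI block rewards or penalises divergence, and at what granularity it is computed, would have to be checked against the full paper.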

Published in: 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Date of Conference: 13-17 December 2021

Date Added to IEEE Xplore: 03 February 2022

DOI: 10.1109/ASRU51503.2021.9688119

Conference Location: Cartagena, Colombia