ISCA Archive Interspeech 2019

Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection

Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai

Designing good feature extraction and classifier models is essential for obtaining high performance in acoustic event detection (AED) systems. Current state-of-the-art algorithms are based on deep neural network models that jointly learn the feature representation and the classifier. In a typical pipeline, several network layers with nonlinear transforms are stacked for feature extraction, and a classifier layer with a softmax transform is applied on top of the extracted features to obtain normalized probability outputs. This pipeline is optimized directly for the final goal of class discrimination, without explicitly considering how the features of inter-class and intra-class samples should be distributed. In this paper, we explicitly add a distance metric constraint to the feature extraction process, with the goal of reducing intra-class sample distances and increasing inter-class sample distances. Rather than estimating pair-wise distances between samples, the distances are calculated efficiently between samples and class cluster centroids. With this constraint, the learned features have properties that improve the generalization of the classification models. AED experiments on an urban sound classification task were carried out to test the algorithm. The results showed that the proposed algorithm improved on the performance of current state-of-the-art deep learning algorithms.
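
The abstract gives no formula for the constraint, so the following is only a minimal NumPy sketch of the general idea, not the authors' loss. It assumes squared Euclidean distances, centroids computed as per-class means within a batch, and a hinge with a hypothetical `margin` parameter for the inter-class term. The function name `centroid_distance_penalty` and the toy data are illustrative assumptions.

```python
import numpy as np


def centroid_distance_penalty(features, labels, margin=1.0):
    """Illustrative sample-to-centroid penalty (an assumption, not the paper's loss).

    Returns (total, intra, inter), where intra pulls samples toward their own
    class centroid and inter penalizes other-class centroids that lie closer
    than `margin` (in squared Euclidean distance).
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels).ravel()

    # Map each label to an index into the classes present in this batch.
    classes, class_idx = np.unique(labels, return_inverse=True)

    # Centroid of each class: mean feature vector of its samples.
    centroids = np.stack(
        [features[class_idx == k].mean(axis=0) for k in range(len(classes))]
    )

    # Squared distances from every sample to every centroid, shape (N, C).
    # This costs O(N * C) rather than the O(N^2) of pair-wise sample distances.
    diffs = features[:, None, :] - centroids[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)

    # Boolean mask marking each sample's own-class centroid.
    own = np.zeros(dists.shape, dtype=bool)
    own[np.arange(len(labels)), class_idx] = True

    intra = dists[own].mean()
    other = dists[~own]
    inter = np.maximum(0.0, margin - other).mean() if other.size else 0.0
    return intra + inter, intra, inter


# Toy batch: two well-separated classes in a 2-D feature space.
toy_features = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
toy_labels = [0, 0, 1, 1]
total, intra, inter = centroid_distance_penalty(toy_features, toy_labels)
print(total, intra, inter)
```

In a training setup of the kind the abstract describes, a term like this would presumably be added to the softmax cross-entropy objective with some weighting. That weighting, and how centroids are maintained across batches, are not specified in the abstract and are left open here.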


doi: 10.21437/Interspeech.2019-2271

Cite as: Lu, X., Shen, P., Li, S., Tsao, Y., Kawai, H. (2019) Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection. Proc. Interspeech 2019, 3614-3618, doi: 10.21437/Interspeech.2019-2271

@inproceedings{lu19d_interspeech,
  author={Xugang Lu and Peng Shen and Sheng Li and Yu Tsao and Hisashi Kawai},
  title={{Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3614--3618},
  doi={10.21437/Interspeech.2019-2271}
}