Automatic re-labeling of Google AudioSet for improved quality of learned features and pre-training

Abstract:

AudioSet, comprising over 2 million human-labeled sound clips, remains one of the largest and most versatile publicly available audio event datasets. Deep neural networks trained on this data can detect 527 types of sounds organized in a hierarchical (tree-like) structure called an ontology. These models are also often used as feature extractors or as a basis for knowledge transfer to other sound detection and classification tasks. When describing the AudioSet recordings, raters were asked to choose one or more labels from the ontology. Analysis of the dataset reveals that raters were inconsistent and imprecise when dealing with the hierarchy of sounds. For example, some raters selected only the most precise labels, while others selected all relevant labels (i.e., all parents of the selected child labels). Additionally, a large fraction of sound clips carry only general labels, with no fine-grained labels. These issues harm the quality of features learned by models trained on AudioSet. As a remedy, we propose two ways in which the dataset can be automatically re-labeled to achieve specific, consistent, and complete label definitions on all levels of the ontology tree. Experimental results show significant improvement in the performance of new models that are trained on features extracted from, or initialized with weights transferred from, base models trained with re-labeled AudioSet data. More generally, this work highlights the importance of careful data labeling as a way to improve model accuracy.
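The two labeling conventions mentioned in the abstract (only the most precise labels versus all parents of the selected child labels) can be illustrated with a small sketch. This is not the authors' re-labeling procedure, which the abstract does not specify. The `PARENT` mapping is a hypothetical toy ontology assumed to be a strict tree, and the function names `with_ancestors` and `most_specific` are illustrative assumptions.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical toy ontology: child label -> parent label (None marks a root).
# Illustrative only; it is not the real AudioSet ontology.
PARENT: Dict[str, Optional[str]] = {
    "Music": None,
    "Musical instrument": "Music",
    "Guitar": "Musical instrument",
    "Animal": None,
    "Dog": "Animal",
    "Bark": "Dog",
}


def with_ancestors(labels: Iterable[str],
                   parent: Dict[str, Optional[str]] = PARENT) -> Set[str]:
    """Return the labels plus every ancestor of each label ("all relevant labels")."""
    expanded: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        # Walk up to the root; stop early if this chain was already added.
        while node is not None and node not in expanded:
            expanded.add(node)
            node = parent[node]
    return expanded


def most_specific(labels: Iterable[str],
                  parent: Dict[str, Optional[str]] = PARENT) -> Set[str]:
    """Drop every label that is an ancestor of another label in the set."""
    label_set = set(labels)
    implied: Set[str] = set()
    for label in label_set:
        implied |= with_ancestors([label], parent) - {label}
    return label_set - implied


# Two raters describing the same clip under different conventions.
rater_a = {"Bark"}                   # most precise label only
rater_b = {"Bark", "Dog", "Animal"}  # child plus all parents

# After normalizing to one convention, both annotations agree.
print(with_ancestors(rater_a) == with_ancestors(rater_b))
print(most_specific(rater_a) == most_specific(rater_b))
```

Normalizing every clip to a single convention removes the rater-to-rater inconsistency, but it cannot recover fine-grained labels for clips that were only given general ones; the abstract names that as a separate problem.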

Published in: 2024 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA)

Date of Conference: 25-27 September 2024

Date Added to IEEE Xplore: 17 October 2024

DOI: 10.23919/SPA61993.2024.10715611

Conference Location: Poznan, Poland
