short-paper

ESC: Dataset for Environmental Sound Classification

Author:

Karol J. Piczak

MM '15: Proceedings of the 23rd ACM international conference on Multimedia

Pages 1015 - 1018

https://doi.org/10.1145/2733373.2806390

Published: 13 October 2015

Abstract

One of the obstacles in research activities concentrating on environmental sound classification is the scarcity of suitable and publicly available datasets. This paper tries to address that issue by presenting a new annotated collection of 2000 short clips comprising 50 classes of various common sound events, and an abundant unified compilation of 250000 unlabeled auditory excerpts extracted from recordings available through the Freesound project. The paper also provides an evaluation of human accuracy in classifying environmental sounds and compares it to the performance of selected baseline classifiers using features derived from mel-frequency cepstral coefficients and zero-crossing rate.
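
The abstract names zero-crossing rate as one of the baseline features. As a minimal sketch of that feature (not the paper's implementation), the pure-Python example below computes a per-frame zero-crossing rate and summarizes it over a clip by mean and standard deviation. The function names, frame sizes, the synthetic 440 Hz tone, and the mean/standard-deviation summary are assumptions made for illustration; the abstract does not specify any of them.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def frame_signal(signal, frame_length, hop_length):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [
        signal[start:start + frame_length]
        for start in range(0, len(signal) - frame_length + 1, hop_length)
    ]


def summarize(values):
    """Mean and (population) standard deviation of per-frame values for one clip."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


# Hypothetical input: one second of a 440 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

# Assumed framing: 400-sample frames with a 200-sample hop.
frames = frame_signal(tone, frame_length=400, hop_length=200)
zcr_mean, zcr_std = summarize([zero_crossing_rate(f) for f in frames])
print(f"frames={len(frames)} zcr_mean={zcr_mean:.3f} zcr_std={zcr_std:.3f}")
```

A pure tone crosses zero twice per period, so its rate should sit near 2 × 440 / 8000 ≈ 0.11. Noisy or broadband sounds would typically give higher and more variable values, which is why the rate can serve as a cheap clip-level descriptor alongside cepstral features.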

Index Terms

1. Applied computing
  1. Arts and humanities
    1. Sound and music computing
2. Information systems
  1. Information retrieval
    1. Document representation
    2. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        1. Music retrieval

Published In

MM '15: Proceedings of the 23rd ACM international conference on Multimedia

October 2015

1402 pages

ISBN:9781450334594

DOI:10.1145/2733373

General Chairs: Xiaofang Zhou (The University of Queensland, Australia), Alan F. Smeaton (Dublin City University, Ireland), Qi Tian (The University of Texas at San Antonio, USA)

Program Chairs: Dick C.A. Bulterman (FXPAL, USA), Heng Tao Shen (The University of Queensland, Australia), Ketan Mayer-Patel (The University of North Carolina, USA), Shuicheng Yan (National University of Singapore, Singapore)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM '15

Sponsor:

SIGMM

MM '15: ACM Multimedia Conference

October 26 - 30, 2015

Brisbane, Australia

Acceptance Rates

MM '15 Paper Acceptance Rate 56 of 252 submissions, 22%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
