DOI: 10.1145/2733373.2806390
Short paper

ESC: Dataset for Environmental Sound Classification

Published: 13 October 2015

Abstract

One of the obstacles in research activities concentrating on environmental sound classification is the scarcity of suitable and publicly available datasets. This paper tries to address that issue by presenting a new annotated collection of 2000 short clips comprising 50 classes of various common sound events, and an abundant unified compilation of 250000 unlabeled auditory excerpts extracted from recordings available through the Freesound project. The paper also provides an evaluation of human accuracy in classifying environmental sounds and compares it to the performance of selected baseline classifiers using features derived from mel-frequency cepstral coefficients and zero-crossing rate.
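
The abstract mentions baseline classifiers built on features derived from mel-frequency cepstral coefficients (MFCC) and zero-crossing rate (ZCR). The sketch below illustrates that kind of feature pipeline; it is not the paper's implementation. The choice of librosa and scikit-learn is an assumption (neither library is named here), and clip_paths/labels are hypothetical placeholders for the ESC clips and their class annotations.

```python
# Illustrative MFCC + zero-crossing-rate baseline of the kind the abstract
# describes. librosa/scikit-learn and the clip_paths/labels variables are
# assumptions for illustration, not taken from the paper.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def clip_features(path, sr=22050, n_mfcc=13):
    """Summarize one audio clip as frame-level MFCC and ZCR statistics."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, frames)
    zcr = librosa.feature.zero_crossing_rate(y)             # shape (1, frames)
    feats = np.vstack([mfcc, zcr])
    # Mean and standard deviation over frames give a fixed-length vector per clip.
    return np.concatenate([feats.mean(axis=1), feats.std(axis=1)])

# clip_paths and labels are hypothetical lists of file paths and class labels.
# X = np.array([clip_features(p) for p in clip_paths])
# y = np.array(labels)
# clf = RandomForestClassifier(n_estimators=500, random_state=0)
# print(cross_val_score(clf, X, y, cv=5).mean())  # cross-validated accuracy
```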




Published In

MM '15: Proceedings of the 23rd ACM international conference on Multimedia
October 2015
1402 pages
ISBN: 9781450334594
DOI: 10.1145/2733373
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. classification
  2. dataset
  3. environmental sound

Qualifiers

  • Short-paper

Conference

MM '15: ACM Multimedia Conference
October 26-30, 2015
Brisbane, Australia

Acceptance Rates

MM '15 Paper Acceptance Rate: 56 of 252 submissions, 22%
Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%

