ABSTRACT
One obstacle to research on environmental sound classification is the scarcity of suitable, publicly available datasets. This paper addresses that issue by presenting a new annotated collection of 2000 short clips covering 50 classes of common sound events, together with a large unified compilation of 250000 unlabeled audio excerpts extracted from recordings available through the Freesound project. The paper also evaluates human accuracy in classifying environmental sounds and compares it with the performance of selected baseline classifiers using features derived from mel-frequency cepstral coefficients and zero-crossing rate.
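As a rough illustration of one of the two feature types named above, the following sketch computes a frame-wise zero-crossing rate in plain Python. It is not the paper's implementation: the function names, the frame length of 1024 samples, the hop of 512 samples, and the convention of treating zero as non-negative are all assumptions chosen for the example.

```python
from typing import List, Sequence


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Return the fraction of adjacent sample pairs whose signs differ.

    Zero is treated as non-negative (an assumption; conventions vary).
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1
        for prev, curr in zip(frame, frame[1:])
        if (prev >= 0) != (curr >= 0)
    )
    return crossings / (len(frame) - 1)


def framewise_zcr(
    signal: Sequence[float],
    frame_length: int = 1024,  # hypothetical default, not from the paper
    hop_length: int = 512,     # hypothetical default, not from the paper
) -> List[float]:
    """Split the signal into overlapping frames and compute ZCR per frame.

    Trailing samples that do not fill a complete frame are ignored.
    """
    last_start = len(signal) - frame_length
    return [
        zero_crossing_rate(signal[start:start + frame_length])
        for start in range(0, last_start + 1, hop_length)
    ]
```

A noisy or hissing sound changes sign often and yields values near 1, while a low-pitched hum yields values near 0. The per-frame values would still need to be summarized per clip before being passed to a classifier; how that summary is formed is left open here.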