DOI: 10.1145/3015166.3015186
research-article

Detection of Acoustic Events by using MFCC and Spectro-Temporal Gabor Filterbank Features

Published: 21 November 2016

Abstract

Acoustic event detection (AED) is concerned with recognizing sounds produced by humans, by objects that humans handle, or by nature. Detecting acoustic events is an important task for intelligent systems, which are expected to recognize not only speech but also the sounds of indoor and outdoor environments; applications include information retrieval, audio-based surveillance, and monitoring systems. Current systems for detecting and classifying events in everyday monophonic sound are mature enough to extract features and detect isolated events almost accurately, but accuracy is very low for large datasets and for noisy or overlapping audio events. Most real-life sounds are polyphonic, and events partly overlap, which makes them harder to detect. In this work we discuss these issues in the detection and feature extraction of acoustic events. We use the DCASE dataset, published for the international IEEE AASP challenge on acoustic event detection, which includes the "office live" recordings made in an office environment. Mel-frequency cepstral coefficients (MFCCs) are commonly used for feature extraction from speech and acoustic events. We propose using a Gabor filterbank in addition to MFCCs to analyze the features. For classification we use the decision tree algorithm, which gives better classification and detection results. Finally, we compare our proposed system with each system that was used on the DCASE dataset and conclude that our technique gives the best F-score in event detection compared with the others.
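
The abstract reports results as an F-score but does not say how it is computed. As a rough illustration only, the sketch below computes a frame-based F-score, assuming each audio frame carries a set of active event labels so that overlapping (polyphonic) events can be represented. The function name `frame_f_score`, the frame-based scoring, and the example labels are assumptions made for this sketch, not details taken from the paper.

```python
def frame_f_score(reference, detected):
    """Frame-based F-score (illustrative sketch, not the paper's exact metric).

    reference, detected: lists of sets, one set of active event labels
    per frame. A set with two labels models two overlapping events.
    """
    tp = fp = fn = 0
    for ref, det in zip(reference, detected):
        tp += len(ref & det)   # labels correctly detected in this frame
        fp += len(det - ref)   # labels detected but not in the reference
        fn += len(ref - det)   # reference labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # F-score is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)


# Hypothetical four-frame example; the second frame has overlapping events.
reference = [{"speech"}, {"speech", "keys"}, {"keys"}, set()]
detected = [{"speech"}, {"speech"}, {"keys", "cough"}, set()]

print(frame_f_score(reference, detected))
```

In this toy example the detector misses one overlapping event and raises one false alarm, which is the kind of error the abstract says becomes common with polyphonic audio.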

      Published In

      ICSPS 2016: Proceedings of the 8th International Conference on Signal Processing Systems
      November 2016
      235 pages
      ISBN:9781450347907
      DOI:10.1145/3015166
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Acoustic event detection (AED)
      2. Gabor filterbank polyphonic
      3. MFCC
      4. classification
      5. feature extraction
      6. monophonic

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICSPS 2016

      Acceptance Rates

      ICSPS 2016 Paper Acceptance Rate 46 of 83 submissions, 55%;
      Overall Acceptance Rate 46 of 83 submissions, 55%
