research-article

Detection of Acoustic Events by using MFCC and Spectro-Temporal Gabor Filterbank Features

Authors: Umair Zafar Khan, Muhammad Usman Akram, Arslan Shaukat, Muhammad Kashan Basit, Abdul Wahid

ICSPS 2016: Proceedings of the 8th International Conference on Signal Processing Systems

Pages 158 - 164

https://doi.org/10.1145/3015166.3015186

Published: 21 November 2016

Abstract

Acoustic event detection (AED) is concerned with recognizing sounds produced by humans, by objects that humans handle, or by nature. Detecting acoustic events is an important task for intelligent systems, which are expected to recognize not only speech but also the sounds of indoor and outdoor environments; applications include information retrieval and audio-based surveillance and monitoring systems. Current systems for detecting and classifying events in everyday monophonic sound are mature enough to extract features and detect isolated events almost accurately, but accuracy is very low for large datasets and for noisy and overlapping audio events. Most real-life sounds are polyphonic, and events partly overlap, which makes them harder to detect. In this work we discuss these issues in the detection of, and feature extraction for, acoustic events. We use the DCASE dataset, published in an international IEEE AASP challenge for acoustic event detection, which includes the "office live" recordings made in an office environment. Mel-frequency cepstral coefficients (MFCCs) are commonly used for feature extraction from speech and acoustic events. We propose to use a Gabor filterbank in addition to MFCCs to analyze the features. For classification we use a decision tree algorithm, which gives better classification and detection results. Finally, we compare our proposed system with each system that was used on the DCASE dataset and conclude that our technique gives the best F-score in event detection compared with the others.
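The F-score used for the comparison can be illustrated with a minimal sketch. The abstract does not say which variant is reported, so the code below assumes a frame-based F-score in which every frame carries a set of active event labels. The function name `frame_f_score` and the label strings are hypothetical and serve only as an illustration; this is not the authors' evaluation code.

```python
from typing import Sequence, Set


def frame_f_score(reference: Sequence[Set[str]],
                  predicted: Sequence[Set[str]]) -> float:
    """Frame-based F-score over per-frame sets of active event labels.

    Assumption: one set of labels per analysis frame, so overlapping
    (polyphonic) events are simply several labels in the same frame.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must cover the same frames")

    tp = fp = fn = 0
    for ref, hyp in zip(reference, predicted):
        tp += len(ref & hyp)   # labels active in both
        fp += len(hyp - ref)   # labels predicted but not annotated
        fn += len(ref - hyp)   # labels annotated but missed

    if tp == 0:
        return 0.0             # no correct detections at all
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical four-frame example with one overlapping frame.
reference = [{"speech"}, {"speech", "keys"}, set(), {"doorslam"}]
predicted = [{"speech"}, {"speech"}, {"keys"}, {"doorslam"}]
score = frame_f_score(reference, predicted)
```

In this toy example the second frame contains two overlapping events and the detector finds only one of them, which lowers recall; the spurious label in the third frame lowers precision.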

Detection of Acoustic Events by using MFCC and Spectro-Temporal Gabor Filterbank Features
1. Hardware
  1. Communication hardware, interfaces and storage
  2. Robustness
    1. Hardware reliability


Published In

ICSPS 2016: Proceedings of the 8th International Conference on Signal Processing Systems

November 2016

235 pages

ISBN:9781450347907

DOI:10.1145/3015166

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Conference

ICSPS 2016

ICSPS 2016: 8th International Conference on Signal Processing Systems

November 21 - 24, 2016

Auckland, New Zealand

Acceptance Rates

ICSPS 2016 Paper Acceptance Rate 46 of 83 submissions, 55%;

Overall Acceptance Rate 46 of 83 submissions, 55%
