Audio Event Detection Based on Cross Correlation in Selected Frequency Bands of Spectrogram

Hajihashemi, Vahid; Gharahbagh, Abdorreza Alavi; Machado, J. J. M.; Tavares, João Manuel R. S.

doi:10.1007/978-3-031-45651-0_19

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 802)

Included in the following conference series:

World Conference on Information Systems and Technologies

Abstract

Audio event detection (AED) systems have various applications in the modern world, including security systems, urban management and automatic monitoring in smart cities, and online multimedia processing. Because noise and background sound vary in an urban environment, frequency-domain and normalized features usually perform better in AED systems.

This work proposes a Mel spectrogram-based approach that uses the spectral characteristics of audio signals and cross-correlations to build a dictionary of effective spectrogram frequency bands and their patterns in different audio events. The approach first extracts the Mel spectrogram of the audio input. A mathematical-statistical analysis then identifies the effective frequency bands of the spectrogram for each audio event. The pattern of selected frequency bands varies with the type of event, which helps reduce the size of the spectrogram used as an input feature, reduce errors, and increase the accuracy of different AED methods. The proposed approach was implemented on the URBAN-SED database, and its efficiency was compared against state-of-the-art deep learning-based methods in the field. According to the results, about 50% of the frequency bands in the spectrum are not useful and can be discarded in the training process of an AED system without any loss of accuracy.
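
The abstract does not spell out the statistical analysis, so the following is only a minimal sketch of the general idea of ranking spectrogram bands by cross-correlation with a per-event pattern and keeping about half of them. It is not the authors' method. The function names `normalized_xcorr_peak` and `select_bands`, the `keep_fraction` and `max_lag` parameters, and the toy data are all assumptions made for illustration; a real Mel spectrogram would come from an audio front end, which is omitted here.

```python
import math

def normalized_xcorr_peak(a, b, max_lag):
    """Peak normalized cross-correlation of two equal-length sequences
    over integer lags in [-max_lag, max_lag] (hypothetical helper)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    norm = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    if norm == 0:
        return 0.0  # a constant band carries no pattern to match
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        s = sum(da[i] * db[i + lag] for i in range(n) if 0 <= i + lag < n)
        best = max(best, s / norm)
    return best

def select_bands(spectrogram, template, keep_fraction=0.5, max_lag=2):
    """Score each band against the event template for the same band and
    keep the top `keep_fraction` of bands (assumed selection rule).
    Both inputs are lists of bands; each band is a list of frame energies."""
    scores = [normalized_xcorr_peak(band, ref, max_lag)
              for band, ref in zip(spectrogram, template)]
    n_keep = max(1, round(keep_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return sorted(ranked[:n_keep]), scores

# Toy data: 4 bands x 8 frames. Bands 0 and 2 follow the event pattern
# (scaled and shifted by one frame); band 1 is flat, band 3 alternates.
pattern = [0, 0, 1, 3, 1, 0, 0, 0]
template = [pattern] * 4
spectrogram = [
    [0, 0, 2, 6, 2, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 3, 1, 0, 0],
    [2, 0, 2, 0, 2, 0, 2, 0],
]
selected, scores = select_bands(spectrogram, template)
print(selected)
```

With `keep_fraction=0.5`, half of the bands are dropped, mirroring the roughly 50% reduction reported in the abstract. In practice the threshold and the template would have to be estimated per event class from training data.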

Acknowledgements

This article is partially a result of the project Safe Cities - “Inovação para Construir Cidades Seguras”, with reference POCI-01-0247-FEDER-041435, co-funded by the European Regional Development Fund (ERDF), through the Operational Programme for Competitiveness and Internationalization (COMPETE 2020), under the PORTUGAL 2020 Partnership Agreement. The first author would like to thank “Fundação para a Ciência e Tecnologia” (FCT) for his Ph.D. grant with reference 2021.08660.BD.

Author information

Authors and Affiliations

Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s/n, 4200-465, Porto, Portugal
Vahid Hajihashemi & Abdorreza Alavi Gharahbagh
Departamento de Engenharia Mecânica, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s/n, 4200-465, Porto, Portugal
J. J. M. Machado & João Manuel R. S. Tavares

Corresponding author

Correspondence to João Manuel R. S. Tavares.

Editor information

Editors and Affiliations

ISEG, Universidade de Lisboa, Lisbon, Cávado, Portugal
Alvaro Rocha
College of Engineering, The Ohio State University, Columbus, OH, USA
Hojjat Adeli
Institute of Data Science and Digital Technologies, Vilnius University, Vilnius, Lithuania
Gintautas Dzemyda
DCT, Universidade Portucalense, Porto, Portugal
Fernando Moreira
TeCIP Institute, Scuola Superiore Sant’Anna, Pisa, Italy
Valentina Colla

Cite this paper

Hajihashemi, V., Gharahbagh, A.A., Machado, J.J.M., Tavares, J.M.R.S. (2024). Audio Event Detection Based on Cross Correlation in Selected Frequency Bands of Spectrogram. In: Rocha, A., Adeli, H., Dzemyda, G., Moreira, F., Colla, V. (eds) Information Systems and Technologies. WorldCIST 2023. Lecture Notes in Networks and Systems, vol 802. Springer, Cham. https://doi.org/10.1007/978-3-031-45651-0_19

DOI: https://doi.org/10.1007/978-3-031-45651-0_19
Published: 15 February 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45650-3
Online ISBN: 978-3-031-45651-0
eBook Packages: Intelligent Technologies and Robotics (R0)
