Audio Event Detection Based on Cross Correlation in Selected Frequency Bands of Spectrogram

  • Conference paper
Information Systems and Technologies (WorldCIST 2023)

Abstract

Audio event detection (AED) systems have various applications in the modern world, including security systems, urban management and automatic monitoring in smart cities, and online multimedia processing. Because noise and background sound vary in an urban environment, frequency-domain and normalized features usually perform better in AED systems.

This work proposes a Mel spectrogram-based approach that uses the spectral characteristics of audio signals and cross-correlations to build a dictionary of effective spectrogram frequency bands and their patterns in different audio events. First, the proposed approach extracts the Mel spectrogram of the audio input. Next, a mathematical-statistical analysis identifies the effective frequency bands of the spectrogram for each audio event. The pattern of selected frequency bands varies with the type of event, which can help decrease the size of the spectrogram used as an input feature, reduce errors, and increase the accuracy of different AED methods. The proposed approach was implemented on the URBAN-SED database, and its efficiency was compared against state-of-the-art deep learning-based methods in the field. According to the results, about 50% of the frequency bands in the spectrum carry no useful information and can be discarded in the training process of an AED system without any loss in accuracy.
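The abstract does not specify the statistical analysis used to pick bands, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a Mel spectrogram is already available as an array of shape (bands, frames) and scores each band by the peak normalized cross-correlation between its energy trajectory and a binary event-activity mask. The function names `band_scores` and `select_bands`, the scoring rule, and the synthetic data are all hypothetical.

```python
import numpy as np

def band_scores(mel_spec, activity):
    """Score each band by the peak normalized cross-correlation between its
    energy trajectory and the event-activity mask (an assumed scoring rule)."""
    act = activity - activity.mean()
    act_norm = np.linalg.norm(act)
    scores = np.zeros(mel_spec.shape[0])
    for b, band in enumerate(mel_spec):
        x = band - band.mean()
        denom = np.linalg.norm(x) * act_norm
        if denom == 0:
            continue  # constant band or empty mask: leave the score at 0
        xcorr = np.correlate(x, act, mode="full")  # all time lags
        scores[b] = np.max(np.abs(xcorr)) / denom
    return scores

def select_bands(mel_spec, activity, keep_ratio=0.5):
    """Return the indices of the highest-scoring bands and all scores."""
    scores = band_scores(mel_spec, activity)
    n_keep = max(1, int(round(keep_ratio * len(scores))))
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return keep, scores

# Synthetic stand-in for a Mel spectrogram: 8 bands, 200 frames.
rng = np.random.default_rng(0)
n_mels, n_frames = 8, 200
activity = np.zeros(n_frames)
activity[60:120] = 1.0                      # event active in frames 60..119
mel_spec = rng.normal(0.0, 0.1, (n_mels, n_frames))
mel_spec[2] += 2.0 * activity               # only bands 2 and 5 carry the event
mel_spec[5] += 2.0 * activity

keep, scores = select_bands(mel_spec, activity, keep_ratio=0.25)
reduced = mel_spec[keep]                    # smaller input feature for a detector
print(keep, reduced.shape)
```

In this toy setting only the two bands that carry the event energy are kept, which mirrors the abstract's claim that a subset of bands is enough. The 50% figure reported for URBAN-SED would correspond to `keep_ratio=0.5` under this hypothetical scheme.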

Acknowledgements

This article is partially a result of the project Safe Cities - “Inovação para Construir Cidades Seguras”, with reference POCI-01-0247-FEDER-041435, co-funded by the European Regional Development Fund (ERDF), through the Operational Programme for Competitiveness and Internationalization (COMPETE 2020), under the PORTUGAL 2020 Partnership Agreement. The first author would like to thank “Fundação para a Ciência e Tecnologia” (FCT) for his Ph.D. grant with reference 2021.08660.BD.

Author information

Corresponding author

Correspondence to João Manuel R. S. Tavares.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hajihashemi, V., Gharahbagh, A.A., Machado, J.J.M., Tavares, J.M.R.S. (2024). Audio Event Detection Based on Cross Correlation in Selected Frequency Bands of Spectrogram. In: Rocha, A., Adeli, H., Dzemyda, G., Moreira, F., Colla, V. (eds) Information Systems and Technologies. WorldCIST 2023. Lecture Notes in Networks and Systems, vol 802. Springer, Cham. https://doi.org/10.1007/978-3-031-45651-0_19
