Improving sound event detection through enhanced feature extraction and attention mechanisms

Zhang, Dongping; Wu, Siyi; Lu, Zhanhong; Zhang, Zhehao; Hu, Haimiao; Yu, Jiabin

doi:10.1007/s11704-025-41108-7

Letter
Published: 03 April 2025

Volume 19, article number 1910707 (2025)
Frontiers of Computer Science

Dongping Zhang¹,
Siyi Wu¹,
Zhanhong Lu²,
Zhehao Zhang³,
Haimiao Hu⁴ &
Jiabin Yu¹

Acknowledgements

This work was supported by the Zhejiang Provincial Key R&D Program (Nos. 2024C01108, 2023C01030, 2023C01034), the Hangzhou Key R&D Program (Nos. 2023SZD0046, 2024SZD1A03), and the Ningbo Key R&D Program (No. 2024Z114).

Author information

Authors and Affiliations

College of Information Engineering, China Jiliang University, Hangzhou, 310018, China
Dongping Zhang, Siyi Wu & Jiabin Yu
Hangzhou Hikvision Digital Technology Co., Ltd, Hangzhou, 310051, China
Zhanhong Lu
Hangzhou Aihua Intelligent Technology Co., Ltd, Hangzhou, 311422, China
Zhehao Zhang
Hangzhou Innovation Institute, Beihang University, Hangzhou, 310051, China
Haimiao Hu

Corresponding author

Correspondence to Dongping Zhang.

Ethics declarations

Competing interests: The authors declare that they have no competing interests or financial conflicts to disclose.

Electronic Supplementary Material