Abstract
Underwater acoustic voiceprint recognition, a key technology in biometric identification, has broad application prospects in areas such as marine resource development, underwater communication, and underwater safety monitoring. Conventional acoustic voiceprint recognition methods exhibit limitations in underwater environments, motivating a lightweight neural network approach tailored to underwater acoustic voiceprint recognition tasks. This paper introduces a novel lightweight voiceprint recognition model, the Echo Lite Voice Fusion Network (ELVFN), which incorporates depthwise separable convolution and a self-attention mechanism and significantly improves recognition performance by optimizing acoustic feature extraction and the hierarchical feature fusion strategy, while substantially reducing the model's computational complexity and parameter count. Comparative analyses with existing acoustic voiceprint recognition models corroborate the superior performance of our model across multiple underwater acoustic datasets. Experimental results demonstrate that ELVFN excels on various evaluation metrics, notably processing efficiency and recognition accuracy. Finally, we discuss the application potential and future development directions of the model, providing an efficient solution for underwater acoustic voiceprint recognition in resource-constrained environments.
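As a rough illustration of the kind of block the abstract describes, the following Python (PyTorch) sketch pairs a depthwise separable convolution with multi-head self-attention over acoustic feature frames. It is an assumption-laden approximation: the class names (DepthwiseSeparableConv, ELVFNBlock), layer sizes, and the residual fusion of the local and global branches are illustrative choices, not the published ELVFN architecture.

# Minimal sketch, assuming 1-D acoustic feature maps of shape (batch, channels, frames).
# Not the authors' implementation; it only illustrates combining depthwise separable
# convolution (local, low-cost feature extraction) with self-attention (global context).
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution followed by a pointwise (1x1) channel projection."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):  # x: (batch, channels, frames)
        return self.pointwise(self.depthwise(x))


class ELVFNBlock(nn.Module):
    """Hypothetical lightweight block: separable-conv features refined by self-attention."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.conv = DepthwiseSeparableConv(channels, channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):  # x: (batch, channels, frames)
        h = self.conv(x)                 # local feature extraction
        h_t = h.transpose(1, 2)          # (batch, frames, channels) for attention
        a, _ = self.attn(h_t, h_t, h_t)  # global context via self-attention
        out = self.norm(h_t + a)         # residual fusion of local and global cues
        return out.transpose(1, 2)


if __name__ == "__main__":
    feats = torch.randn(8, 64, 200)           # e.g. 64-dim acoustic features, 200 frames
    print(ELVFNBlock(64)(feats).shape)        # torch.Size([8, 64, 200])

The separable convolution keeps the parameter count low (one spatial filter per channel plus a 1x1 projection), while the attention layer supplies the long-range temporal context that plain convolutions miss; a full model would stack several such blocks before a pooling and classification head.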
Data Availability
The data that support the findings of this study are available from the corresponding author, Guan, upon reasonable request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 62472220).
Author information
Authors and Affiliations
Contributions
Jiaqi Wu: Methodology, Experiment. Donghai Guan: Methodology, Writing. Weiwei Yuan: Experiment, Writing.
Corresponding author
Correspondence to Donghai Guan.
Ethics declarations
Competing Interests
No potential conflict of interest was reported by the authors.
Ethical and informed consent for data used
In this study, we used publicly available benchmark datasets.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, J., Guan, D. & Yuan, W. Echo lite voice fusion network: advancing underwater acoustic voiceprint recognition with lightweight neural architectures. Appl Intell 55, 112 (2025). https://doi.org/10.1007/s10489-024-06035-3