Research Article
DOI: 10.1145/3655755.3655758

A Hierarchical Underwater Acoustic Target Recognition Method Based on Transformer and Transfer Learning

Published: 08 June 2024

Abstract

Underwater acoustic target recognition (UATR) is an essential research direction in the field of underwater acoustic signal processing. Machine learning-based recognition methods have overcome some performance bottlenecks of traditional recognition methods, but one remaining challenge is the scarcity of trainable data. This paper applies a transfer learning (TL) strategy to UATR with limited training samples: we transfer a Swin Transformer model pre-trained on the ImageNet dataset to underwater acoustic signals. We present a Hierarchical Underwater Acoustic Transformer (HUATrans) model for ship-radiated noise recognition. The Transformer architecture offers stronger feature extraction capability than Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). The proposed model replaces the traditional Transformer architecture with the Swin Transformer, which is based on a shifted-window self-attention mechanism, and improves the patch embedding module; this reduces training time and saves substantial computational resources. At the same time, transferring a Transformer pre-trained in the image domain to the UATR domain alleviates the training burden caused by the lack of training data. Experimental results show that the HUATrans model with the TL strategy achieves excellent recognition performance on the ship-radiated noise datasets DeepShip and ShipsEar.



Published In

IVSP '24: Proceedings of the 2024 6th International Conference on Image, Video and Signal Processing
March 2024
229 pages
ISBN:9798400716829
DOI:10.1145/3655755

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  • HUATrans
  • Transfer learning
  • Transformer architecture
  • Underwater acoustic target recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Stable Supporting Fund of National Key Laboratory of Underwater Acoustic Technology
  • National Natural Science Foundation of China
  • Fund of Science and Technology on Sonar Laboratory
  • Fundamental Research Funds for Central Universities

Conference

IVSP 2024

