DOI: 10.1145/3604078.3604096

UATR-MSG-Transformer: A Deep Learning Network for Underwater Acoustic Target Recognition Based on Spectrogram Feature Fusion and Transformer with Messenger Tokens

Published: 26 October 2023

ABSTRACT

Underwater acoustic target recognition (UATR) based on deep learning suffers from low recognition accuracy on larger datasets. This paper proposes the UATR-MSG-Transformer (a Transformer with messenger tokens for UATR). The Mel-filter bank (Mel-fbank) and LOFAR spectrogram features of each target's noise are extracted and concatenated along the channel dimension as the input, and a Squeeze-and-Excitation (SE) block learns and adjusts the weight of each feature in the channel dimension. The fused features are then projected into tokens and split into local windows, with a messenger (MSG) token introduced in each window to summarize the information within that window and exchange it with the other windows. Experimental results show that the UATR-MSG-Transformer effectively improves recognition accuracy.
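The pipeline the abstract describes (channel-dimension fusion of Mel-fbank and LOFAR features, SE-based channel reweighting, and per-window messenger tokens) can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: all tensor shapes, the random weights, and the simple averaging used as a stand-in for the model's learned MSG-token attention exchange are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def se_block(x, w1, w2):
    """Squeeze-and-Excitation over a (C, H, W) feature map.

    Squeeze: global average pooling per channel.
    Excitation: two fully connected layers (ReLU, then sigmoid)
    producing one weight per channel, used to rescale the map.
    """
    z = x.mean(axis=(1, 2))                 # squeeze -> (C,)
    s = np.maximum(w1 @ z, 0.0)             # FC + ReLU (channel reduction)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))     # FC + sigmoid -> channel weights
    return x * s[:, None, None]             # reweight each channel

# Two single-channel spectrogram features of the same time-frequency size
# (sizes are illustrative only).
mel = rng.standard_normal((1, 64, 100))     # Mel-fbank feature
lofar = rng.standard_normal((1, 64, 100))   # LOFAR spectrogram feature
fused = np.concatenate([mel, lofar], axis=0)  # channel-dim fusion -> (2, 64, 100)

# SE weights with reduction ratio 2: 2 channels -> 1 -> 2.
w1 = rng.standard_normal((1, 2))
w2 = rng.standard_normal((2, 1))
out = se_block(fused, w1, w2)               # (2, 64, 100), channels reweighted

def msg_exchange(tokens, window):
    """Crude stand-in for MSG-token information exchange.

    tokens: (N, D) token sequence with N divisible by `window`.
    Each local window is summarized by one messenger token (here, the
    window mean); the messengers are then mixed across windows (here,
    by averaging, where the real model uses attention) and the mixed
    summary is broadcast back into every window.
    """
    n, d = tokens.shape
    wins = tokens.reshape(n // window, window, d)
    msg = wins.mean(axis=1)                  # one MSG token per window
    shared = msg.mean(axis=0)                # exchange across windows
    return wins + shared[None, None, :]      # inject exchanged info

tokens = rng.standard_normal((8, 4))
mixed = msg_exchange(tokens, window=4)       # (2, 4, 4): 2 windows of 4 tokens
print(out.shape, mixed.shape)
```

The averaging-based exchange above only mimics the direction of information flow; in the actual MSG-Transformer design, messenger tokens participate in self-attention so the exchange is learned rather than fixed.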


Published in

ICDIP '23: Proceedings of the 15th International Conference on Digital Image Processing
May 2023, 711 pages
ISBN: 9798400708237
DOI: 10.1145/3604078
Copyright © 2023 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article, refereed limited