Neurocomputing

Volume 443, 5 July 2021, Pages 235-246
Few-shot SAR automatic target recognition based on Conv-BiLSTM prototypical network

https://doi.org/10.1016/j.neucom.2021.03.037

Abstract

In recent studies, synthetic aperture radar (SAR) automatic target recognition (ATR) algorithms have achieved high recognition accuracy in the moving and stationary target acquisition and recognition (MSTAR) data set. However, these algorithms usually require hundreds or more training samples of each target type. In order to extract azimuth-insensitive features in a SAR ATR task with only a few training samples, a convolutional bidirectional long short-term memory (Conv-BiLSTM) network is designed as an embedding network to map the SAR images into a new feature space where the classification problem becomes easier. Based on the embedding network, a novel few-shot SAR ATR framework called Conv-BiLSTM Prototypical Network (CBLPN) is proposed. Experimental results on the MSTAR benchmark data set have illustrated that the proposed method performs well in SAR image classification with only a few training samples.

Introduction

Thanks to its all-day, all-weather, high-resolution, and long operating distance capabilities, synthetic aperture radar (SAR) has been widely applied in battlefield reconnaissance, terrain mapping, geological exploration, and marine observation. Unlike optical imaging, a single-polarization SAR image represents the intensity of target scattering by gray levels, and usually has blurred edges and strong anisotropy due to background clutter and limited resolution. All these factors increase the difficulty of effective feature extraction and target recognition.

With recent advances in deep learning theory, deep neural networks have been widely used in many fields [1], [2], [3]. In recent studies, SAR target recognition techniques based on deep learning have also achieved great success, with performance superior to traditional methods [4]. However, they usually require a large number of training samples for each target type, a requirement that is difficult to meet in real-world situations due to high cost or mission constraints. In this scenario, the problem of few-shot SAR ATR arises and invalidates the available algorithms. Therefore, it is necessary to conduct in-depth studies of effective few-shot SAR ATR methods.

For SAR target recognition based on a single image, there are three mainstream approaches, i.e., template matching [5], target modeling [6], and machine learning [7]. Such methods require designing a dedicated template, a target model, or a classifier in advance. Nevertheless, the heavy reliance on hand-designed features usually results in high complexity and poor generalization performance.

Recently, deep learning has found wide application in SAR ATR because of its strong ability in automatic feature extraction; typical methods mainly include convolutional neural networks (CNN) [4], auto-encoders [8], recurrent neural networks [9], and generative adversarial networks [10]. Among them, CNN and its improved versions have achieved state-of-the-art recognition results on the MSTAR benchmark data set [11]. Their typical structures consist of a feature extractor and a classifier. Specifically, the feature extractor is formed by stacking convolutional layers and pooling layers, and is applied to extract hierarchical features from the original data. Initially, a traditional CNN structure consisting of convolutional layers, pooling layers, and a softmax classifier was proposed [12], [13]. Later, improved versions were developed. For instance, S. Chen et al. designed A-ConvNets, in which the number of unknown parameters was reduced greatly by removing the fully-connected layers [4]. S. Wagner replaced the softmax classifier with an SVM classifier and achieved high recognition accuracy [14]. R.H. Shang et al. added an information recorder to CNN to remember and store the spatial features of samples, and then utilized the spatial similarity of the recorded features to predict unknown sample labels [15]. J. Wang et al. applied a despeckling sub-network to suppress speckle noise before classification [16]. J. Pei et al. proposed a multi-channel CNN structure, which utilized SAR images with different viewing angles to improve recognition accuracy [17].

Generally, the available SAR ATR algorithms based on deep learning require a large number of training samples to obtain satisfactory generalization performance and alleviate over-fitting. When training samples are insufficient, the available techniques may: a) augment the training set by image rotation, shifting, and distortion [4], [12]; b) pre-train the network on another data set by transfer learning [13]; or c) design special network structures that reduce the amount of training data required, e.g., replacing the convolutional layers with convolutional highway units [18]. Nevertheless, the above methods still need at least hundreds of training samples for each target type, and their recognition performance degrades heavily if only a few training samples are available for some classes. Ref. [18] demonstrates that the recognition accuracy of deep learning methods and traditional machine learning methods falls below 40% when the training set includes only dozens of samples for each class (about 10% of the MSTAR SOC training set).
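As a concrete illustration of strategy a), translation-based augmentation can be sketched in a few lines of NumPy. The function and parameter names below are illustrative, not taken from the cited works:

```python
import numpy as np

def augment_by_shifting(image, max_shift=4, n_copies=8, rng=None):
    """Generate randomly shifted copies of a SAR image chip
    (a simple translation augmentation; names are illustrative)."""
    rng = np.random.default_rng(rng)
    copies = []
    for _ in range(n_copies):
        # draw a random 2-D shift in [-max_shift, max_shift]
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        copies.append(np.roll(image, shift=(dy, dx), axis=(0, 1)))
    return np.stack(copies)

# four augmented 128 x 128 chips from one (here, blank) input chip
chips = augment_by_shifting(np.zeros((128, 128)), n_copies=4, rng=0)
```

Rotation and distortion follow the same pattern, each producing several perturbed copies per original chip.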

Recently, few-shot learning (FSL) has been proposed to tackle the ATR problem in which only a few samples of some target types are available [19]. Generally, a few-shot learning task involves three data sets: the test set, the support set, and the training set. The test set contains the target samples that need to be recognized; the support set contains a few labeled samples that belong to the same classes as the test set; and the training set contains other target classes, different from those in the support/test set. By exploiting prior knowledge in the training set, FSL can rapidly generalize to new recognition tasks with limited samples in the support set, mimicking the human ability to acquire knowledge from a few examples through generalization. Prototypical Network (PN) [20] is a classical FSL method and has been successfully applied to dermatological disease diagnosis [21] and hyperspectral image classification [22]. This method consists of two main stages: 1) transforming each sample into an embedding vector by a single-channel CNN; and 2) performing classification on the embedding vectors according to Euclidean distance. Specifically, the unknown parameters in PN are learned by an episode-based method [20]. However, such a method cannot be directly applied to SAR ATR. On the one hand, the episode-based network training method requires a large training set with thousands of samples, which is hard to collect in the SAR ATR task. On the other hand, the embedding vectors learned through the single-channel CNN are sensitive to azimuth variation and lack robustness.
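The second PN stage can be sketched in a few lines of NumPy. The embedding network is assumed to have already mapped each sample to a vector; the toy numbers are purely illustrative:

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    """Class prototypes: mean embedding of the K support samples per class."""
    return np.stack([support_emb[support_labels == k].mean(axis=0)
                     for k in range(n_classes)])

def classify(query_emb, protos):
    """Assign each query to its nearest prototype (squared Euclidean distance)."""
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

# toy 2-way 2-shot example with 3-dimensional embeddings
sup = np.array([[0., 0, 0], [0, 0, 2], [5, 5, 5], [5, 5, 7]])
lab = np.array([0, 0, 1, 1])
protos = prototypes(sup, lab, n_classes=2)          # [[0,0,1], [5,5,6]]
preds = classify(np.array([[0., 0, 1.5], [4, 5, 6]]), protos)  # [0 1]
```

The classifier is parameter-free; all learnable weights live in the embedding network that produces the vectors.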

To tackle the above-mentioned problems, a novel few-shot SAR ATR method is proposed. The contributions of this paper can be summarized as follows.

a) A novel few-shot learning method for SAR ATR, namely CBLPN, is proposed, which maps each SAR sample to an embedding vector and then performs SAR ATR in the embedding space. Compared with traditional SAR ATR methods, CBLPN achieves comparable recognition accuracy while requiring far fewer labeled samples.

b) To reduce the influence of azimuth variation on SAR ATR and extract azimuth-robust features from SAR samples, a convolutional bidirectional long short-term memory (Conv-BiLSTM) network is designed to replace the common CNN structure as the feature extractor. Experiments on the MSTAR dataset show that Conv-BiLSTM is less sensitive to azimuth variation of SAR images than CNN and improves the robustness of SAR ATR effectively.

c) A random-episode weights update method is proposed to train the parameters in CBLPN. By randomly sampling from the training set to mimic the real SAR ATR task, the scarcity of labeled samples is alleviated and the parameters in CBLPN can be learned effectively.
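The episode sampling underlying such training can be sketched as follows. This is a generic illustration of drawing one C-way K-shot episode from an auxiliary training set, not the authors' exact procedure:

```python
import numpy as np

def sample_episode(features, labels, n_way, k_shot, n_query, rng=None):
    """Draw one C-way K-shot episode: pick n_way classes at random, then
    k_shot support and n_query query samples per class (illustrative)."""
    rng = np.random.default_rng(rng)
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    sup_x, sup_y, qry_x, qry_y = [], [], [], []
    for new_label, c in enumerate(classes):
        idx = rng.permutation(np.flatnonzero(labels == c))
        sup_x.append(features[idx[:k_shot]])
        qry_x.append(features[idx[k_shot:k_shot + n_query]])
        sup_y += [new_label] * k_shot   # relabel classes 0..n_way-1
        qry_y += [new_label] * n_query
    return (np.concatenate(sup_x), np.array(sup_y),
            np.concatenate(qry_x), np.array(qry_y))

# e.g., a 3-way 5-shot episode with 2 queries per class from 10 classes
feats = np.arange(100 * 8, dtype=float).reshape(100, 8)
labs = np.repeat(np.arange(10), 10)
sx, sy, qx, qy = sample_episode(feats, labs, 3, 5, 2, rng=0)
```

Repeating this with fresh random episodes lets a small labeled pool generate many distinct training tasks.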

The remainder of this paper is organized as follows: Section 2 gives a brief introduction to the research background, including recurrent neural networks (RNN) and few-shot learning. Section 3 introduces the structure of the Conv-BiLSTM network for SAR feature extraction. Section 4 introduces the framework of CBLPN and explains its components in detail. Section 5 describes the training of CBLPN. Section 6 presents the experimental results with discussion, and Section 7 concludes the paper. The main abbreviations used in the paper are listed with their expanded forms in Table 1.


Background

In this section, a brief introduction to RNN and a special RNN structure, namely BiLSTM, will be provided. Then, the basic concept of few-shot learning will be introduced.

Conv-BiLSTM network for SAR feature extraction

Most existing FSL approaches based on deep learning exploit a representation shared between the auxiliary training set and the support set. To obtain this representation, a projection from the original sample space to a new embedding space, in which the classification problem becomes easier, is learned from the training set. In a typical optical few-shot learning task, a CNN is usually utilized as the feature extractor to perform the projection [20]. However, SAR images are quite different from

Conv-BiLSTM prototypical networks for few-shot SAR ATR

Based on the Conv-BiLSTM network, a novel few-shot SAR ATR method called Conv-BiLSTM prototypical network (CBLPN) is proposed. The framework of CBLPN consists of two stages, i.e., the training stage and the test stage, as shown in Fig. 5. The Conv-BiLSTM network in CBLPN works as an embedding network f_ϕ: ℝ^D → ℝ^L with learnable parameters ϕ. Each SAR image is mapped into an L-dimensional vector by the embedding network. For each episode in a C-way K-shot SAR ATR task, each prototype c_k ∈ ℝ^L, k ∈ {1, 2, …, C}
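The prediction rule of a prototypical-network classifier, a softmax over negative squared Euclidean distances to the prototypes, can be sketched as follows (a generic illustration, with precomputed embeddings and prototypes assumed):

```python
import numpy as np

def class_probabilities(query_emb, protos):
    """p(y = k | x) proportional to exp(-||f(x) - c_k||^2), i.e. a softmax
    over negative squared Euclidean distances to the class prototypes."""
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# two 2-D queries against two 2-D prototypes (toy values)
p = class_probabilities(np.array([[0., 0.], [3., 3.]]),
                        np.array([[0., 0.], [4., 4.]]))
```

The first query coincides with prototype 0 and gets probability near 1 for class 0; the second lies closer to prototype 1.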

Back propagation in Euclidean distance based classifier

The weights in CBLPN are updated by back propagation (BP) [27], which calculates the partial derivatives of the objective loss with respect to each node in the embedding vector. Because the other weights in CBLPN are updated in the same way as in typical CNN and BiLSTM networks, only the BP of the classifier is presented in detail.

Fig. 7 shows the BP in the Euclidean distance based classifier, where f_ϕ(x^(j)) = [a_1, a_2, …, a_N] is the embedding vector of sample x^(j), and N is its dimension. The partial derivatives of (10)
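For the squared Euclidean distance itself, the partial derivative with respect to each node a_n of the embedding vector is 2(a_n − c_n). A quick NumPy check against finite differences (illustrative values only, not the paper's Eq. (10)):

```python
import numpy as np

# d2(a, c) = sum_n (a_n - c_n)^2; its gradient w.r.t. a is 2 (a - c),
# which is the quantity back-propagated through the classifier.
def d2(a, c):
    return ((a - c) ** 2).sum()

def grad_d2(a, c):
    return 2.0 * (a - c)

a = np.array([1.0, -2.0, 0.5])   # embedding vector (toy values)
c = np.array([0.5, 1.0, 0.0])    # prototype (toy values)
analytic = grad_d2(a, c)         # [1, -6, 1]

# central finite differences along each coordinate direction
eps = 1e-6
numeric = np.array([(d2(a + eps * e, c) - d2(a - eps * e, c)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(analytic, numeric, atol=1e-4))  # True
```

The full loss gradient then follows by the chain rule through the softmax over negative distances.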

Data set description

The training set, support set, and test set utilized in this paper are generated from the MSTAR data set provided by the Defense Advanced Research Projects Agency (DARPA). The data set was collected by the Sandia National Laboratory SAR sensor platform in 1995 and 1996 using an X-band SAR sensor. It provides a nominal spatial resolution of 0.3 m × 0.3 m in range and azimuth, and the image size is 128 × 128 pixels. The publicly released data sets include ten categories of ground military vehicles, i.e.

Conclusion

This paper proposed an end-to-end few-shot SAR ATR method, namely CBLPN, which can effectively recognize SAR targets with only a few training samples. In CBLPN, the Conv-BiLSTM network was designed to extract features insensitive to azimuth variation, and a classifier based on Euclidean distance was utilized for classification. In addition, a random-episode weights updating method was proposed to train the parameters in CBLPN. Experimental results on the MSTAR data set have illustrated the

CRediT authorship contribution statement

Li Wang: Conceptualization, Methodology, Software, Validation, Investigation, Formal analysis, Data curation, Writing - original draft, Visualization, Writing - review & editing. Xueru Bai: Resources, Supervision, Project administration, Funding acquisition. Ruihang Xue: Software. Feng Zhou: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 61971332, 61631019, and 61801344.

Li Wang was born in Jiangsu, China, in 1992. He received the B.S. and Ph.D. degrees in signal and information processing from Xidian University, Xi’an, China, in 2015 and 2020, respectively. His major research interests include deep learning and radar automatic target recognition.

References (36)

  • C.Q. Hong et al., Multimodal Deep Autoencoder for Human Pose Recovery, IEEE Trans. Image Process. (2015)
  • J. Yu, M. Tan, H.Y. Zhang, D.C. Tao, Y. Rui, Hierarchical Deep Click Feature Prediction for Fine-grained Image...
  • J. Yu et al., Spatial Pyramid-Enhanced NetVLAD With Weighted Triplet Loss for Place Recognition, IEEE Trans. Neural Netw. Learn. Syst. (2020)
  • S.Z. Chen et al., Target classification using the deep convolutional networks for SAR images, IEEE Trans. Geosci. Remote Sens. (2016)
  • L.M. Novak et al., The automatic target-recognition system in SAIP, Lincoln Lab. J. (1997)
  • H.C. Chiang et al., Model-based classification of radar images, IEEE Trans. Inf. Theory (2000)
  • G.G. Dong et al., Classification on the monogenic scale space: Application to target recognition in SAR image, IEEE Trans. Image Process. (2015)
  • S. Deng et al., SAR automatic target recognition based on Euclidean distance restricted autoencoder, IEEE J. Sel. Top. Appl. Earth Observ. (2017)
  • F. Zhang et al., Multi-aspect-aware bidirectional LSTM networks for synthetic aperture radar target recognition, IEEE Access (2017)
  • C. Zheng et al., Semi-Supervised SAR ATR via Multi-Discriminator Generative Adversarial Network, IEEE Sens. J. (2019)
  • The Air Force Moving and Stationary Target Recognition Database. [Online]. Available:...
  • J. Ding et al., Convolutional neural network with data augmentation for SAR target recognition, IEEE Geosci. Remote Sens. Lett. (2016)
  • M. David et al., Improving SAR automatic target recognition models with transfer learning from simulated data, IEEE Geosci. Remote Sens. Lett. (2017)
  • S. Wagner, SAR ATR by a combination of convolutional neural network and support vector machines, IEEE Trans. Aerosp. Electron. Syst. (2017)
  • R.H. Shang et al., SAR targets classification based on deep memory convolution neural networks and transfer parameters, IEEE J. Sel. Top. Appl. Earth Observ. (2018)
  • J. Wang et al., Ground target classification in noisy SAR images using convolutional neural networks, IEEE J. Sel. Top. Appl. Earth Observ. (2018)
  • J. Pei et al., SAR automatic target recognition based on multiview deep learning framework, IEEE Trans. Geosci. Remote Sens. (2018)
  • Z. Lin et al., Deep convolutional highway unit network for SAR target classification with limited labeled training data, IEEE Geosci. Remote Sens. Lett. (2017)


    Xueru Bai was born in Xi’an, Shaanxi, China, in 1984. She received the B.S. and Ph.D. degrees in signal and information processing from Xidian University, Xi’an, China, in 2006 and 2011, respectively. She is currently a Professor with the National Laboratory of Radar Signal Processing, Xidian University. Her research interests include high-resolution radar imaging and radar automatic target recognition. Dr. Bai was a recipient of the National Excellent Doctoral Dissertation Award granted by the Ministry of Education of China and the Program for Excellent Young Scientist selected by the National Natural Science Foundation of China.

    Ruihang Xue was born in Xi’an, Shaanxi, China, in 1996. He received the B.S. degree in electronic and information engineering from Xidian University, Xi’an, China, in 2018, where he is currently working toward the Ph.D. degree in signal and information processing in the National Laboratory of Radar Signal Processing, Xidian University. His major research interests include deep learning and radar automatic target recognition.

    Feng Zhou was born in Tongxu, Henan, China, in 1980. He received the Ph.D. degree in signal and information processing from Xidian University, Xi’an, China, in 2007. He is currently a Director and a Professor with the Key Laboratory of Electronic Information Countermeasure and Simulation Technology of Ministry of Education, Xidian University. He has authored or coauthored over 80 papers. His research interests include high-resolution radar imaging and radar countermeasures. Dr. Zhou was a recipient of the Young Scientist Award from the XXXI URSI GASS Committee, the program for Support of Top-notch Young Professionals from the Central Organization Department of China, and the program for New Century Excellent Talents in University from the Ministry of Education of China.
