Neurocomputing

Volume 443, 5 July 2021, Pages 235-246
Few-shot SAR automatic target recognition based on Conv-BiLSTM prototypical network

https://doi.org/10.1016/j.neucom.2021.03.037

Abstract

In recent studies, synthetic aperture radar (SAR) automatic target recognition (ATR) algorithms have achieved high recognition accuracy in the moving and stationary target acquisition and recognition (MSTAR) data set. However, these algorithms usually require hundreds or more training samples of each target type. In order to extract azimuth-insensitive features in a SAR ATR task with only a few training samples, a convolutional bidirectional long short-term memory (Conv-BiLSTM) network is designed as an embedding network to map the SAR images into a new feature space where the classification problem becomes easier. Based on the embedding network, a novel few-shot SAR ATR framework called Conv-BiLSTM Prototypical Network (CBLPN) is proposed. Experimental results on the MSTAR benchmark data set have illustrated that the proposed method performs well in SAR image classification with only a few training samples.

Introduction

Thanks to its all-day, all-weather, high-resolution, and long operating distance capabilities, synthetic aperture radar (SAR) has been widely applied in battlefield reconnaissance, terrain mapping, geological exploration, and marine observation. Unlike optical imaging, a single-polarization SAR image represents the intensity of target scattering by gray levels, and usually has blurred edges and strong anisotropy due to background clutter and limited resolution. All these factors increase the difficulty of effective feature extraction and target recognition.

With recent advances in deep learning theory, deep neural networks have been widely used in many fields [1], [2], [3]. In recent studies, SAR target recognition techniques based on deep learning have also achieved great success, with performance superior to traditional methods [4]. However, they usually require a large number of training samples for each target type, a requirement that is difficult to meet in real-world situations due to high cost or mission constraints. In this scenario, the problem of few-shot SAR ATR arises and invalidates the available algorithms. Therefore, it is necessary to conduct in-depth studies of effective few-shot SAR ATR methods.

For SAR target recognition based on a single image, there are three mainstream approaches, i.e., template matching [5], target modeling [6], and machine learning [7]. Such methods require designing a dedicated template, a target model, or a classifier in advance. Nevertheless, the heavy reliance on hand-designed features usually results in high complexity and poor generalization performance.

Recently, deep learning has found wide application in SAR ATR because of its strong ability in automatic feature extraction; typical methods mainly include convolutional neural networks (CNN) [4], auto-encoders [8], recurrent neural networks [9], and generative adversarial networks [10]. Among them, CNN and its improved versions have achieved state-of-the-art recognition results on the MSTAR benchmark data set [11]. Their typical structures consist of a feature extractor and a classifier. Specifically, the feature extractor is formed by stacking convolutional layers and pooling layers, and is applied to extract hierarchical features from the original data. Initially, a traditional CNN structure consisting of convolutional layers, pooling layers, and a softmax classifier was proposed [12], [13]. Later, improved versions were developed. For instance, S. Chen et al. designed A-ConvNets, in which the number of unknown parameters was reduced greatly by removing the fully-connected layers [4]. S. Wagner replaced the softmax classifier with an SVM classifier and achieved high recognition accuracy [14]. R.H. Shang et al. added an information recorder to CNN to remember and store the spatial features of samples, and then utilized the spatial similarity of the recorded features to predict unknown sample labels [15]. J. Wang et al. applied a despeckling sub-network to suppress speckle noise before classification [16]. J. Pei et al. proposed a multi-channel CNN structure, which utilized SAR images with different viewing angles to improve recognition accuracy [17].

Generally, the available SAR ATR algorithms based on deep learning require a large number of training samples to obtain satisfactory generalization performance and alleviate over-fitting. When training samples are insufficient, the available techniques may: a) augment the training set by image rotation, shifting, and distortion [4], [12]; b) pre-train the network on another data set by transfer learning [13]; or c) design special network structures that reduce the amount of training data required, e.g., replacing the convolutional layers with convolutional highway units [18]. Nevertheless, the above methods still need at least hundreds of training samples for each target type, and their recognition performance degrades heavily if only a few training samples are available for some classes. Ref. [18] demonstrates that the recognition accuracy of deep learning methods and traditional machine learning methods falls below 40% when the training set includes only dozens of samples for each class (about 10% of the MSTAR SOC training set).
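As a concrete illustration of strategy a), translation-based augmentation can be sketched in a few lines of NumPy. The function and parameter names below are illustrative, not taken from the cited works:

```python
import numpy as np

def augment_by_shifting(image, max_shift=4, n_copies=8, rng=None):
    """Generate randomly shifted copies of a SAR image chip
    (a simple translation augmentation; names are illustrative)."""
    rng = np.random.default_rng(rng)
    copies = []
    for _ in range(n_copies):
        # draw a random 2-D shift in [-max_shift, max_shift]
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        copies.append(np.roll(image, shift=(dy, dx), axis=(0, 1)))
    return np.stack(copies)

# four augmented 128 x 128 chips from one (here, blank) input chip
chips = augment_by_shifting(np.zeros((128, 128)), n_copies=4, rng=0)
```

Rotation and distortion follow the same pattern, each producing several perturbed copies per original chip.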

Recently, few-shot learning (FSL) has been proposed to tackle the ATR problem in which only a few samples of some target types are available [19]. Generally, a few-shot learning task involves three data sets: the test set, the support set, and the training set. The test set contains the target samples that need to be recognized; the support set contains a few labeled samples that belong to the same classes as the test set; and the training set contains other target classes, different from those in the support/test set. By exploiting prior knowledge in the training set, FSL can rapidly generalize to new recognition tasks with limited samples in the support set, mimicking the human ability to acquire knowledge from a few examples through generalization. Prototypical Network (PN) [20] is a classical FSL method and has been successfully applied to dermatological disease diagnosis [21] and hyperspectral image classification [22]. This method consists of two main stages: 1) transforming each sample into an embedding vector by a single-channel CNN; and 2) performing classification on the embedding vectors according to Euclidean distance. Specifically, the unknown parameters in PN are learned by an episode-based method [20]. However, such a method cannot be directly applied to SAR ATR. On the one hand, the episode-based network training method requires a large training set with thousands of samples, which is hard to collect in the SAR ATR task. On the other hand, the embedding vectors learned through the single-channel CNN are sensitive to azimuth variation and lack robustness.
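The second PN stage can be sketched in a few lines of NumPy. The embedding network is assumed to have already mapped each sample to a vector; the toy numbers are purely illustrative:

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    """Class prototypes: mean embedding of the K support samples per class."""
    return np.stack([support_emb[support_labels == k].mean(axis=0)
                     for k in range(n_classes)])

def classify(query_emb, protos):
    """Assign each query to its nearest prototype (squared Euclidean distance)."""
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

# toy 2-way 2-shot example with 3-dimensional embeddings
sup = np.array([[0., 0, 0], [0, 0, 2], [5, 5, 5], [5, 5, 7]])
lab = np.array([0, 0, 1, 1])
protos = prototypes(sup, lab, n_classes=2)          # [[0,0,1], [5,5,6]]
preds = classify(np.array([[0., 0, 1.5], [4, 5, 6]]), protos)  # [0 1]
```

The classifier is parameter-free; all learnable weights live in the embedding network that produces the vectors.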

To tackle the above-mentioned problems, a novel few-shot SAR ATR method is proposed. The contributions of this paper can be summarized as follows.

a) A novel few-shot learning method for SAR ATR, namely CBLPN, is proposed, which maps each SAR sample to an embedding vector and then performs SAR ATR in the embedding space. Compared with traditional SAR ATR methods, CBLPN achieves comparable recognition accuracy while requiring far fewer labeled samples.

b) To reduce the influence of azimuth variation on SAR ATR and extract azimuth-robust features from SAR samples, a convolutional bidirectional long short-term memory (Conv-BiLSTM) network is designed to replace the common CNN structure as the feature extractor. Experiments on the MSTAR dataset show that Conv-BiLSTM is less sensitive to azimuth variation of SAR images than CNN and improves the robustness of SAR ATR effectively.

c) A random-episode weights update method is proposed to train the parameters in CBLPN. By randomly sampling from the training set to mimic the real SAR ATR task, the scarcity of labeled samples is alleviated and the parameters in CBLPN can be learned effectively.
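The episode sampling underlying such training can be sketched as follows. This is a generic illustration of drawing one C-way K-shot episode from an auxiliary training set, not the authors' exact procedure:

```python
import numpy as np

def sample_episode(features, labels, n_way, k_shot, n_query, rng=None):
    """Draw one C-way K-shot episode: pick n_way classes at random, then
    k_shot support and n_query query samples per class (illustrative)."""
    rng = np.random.default_rng(rng)
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    sup_x, sup_y, qry_x, qry_y = [], [], [], []
    for new_label, c in enumerate(classes):
        idx = rng.permutation(np.flatnonzero(labels == c))
        sup_x.append(features[idx[:k_shot]])
        qry_x.append(features[idx[k_shot:k_shot + n_query]])
        sup_y += [new_label] * k_shot   # relabel classes 0..n_way-1
        qry_y += [new_label] * n_query
    return (np.concatenate(sup_x), np.array(sup_y),
            np.concatenate(qry_x), np.array(qry_y))

# e.g., a 3-way 5-shot episode with 2 queries per class from 10 classes
feats = np.arange(100 * 8, dtype=float).reshape(100, 8)
labs = np.repeat(np.arange(10), 10)
sx, sy, qx, qy = sample_episode(feats, labs, 3, 5, 2, rng=0)
```

Repeating this with fresh random episodes lets a small labeled pool generate many distinct training tasks.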

The remainder of this paper is organized as follows: Section 2 gives a brief introduction to the research background, including recurrent neural networks (RNN) and few-shot learning. Section 3 introduces the structure of the Conv-BiLSTM network for SAR feature extraction. Section 4 introduces the framework of CBLPN and explains its components in detail. Section 5 describes the training of CBLPN. Section 6 presents the experimental results with discussion, and Section 7 concludes the paper. The main abbreviations used in the paper are listed with their expanded forms in Table 1.


Background

In this section, a brief introduction to RNN and a special RNN structure, namely BiLSTM, will be provided. Then, the basic concept of few-shot learning will be introduced.

Conv-BiLSTM network for SAR feature extraction

Most existing FSL approaches based on deep learning exploit a representation shared between the auxiliary training set and the support set. To obtain this representation, a projection from the original sample space to a new embedding space, in which the classification problem becomes easier, is learned from the training set. In a typical optical few-shot learning task, a CNN is usually utilized as the feature extractor to perform the projection [20]. However, SAR images are quite different from

Conv-BiLSTM prototypical networks for few-shot SAR ATR

Based on the Conv-BiLSTM network, a novel few-shot SAR ATR method called Conv-BiLSTM prototypical network (CBLPN) is proposed. The framework of CBLPN consists of two stages, i.e., the training stage and the test stage, as shown in Fig. 5. The Conv-BiLSTM network in CBLPN works as an embedding network f_ϕ: ℝ^D → ℝ^L with learnable parameters ϕ. Each SAR image is mapped into an L-dimensional vector by the embedding network. For each episode in a C-way K-shot SAR ATR task, each prototype c_k ∈ ℝ^L, k ∈ {1, 2, …, C}
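The prediction rule of a prototypical-network classifier, a softmax over negative squared Euclidean distances to the prototypes, can be sketched as follows (a generic illustration, with precomputed embeddings and prototypes assumed):

```python
import numpy as np

def class_probabilities(query_emb, protos):
    """p(y = k | x) proportional to exp(-||f(x) - c_k||^2), i.e. a softmax
    over negative squared Euclidean distances to the class prototypes."""
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# two 2-D queries against two 2-D prototypes (toy values)
p = class_probabilities(np.array([[0., 0.], [3., 3.]]),
                        np.array([[0., 0.], [4., 4.]]))
```

The first query coincides with prototype 0 and gets probability near 1 for class 0; the second lies closer to prototype 1.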

Back propagation in Euclidean distance based classifier

The weights in CBLPN are updated by back propagation (BP) [27], which calculates the partial derivatives of the objective loss with respect to each node in the embedding vector. Because the other weights in CBLPN are updated in the same way as in typical CNN and BiLSTM networks, only the BP of the classifier is presented in detail.

Fig. 7 shows the BP in the Euclidean distance based classifier, where f_ϕ(x^(j)) = [a_1, a_2, …, a_N] is the embedding vector of sample x^(j), and N is its dimension. The partial derivatives of (10)
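For the squared Euclidean distance itself, the partial derivative with respect to each node a_n of the embedding vector is 2(a_n − c_n). A quick NumPy check against finite differences (illustrative values only, not the paper's Eq. (10)):

```python
import numpy as np

# d2(a, c) = sum_n (a_n - c_n)^2; its gradient w.r.t. a is 2 (a - c),
# which is the quantity back-propagated through the classifier.
def d2(a, c):
    return ((a - c) ** 2).sum()

def grad_d2(a, c):
    return 2.0 * (a - c)

a = np.array([1.0, -2.0, 0.5])   # embedding vector (toy values)
c = np.array([0.5, 1.0, 0.0])    # prototype (toy values)
analytic = grad_d2(a, c)         # [1, -6, 1]

# central finite differences along each coordinate direction
eps = 1e-6
numeric = np.array([(d2(a + eps * e, c) - d2(a - eps * e, c)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(analytic, numeric, atol=1e-4))  # True
```

The full loss gradient then follows by the chain rule through the softmax over negative distances.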

Data set description

The training set, support set, and test set utilized in this paper are generated from the MSTAR data set provided by the Defense Advanced Research Projects Agency (DARPA). The data set was collected by the Sandia National Laboratory SAR sensor platform in 1995 and 1996 using an X-band SAR sensor. It provides a nominal spatial resolution of 0.3 m × 0.3 m in range and azimuth, and the image size is 128 × 128 pixels. The publicly released data sets include ten categories of ground military vehicles, i.e.

Conclusion

This paper proposed an end-to-end few-shot SAR ATR method, namely CBLPN, which can effectively recognize SAR targets with only a few training samples. In CBLPN, the Conv-BiLSTM network was designed to extract features insensitive to azimuth variation, and a classifier based on Euclidean distance was utilized for classification. In addition, a random-episode weights updating method was proposed to train the parameters in CBLPN. Experimental results on the MSTAR data set have illustrated the

CRediT authorship contribution statement

Li Wang: Conceptualization, Methodology, Software, Validation, Investigation, Formal analysis, Data curation, Writing - original draft, Visualization, Writing - review & editing. Xueru Bai: Resources, Supervision, Project administration, Funding acquisition. Ruihang Xue: Software. Feng Zhou: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 61971332, 61631019, and 61801344.

Li Wang was born in Jiangsu, China, in 1992. He received the B.S. and Ph.D. degrees in signal and information processing from Xidian University, Xi’an, China, in 2015 and 2020, respectively. His major research interests include deep learning and radar automatic target recognition.

References (36)

  • C.Q. Hong et al., Multimodal Deep Autoencoder for Human Pose Recovery, IEEE Trans. Image Process. (2015)
  • J. Yu, M. Tan, H.Y. Zhang, D.C. Tao, Y. Rui, Hierarchical Deep Click Feature Prediction for Fine-grained Image...
  • J. Yu et al., Spatial Pyramid-Enhanced NetVLAD With Weighted Triplet Loss for Place Recognition, IEEE Trans. Neural Netw. Learn. Syst. (2020)
  • S.Z. Chen et al., Target classification using the deep convolutional networks for SAR images, IEEE Trans. Geosci. Remote Sens. (2016)
  • L.M. Novak et al., The automatic target-recognition system in SAIP, Lincoln Lab. J. (1997)
  • H.C. Chiang et al., Model-based classification of radar images, IEEE Trans. Inf. Theory (2000)
  • G.G. Dong et al., Classification on the monogenic scale space: Application to target recognition in SAR image, IEEE Trans. Image Process. (2015)
  • S. Deng et al., SAR automatic target recognition based on Euclidean distance restricted autoencoder, IEEE J. Sel. Top. Appl. Earth Observ. (2017)
  • F. Zhang et al., Multi-aspect-aware bidirectional LSTM networks for synthetic aperture radar target recognition, IEEE Access (2017)
  • C. Zheng et al., Semi-Supervised SAR ATR via Multi-Discriminator Generative Adversarial Network, IEEE Sens. J. (2019)
  • The Air Force Moving and Stationary Target Recognition Database. [Online]. Available:...
  • J. Ding et al., Convolutional neural network with data augmentation for SAR target recognition, IEEE Geosci. Remote Sens. Lett. (2016)
  • M. David et al., Improving SAR automatic target recognition models with transfer learning from simulated data, IEEE Geosci. Remote Sens. Lett. (2017)
  • S. Wagner, SAR ATR by a combination of convolutional neural network and support vector machines, IEEE Trans. Aerosp. Electron. Syst. (2017)
  • R.H. Shang et al., SAR targets classification based on deep memory convolution neural networks and transfer parameters, IEEE J. Sel. Top. Appl. Earth Observ. (2018)
  • J. Wang et al., Ground target classification in noisy SAR images using convolutional neural networks, IEEE J. Sel. Top. Appl. Earth Observ. (2018)
  • J. Pei et al., SAR automatic target recognition based on multiview deep learning framework, IEEE Trans. Geosci. Remote Sens. (2018)
  • Z. Lin et al., Deep convolutional highway unit network for SAR target classification with limited labeled training data, IEEE Geosci. Remote Sens. Lett. (2017)


    Xueru Bai was born in Xi’an, Shaanxi, China, in 1984. She received the B.S. and Ph.D. degrees in signal and information processing from Xidian University, Xi’an, China, in 2006 and 2011, respectively. She is currently a Professor with the National Laboratory of Radar Signal Processing, Xidian University. Her research interests include high-resolution radar imaging and radar automatic target recognition. Dr. Bai was a recipient of the National Excellent Doctoral Dissertation Award granted by the Ministry of Education of China and the Program for Excellent Young Scientist selected by the National Natural Science Foundation of China.

    Ruihang Xue was born in Xi’an, Shaanxi, China, in 1996. He received the B.S. degree in electronic and information engineering from Xidian University, Xi’an, China, in 2018, where he is currently working toward the Ph.D. degree in signal and information processing in the National Laboratory of Radar Signal Processing, Xidian University. His major research interests include deep learning and radar automatic target recognition.

    Feng Zhou was born in Tongxu, Henan, China, in 1980. He received the Ph.D. degree in signal and information processing from Xidian University, Xi’an, China, in 2007. He is currently a Director and a Professor with the Key Laboratory of Electronic Information Countermeasure and Simulation Technology of Ministry of Education, Xidian University. He has authored or coauthored over 80 papers. His research interests include high-resolution radar imaging and radar countermeasures. Dr. Zhou was a recipient of the Young Scientist Award from the XXXI URSI GASS Committee, the program for Support of Top-notch Young Professionals from the Central Organization Department of China, and the program for New Century Excellent Talents in University from the Ministry of Education of China.
