Elsevier

Neurocomputing

Volume 179, 29 February 2016, Pages 88-100
A batch-mode active learning framework by querying discriminative and representative samples for hyperspectral image classification

https://doi.org/10.1016/j.neucom.2015.11.062

Abstract

Batch-mode active learning (AL) approaches are dedicated to selecting the training sample set for classification, where a batch of unlabeled samples is queried at each iteration. The current state-of-the-art AL techniques exploit different query functions, which are mainly based on the evaluation of two criteria: uncertainty and diversity. Generally, the two criteria are applied independently of each other, and they cannot guarantee that the newly queried samples are independent and identically distributed (i.i.d.) draws from the unknown source distribution. To solve this problem, a novel form of upper bound for the true risk in the active learning setting is derived, and this upper bound is minimized to measure the discriminative information, which is connected with the uncertainty. For the distribution match, the proposed method adopts the maximum mean discrepancy to constrain the distribution of the labeled samples and make it as similar to the overall sample distribution as possible, which helps capture the representative information of the data structure. In the proposed framework, the binary-class definition is generalized to the multiclass problem, and the discriminative and representative (DR) information are combined. In this way, our method is shown to query the most informative samples while preserving the source distribution as much as possible, thus identifying the most uncertain and representative queries. Meanwhile, the number of newly queried samples is adaptive and depends on the distribution of the labeled samples. In the experiments, we employed two benchmark remote sensing datasets (the Indian Pines and Washington DC datasets), and the results confirmed the superior performance of the proposed framework compared with the other state-of-the-art AL methods.

Introduction

Machine learning algorithms have become powerful tools for extracting information from data in the fields of data mining, pattern recognition, and computer vision, as well as in remote sensing [1], [2], [3], [4], [5], [6], [7], [54]. Advances in remote sensing technology have made hyperspectral data with hundreds of narrow contiguous bands available. Hyperspectral image (HSI) processing with machine learning methods has been widely studied in the past decade [8], [9], [10], [11], [12], [13], [14]. HSI classification is one of the important tasks used to extract environmental information from remote sensing images and has been an active field in current HSI processing [15], [16], [17], [18], [19], [20]. To fully utilize the information in remote sensing images, many different machine learning algorithms have been developed to classify the data [21], [22], [23]. Supervised classification is the main technique, and it requires labeled samples for training the classifiers. Given a specific supervised classifier, the remote sensing images can be automatically classified. However, supervised classifiers are highly dependent on the amount and quality of the training samples [24]. Therefore, collecting samples of good quality (e.g., informative and non-redundant) is vital.

Manually selecting regions of interest in the HSI as training samples is a common approach, but this procedure is very expensive in most real-world applications. As HSIs have very high dimensionality, it is more difficult to design classifiers using only a few labeled data points than it is for a multispectral image [11]. This paper focuses on HSI classification with a few labeled data points. Two popular machine learning approaches have been developed to solve this problem: semi-supervised learning and active learning (AL). Semi-supervised algorithms incorporate the unlabeled samples together with the labeled samples to find a classifier with better boundaries [25], [26], [27]. An overview of the semi-supervised classification techniques can be found in [12]. In contrast, AL assumes that an initial classifier trained on a small number of labeled samples exists, and it can provide better classification results while requiring labels for only a small number of the unlabeled samples. AL is conducted as an iterative process: in each iteration, the most informative unlabeled samples are chosen for manual labeling. In this way, the unnecessary and redundant labeling of non-informative samples is avoided, greatly reducing the labeling cost and time. Moreover, AL allows one to reduce the computational complexity of the training phase. Batch-mode AL, in which a batch of unlabeled samples is queried at each iteration, is expected to be more suitable for hyperspectral image classification because it speeds up the sample selection and reduces the number of iterations [28].

The goal of batch-mode AL is to select the most informative batch of samples with as little redundancy as possible, so that the batch supplies the classifier with information about the samples it is uncertain about. At the same time, batch-mode AL can also increase the speed of the sample selection and reduce the number of iterations [29]. Querying the unlabeled samples typically proceeds in two phases, based on uncertainty and diversity, respectively [30], [31], [32]. The first phase queries the most informative samples with the uncertainty criterion. However, some of the queried samples may be very similar to each other, in which case querying just one of them is enough, so it is necessary to remove the redundancy among these samples. In the second phase, the diversity criterion is therefore used to reduce the redundancy among the samples queried in the first phase. There has been a large amount of research into the uncertainty criterion. The conventional uncertainty criteria of batch-mode AL can be grouped into three categories: 1) query by committee, in which the uncertainty of an unlabeled sample is measured by the disagreement of several classifiers [33], [34], [35]; 2) the posterior probability based methods, where the posterior probability is used to measure the uncertainty of the candidates [36], [37]; and 3) the large margin heuristic based methods, where the uncertainty of the candidates is measured by the distance to the margin of the classifier, such as a support vector machine (SVM) [38], [39].
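As a rough illustration of the first two groups of uncertainty criteria, the sketch below scores a single sample by the vote entropy of a committee's predicted labels and by the gap between its two largest class posteriors (often called breaking ties). The function names and the choice of these two particular scores are assumptions made for illustration; the cited works use several variants.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Query by committee: entropy of the labels predicted by the committee.

    Higher entropy means more disagreement, i.e. a more uncertain sample.
    """
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def breaking_ties(posteriors):
    """Posterior-based: gap between the two largest class posteriors.

    A smaller gap means the classifier hesitates between two classes.
    """
    top = sorted(posteriors, reverse=True)
    return top[0] - top[1]


# Hypothetical committee votes and posteriors for two samples.
confident = vote_entropy(["corn", "corn", "corn"])
disputed = vote_entropy(["corn", "grass", "woods"])
gap_clear = breaking_ties([0.90, 0.07, 0.03])
gap_close = breaking_ties([0.45, 0.44, 0.11])
```

Under these scores, the disputed sample (high vote entropy) and the sample with the close posterior gap would be queried first.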

However, in the current research, less attention is being paid to the diversity criterion. The diversity criteria are mainly clustering algorithms, such as k-means [40] and its kernel version [41], which depend on the correctness of the convergence and are usually influenced by the adequacy of the initialization [42]. Moreover, these algorithms have to be given the number of clustering centers beforehand. Thus, the data queried by such methods are not guaranteed to be i.i.d. samples from the original data distribution, as they are selectively sampled based on the AL criterion [43]. At the same time, these methods do not fully use the label information, and they split the uncertainty and the diversity criteria into two separate steps. In fact, using either kind of criterion alone may not be sufficient to obtain the optimal results.

This paper proposes a new diversity criterion, extends the empirical risk minimization principle to the AL case, and presents a novel AL framework. This framework adopts the maximum mean discrepancy (MMD) to measure the distribution difference and derives an empirical upper bound for the AL risk. By minimizing this upper bound, it approximately minimizes the true risk under the original data distribution. The proposed framework attempts to query the unlabeled samples using both discriminative and representative information within one optimization formulation. Our goal is to query a subset of unlabeled samples that helps minimize this formulation, which combines the discriminative and representative terms. The contributions of this manuscript can be summarized as follows:

  • (1) In the proposed framework, the MMD is adopted, so that the queried samples are not only diverse, but also preserve the distribution of the original data. This strategy can rapidly reduce the empirical risk on the training data.

  • (2) With the discriminative and representative information in one optimization formulation, a trade-off between the two is controlled by a weight parameter, and the queried samples can contain both discriminative and representative information.

  • (3) The proposed method is suitable for multiclass problems, and the number of queried samples is adaptive. Furthermore, only the most uncertain samples are selected in the preparation procedure, so the proposed method can be applied to large-scale data.
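To make the distribution-matching idea concrete, the following sketch computes a biased empirical estimate of the squared MMD between two sample sets. It assumes a Gaussian (RBF) kernel with a hypothetical bandwidth parameter `gamma`; the kernel and the estimator actually used in the paper are not given in this excerpt, so this is only an illustration of the quantity, not the authors' formulation.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian kernel between two feature vectors (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(X, Y, gamma=1.0):
    """Biased empirical estimate of the squared MMD between samples X and Y.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)
    """
    def mean_k(A, B):
        return sum(rbf(a, b, gamma) for a in A for b in B) / (len(A) * len(B))

    return mean_k(X, X) + mean_k(Y, Y) - 2.0 * mean_k(X, Y)


# Toy data: a labeled set close to the pool and one far away from it.
pool = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
near = [[0.1, 0.1], [0.9, 0.9]]
far = [[5.0, 5.0], [6.0, 6.0]]
mmd_near = mmd2(near, pool)
mmd_far = mmd2(far, pool)
```

A labeled set whose MMD to the pool is small (here `near`) is more representative of the overall sample distribution than one whose MMD is large (here `far`).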

The remainder of this paper is organized as follows. Section 2 presents the recent research into batch-mode AL in remote sensing image classification. Section 3 formulates the proposed batch-mode AL framework. Section 4 describes the experiments with two benchmark hyperspectral datasets (the Indian Pines and Washington DC datasets) and presents the experimental results in comparison with the other state-of-the-art batch-mode AL methods. Finally, Section 5 summarizes the paper.

Section snippets

The framework of conventional batch-mode active learning

The conventional AL methods can be modeled as a quintuple (F,Q,D,T,U) [44], where F is a supervised classifier trained on the training dataset T, Q is the query function used to select the most informative unlabeled samples from a pool U of unlabeled samples, and D is a supervisor that can correctly label a batch of the most informative samples queried by Q. AL is an iterative process, in which the supervisor D labels the most informative samples queried by the query function Q. For the
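The iterative process described by the quintuple (F,Q,D,T,U) can be sketched as a generic pool-based loop. Everything concrete below is a toy assumption chosen to keep the example runnable: the classifier is a one-dimensional threshold, the query function picks the pool samples closest to that threshold, and the supervisor is a simulated labeling rule. None of these stand for the classifiers or query functions used in the paper.

```python
def active_learning(train_fn, query_fn, oracle_fn, T, U, batch_size, n_iter):
    """Generic batch-mode loop over (F, Q, D, T, U)."""
    T, U = list(T), list(U)
    model = train_fn(T)                          # F is trained on T
    for _ in range(n_iter):
        if not U:
            break
        batch = query_fn(model, U, batch_size)   # Q selects a batch from U
        for x in batch:
            T.append((x, oracle_fn(x)))          # D labels each queried sample
            U.remove(x)
        model = train_fn(T)                      # retrain F on the enlarged T
    return model, T, U


def train_threshold(T):
    """Toy F: midpoint between the largest class-0 and smallest class-1 sample."""
    lo = max(x for x, y in T if y == 0)
    hi = min(x for x, y in T if y == 1)
    return (lo + hi) / 2.0


def query_closest(threshold, U, batch_size):
    """Toy Q: the batch_size pool samples closest to the decision threshold."""
    return sorted(U, key=lambda x: abs(x - threshold))[:batch_size]


def oracle(x):
    """Toy D: simulated supervisor with a true boundary at 5.0."""
    return 1 if x >= 5.0 else 0


T0 = [(0.0, 0), (10.0, 1)]
U0 = [1.0, 2.0, 3.0, 4.0, 4.5, 5.5, 6.0, 7.0, 8.0, 9.0]
model, T, U = active_learning(train_threshold, query_closest, oracle,
                              T0, U0, batch_size=2, n_iter=2)
```

After two iterations with a batch size of two, four samples have moved from the pool U to the training set T, all of them near the decision boundary.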

The proposed batch-mode active learning framework

In this paper, we combine the discriminative and representative information into one optimization formulation as the diversity criterion, and select MS and MCLU as the uncertainty criteria. In the proposed method, the number of queried samples is adaptive. Meanwhile, the queried samples follow the same distribution as the original data and are independent and identically distributed (i.i.d.).
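For readers unfamiliar with the two uncertainty criteria, the sketch below shows how margin sampling (MS) and multiclass-level uncertainty (MCLU) are commonly computed from the one-vs-rest decision values of an SVM, and how the most uncertain candidates can be preselected. The definitions follow common usage in the AL literature rather than this excerpt, and the helper `most_uncertain` and the sample values are hypothetical.

```python
def margin_sampling(decision_values):
    """MS: smallest absolute one-vs-rest decision value.

    A small value means the sample lies close to some hyperplane.
    """
    return min(abs(v) for v in decision_values)


def mclu(decision_values):
    """MCLU: gap between the two largest one-vs-rest decision values.

    A small gap means the two most likely classes are hard to separate.
    """
    top = sorted(decision_values, reverse=True)
    return top[0] - top[1]


def most_uncertain(pool_scores, criterion, m):
    """Indices of the m pool samples with the smallest criterion value."""
    order = sorted(range(len(pool_scores)),
                   key=lambda i: criterion(pool_scores[i]))
    return order[:m]


# Hypothetical decision values for three pool samples and three classes.
scores = [
    [2.0, -1.5, -1.0],
    [0.3, 0.1, -2.0],
    [-0.05, -1.0, -3.0],
]
by_ms = most_uncertain(scores, margin_sampling, 1)
by_mclu = most_uncertain(scores, mclu, 2)
```

Both criteria rank samples so that smaller values are more uncertain; a diversity or representativeness step would then operate only on the preselected candidates.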

In the conventional batch-mode AL methods, the query function is

Experiments and analysis

We used two benchmark HSI datasets in the experiments [51] and compared the results of the proposed method and the other state-of-the-art methods. According to the experimental results, we then analyzed the proposed method.

Conclusion

In this paper, we generalize the empirical risk minimization principle to the active learning setting and propose a novel active learning framework. By effectively combining the representative term and the discriminative term, we query the samples which are expected to rapidly reduce the empirical risk while preserving the original source distribution. This enables our method to achieve consistently good performance during the whole active learning process. The superior performance of

Zengmao Wang received the B.S. degree in project of surveying and mapping from Central South University, Changsha, China, in 2013, and is currently pursuing M.S. degree at the State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing (LIESMARS). His research interests include hyperspectral image processing and machine learning.

References (54)

  • D. Tao, S. Maybank, W. Hu, X. Li, Stable third-order tensor representation for colour image classification, in:...
  • Y. Luo et al., Manifold regularized multitask learning for semi-supervised multilabel image classification, IEEE Trans. Image Process. (2013)
  • G. Shaw et al., Signal processing for hyperspectral image exploitation, IEEE Signal Process. Mag. (2002)
  • Y. Zhong et al., Remote sensing image subpixel mapping based on adaptive differential evolution, IEEE Trans. Syst. Man Cybern. B: Cybern. (2012)
  • B. Du et al., A discriminative manifold learning based dimension reduction method for hyperspectral classification, Int. J. Fuzzy Syst. (2012)
  • D.A. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing (2005)
  • S. Rajan, J. Ghosh, M.M. Crawford, An active learning approach to knowledge transfer for hyperspectral data analysis,...
  • M. Seeger, Learning with Labeled and Unlabeled Data (2000)
  • J. Li et al., Spectral-spatial classification of hyperspectral data using loopy belief propagation and active learning, IEEE Trans. Geosci. Remote Sens. (2013)
  • Y. Gao et al., Hyperspectral image classification through bilayer graph-based learning, IEEE Trans. Image Process. (2014)
  • E. Pasolli et al., SVM active learning approach for image classification using spatial information, IEEE Geosci. Remote Sens. (2014)
  • Q. Shi et al., Semi-supervised discriminative locally enhanced alignment for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. (2013)
  • K. Bernard et al., Spectral-spatial classification of hyperspectral data based on a stochastic minimum spanning forest approach, IEEE Trans. Image Process. (2012)
  • J.E. Fowler et al., Anomaly detection and reconstruction from random projections, IEEE Trans. Image Process. (2012)
  • J. Li et al., Generalized composite kernel framework for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. (2013)
  • Huo et al., A batch-mode active learning algorithm using region-partitioning diversity for SVM classifier, IEEE J. Sel. Top. Appl. Earth Obs. (2014)
  • L. Bruzzone et al., A novel transductive SVM for semisupervised classification of remote-sensing images, IEEE Trans. Geosci. Remote Sens. (2006)
    Bo Du (M’10–SM’15) received the B.S. degree and the Ph.D. degree in Photogrammetry and Remote Sensing from the State Key Lab of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China, in 2005 and 2010, respectively. He is currently an associate professor with the School of Computer, Wuhan University, Wuhan, China. He has more than 40 research papers published in the IEEE Transactions on Geoscience and Remote Sensing (TGRS), IEEE Transactions on Image Processing (TIP), IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS), and IEEE Geoscience and Remote Sensing Letters (GRSL), etc. His major research interests include pattern recognition, hyperspectral image processing, and signal processing. He is currently a senior member of IEEE. He received the best reviewer award from IEEE GRSS for his service to JSTARS in 2011 and the ACM rising star award for his academic progress in 2015. He was the Session Chair for the 4th IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS). He also serves as a reviewer of 20 Science Citation Index (SCI) journals including IEEE TGRS, TIP, JSTARS, and GRSL.

    Lefei Zhang (S'11-M'14) received the B.S. and Ph.D. degrees from Wuhan University, Wuhan, China, in 2008 and 2013, respectively. From August 2013 to July 2015, he was with the School of Computer, Wuhan University, as a Postdoctoral Researcher, and he was a Visiting Scholar with the CAD & CG Lab, Zhejiang University in 2015. He is currently a lecturer with the School of Computer, Wuhan University, and also a Hong Kong Scholar with the Department of Computing, Hong Kong Polytechnic University, Hong Kong. His research interests include pattern recognition, image processing, and remote sensing. Dr. Zhang is a reviewer of more than twenty international journals, including the IEEE TIP, TNNLS, and TGRS.

    Liangpei Zhang (M'06–SM'08) received the B.S. degree in physics from Hunan Normal University, Changsha, China, in 1982, the M.S. degree in optics from the Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an, China, in 1988, and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 1998. He is currently the head of the remote sensing division, State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing (LIESMARS), Wuhan University. He is also a "Chang-Jiang Scholar" chair professor appointed by the Ministry of Education of China. He is currently a principal scientist for the China state key basic research project (2011–2016) appointed by the Ministry of National Science and Technology of China to lead the remote sensing program in China. He has more than 450 research papers and five books. He is the holder of 15 patents. His research interests include hyperspectral remote sensing, high-resolution remote sensing, image processing, and artificial intelligence. Dr. Zhang is the founding chair of the IEEE Geoscience and Remote Sensing Society (GRSS) Wuhan Chapter. He received the best reviewer awards from IEEE GRSS for his service to the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS) in 2012 and IEEE Geoscience and Remote Sensing Letters (GRSL) in 2014. He was the General Chair for the 4th IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS) and the guest editor of JSTARS. His research teams won the top three prizes of the IEEE GRSS 2014 Data Fusion Contest, and his students have been selected as the winners or finalists of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS) student paper contest in recent years. Dr. Zhang is a Fellow of the Institution of Engineering and Technology (IET), executive member (board of governors) of the China National Committee of the International Geosphere–Biosphere Programme, executive member of the China Society of Image and Graphics, etc. He was a recipient of the 2010 best paper Boeing award and the 2013 best paper ERDAS award from the American Society for Photogrammetry and Remote Sensing (ASPRS). He regularly serves as a Co-chair of the series SPIE conferences on multispectral image processing and pattern recognition, conference on Asia remote sensing, and many other conferences. He edits several conference proceedings, issues, and geoinformatics symposiums. He also serves as an associate editor of the International Journal of Ambient Computing and Intelligence, International Journal of Image and Graphics, International Journal of Digital Multimedia Broadcasting, Journal of Geo-spatial Information Science, and Journal of Remote Sensing, and the guest editor of the Journal of Applied Remote Sensing and Journal of Sensors. Dr. Zhang is currently serving as an associate editor of the IEEE Transactions on Geoscience and Remote Sensing.

    This work was supported in part by the National Basic Research Program of China (973 Program) under Grant 2012CB719905, the National Natural Science Foundation of China under Grants 61471274, 61401317 and 41431175, and the Natural Science Foundation of Hubei Province under Grant 2014CFB193.
