Elsevier

Computers & Geosciences

Volume 112, March 2018, Pages 38-46
Case study
An improved optimum-path forest clustering algorithm for remote sensing image segmentation

https://doi.org/10.1016/j.cageo.2017.12.003

Highlights

  • We improve the optimum-path forest clustering algorithm.

  • The proposed algorithm exploits the characteristic that cluster centres lie far away from samples with higher density.

  • Results show our method can outperform the original optimum-path forest algorithm.

  • The proposed algorithm was applied to remote sensing land cover images.

Abstract

Remote sensing image segmentation is a key technology for processing remote sensing images. The segmentation results can be used for feature extraction, target identification and object description, so segmentation quality directly affects subsequent processing. This paper proposes a novel Optimum-Path Forest (OPF) clustering algorithm for remote sensing image segmentation. The method uses the principle that cluster centres are characterized both by their densities and by their distances from samples with higher densities. A new probability density function for OPF clustering is defined based on this principle and applied to remote sensing image segmentation. Experiments are conducted on five remote sensing land cover images. The experimental results indicate that the proposed method can outperform the original OPF approach.

Introduction

Remote sensing images play a significant role in earth science due to their superior ability to express the characteristics of ground objects. These images have been widely applied in various fields, such as environment monitoring (Kang et al., 2015), urban planning (Banerjee et al., 2013) and national defence (Lampropoulos et al., 2008). The number of obtainable remote sensing images has dramatically increased due to the rapid development of remote sensing observation techniques. Large amounts of remote sensing data and growing demands have accelerated the development of remote sensing image processing. An emphasis has been laid on automatically processing and analysing these remote sensing images and extracting useful information from them. Image segmentation is a method used to automatically extract features and distinguish distinct objects in remote sensing images (Zhang et al., 2015).

Numerous remote sensing image segmentation approaches have been utilized. These approaches mainly encompass three strategies: supervised methods, unsupervised methods and semi-supervised methods. The supervised methods require many known pixels with class labels, which are used as a training set to label the unknown pixels. For instance, support vector machine (SVM) (Cortes and Vapnik, 1995, Gomez-chova et al., 2008) is a typical supervised method. Supervised methods are extremely time consuming when applied to hyperspectral or very high resolution images. Furthermore, the class labels are unavailable in many cases.

Most unsupervised methods employ clustering algorithms. Unlike supervised methods, unsupervised methods exploit observation features to segment images and do not require training sets. Unsupervised methods are generally used when the class labels are unknown. Two main clustering approaches are commonly used in the literature: partitioning methods and hierarchical methods. Typical partitioning methods include the minimum spanning forest (Bernard et al., 2012, Tarabalka et al., 2010a), fuzzy c-means (FCM) (Zhong et al., 2014, Alhichri et al., 2014, Li et al., 2013) and associated extension algorithms, k-means (Isa et al., 2009) and associated variant algorithms, iterative self-organizing data analysis (ISODATA) (Ball and Hall, 1965) and spectral clustering techniques (Zhang et al., 2008, Jia et al., 2011). In addition to the basic spectral features of images, hierarchical methods also consider spatial information (Tarabalka et al., 2010b). Lee (2004) and Lee and Crawford (2004) classified hyperspectral data using hierarchical clustering together with the assumption that pixels belonging to the same cluster are spatially contiguous. Bruzzone and Carlin (2006) and Huo et al. (2015) combined hierarchical segmentation with SVM to classify very high spatial resolution images.

Unsupervised methods have several drawbacks. For instance, the number of resultant clusters remains difficult to determine: few robust criteria exist for choosing it, and users generally set it based on prior knowledge. Furthermore, the outputs of unsupervised methods lack semantic information. These methods can only group data objects into classes according to their similarity; they cannot provide semantic information for the classes or the relationships between them, so the user often has to interpret the clustering results further.

Semi-supervised methods (Li et al., 2010) combine supervised learning with unsupervised learning. These methods utilize a small number of labelled samples and many unlabelled samples for training and classification, providing advantages over both unsupervised and supervised methods. Several strategies have been proposed in the literature (Yang et al., 2013, Tuia and Camps-Valls, 2011). Some semi-supervised methods use the supervised model to initialize the segmentation algorithms (Tarabalka et al., 2010c). However, these methods require large numbers of labelled pixels, and their results rely on the supervised model utilized. Other semi-supervised methods must be accurately tuned and require a large amount of unlabelled data (Munoz-Mari et al., 2012).

The Optimum-Path Forest (OPF) (Rocha et al., 2009) pattern recognition algorithm has recently attracted extensive attention from researchers and has been widely applied to image segmentation. The OPF classifier has supervised (Papa et al., 2009) and unsupervised (Papa and Falcao, 2008) versions. Pisani et al. (2014) introduced the OPF algorithm for land cover classification. Filho et al. (2013) applied OPF operators to segment sandstone thin section images. Cappabianco et al. (2012) handled MR-image brain tissue segmentation via OPF clustering. Nakamura et al. (2014) combined OPF with evolutionary algorithms to extract spectral features and improved the speed and accuracy of segmentation. Iwashita et al. (2014) adopted a path- and label-cost propagation approach to accelerate the OPF classifier training. Costa et al. (2015) introduced a nature-inspired approach to estimate the probability density function (PDF) and increase the speed of the clustering algorithm based on OPF.

This paper proposes a new OPF clustering algorithm and applies it to land cover classification. The improved algorithm utilizes more detailed attributes of cluster centres than the original OPF clustering algorithm does. A new probability density function is defined in the proposed algorithm. We consider that a cluster centre is characterized not only by its density but also by its distance from higher-density samples. The experiments show that the proposed algorithm performs better than the original OPF clustering. The remainder of this paper is organized as follows. Section 2 reviews the OPF theory. Section 3 presents the new algorithm. Section 4 discusses the experimental results. Finally, the conclusions are stated in Section 5.

Section snippets

The optimum-path forest clustering algorithm

OPF is a clustering algorithm based on a graph structure that represents the feature space. It divides the graph into several optimum-path trees, each of which denotes a cluster. An advantage of the OPF clustering algorithm is that it requires no assumptions regarding the shape or separability of the feature space. The nodes represent the samples in the dataset, and the arcs connecting the nodes denote specific adjacency relations between the samples. Both the nodes and the arcs in the graph can
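
The partition of a graph into optimum-path trees can be illustrated with a minimal sketch. It is not the authors' implementation: the snippet above is truncated, so the k-nearest-neighbour adjacency, the Gaussian density estimate, the handicap `delta` and all function names below are assumptions chosen to resemble common descriptions of unsupervised OPF.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour adjacency list (assumed adjacency relation)."""
    n = len(points)
    adj = [dict() for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            adj[i][j] = d
            adj[j][i] = d  # symmetrise so arcs can be followed in both directions
    return adj


def node_density(adj):
    """Gaussian-weighted density from arc lengths (an assumed, simplified estimate)."""
    d_max = max(d for nbrs in adj for d in nbrs.values()) or 1.0
    two_sigma2 = 2.0 * (d_max / 3.0) ** 2
    return [
        sum(math.exp(-(d * d) / two_sigma2) for d in nbrs.values()) / len(nbrs)
        for nbrs in adj
    ]


def opf_cluster(points, k=3, delta=1e-6):
    """Label each sample with the optimum-path tree (cluster) that conquers it."""
    adj = knn_graph(points, k)
    rho = node_density(adj)
    n = len(points)
    value = [r - delta for r in rho]  # handicapped initial path values
    label = [-1] * n
    pred = [None] * n
    done = [False] * n
    heap = [(-value[i], i) for i in range(n)]  # max-heap via negated values
    heapq.heapify(heap)
    n_clusters = 0
    while heap:
        _, s = heapq.heappop(heap)
        if done[s]:
            continue  # stale heap entry left over from an earlier update
        done[s] = True
        if pred[s] is None:
            # s was not conquered by any other node, so it roots a new tree
            value[s] = rho[s]
            label[s] = n_clusters
            n_clusters += 1
        for t in adj[s]:
            if not done[t]:
                offered = min(value[s], rho[t])  # path value is its lowest density
                if offered > value[t]:
                    value[t] = offered
                    pred[t] = s
                    label[t] = label[s]
                    heapq.heappush(heap, (-value[t], t))
    return label
```

Calling `opf_cluster(points, k=3)` on a list of coordinate tuples returns one integer label per sample; samples that share a label belong to the same tree. The number of clusters is not passed in: it follows from how many density maxima become roots.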

The proposed algorithm

The OPF clustering method defines a cluster as a set of points that converge to a local maximum of the density distribution function. The OPF method first determines the cluster centres and then assigns each remaining point to a cluster according to the connectivity function of the paths between the point and the cluster centres. Thus, determining the correct cluster centres, namely the local density maxima, is crucial to the OPF method. Most previous studies have defined cluster centres
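
The two quantities that the paper associates with cluster centres, a sample's density and its distance from samples of higher density, can be computed as in the sketch below. The paper's actual probability density function is not given in this snippet, so the neighbour-count density, the `radius` parameter and the product score are illustrative assumptions only, not the authors' definition.

```python
import math


def local_density(points, radius):
    """Count the other samples within `radius` of each sample (assumed density)."""
    n = len(points)
    return [
        sum(
            1
            for j in range(n)
            if j != i and math.dist(points[i], points[j]) <= radius
        )
        for i in range(n)
    ]


def distance_to_denser(points, rho):
    """Distance from each sample to the nearest sample with strictly higher density."""
    n = len(points)
    delta = []
    for i in range(n):
        denser = [
            math.dist(points[i], points[j]) for j in range(n) if rho[j] > rho[i]
        ]
        if denser:
            delta.append(min(denser))
        else:
            # No denser sample exists: fall back to the largest distance from i
            delta.append(max(math.dist(points[i], points[j]) for j in range(n)))
    return delta


def centre_scores(points, radius):
    """Hypothetical score: high only when density and distance are both large."""
    rho = local_density(points, radius)
    delta = distance_to_denser(points, rho)
    return [r * d for r, d in zip(rho, delta)]
```

Samples with the largest scores are candidate cluster centres: a dense sample that sits close to an even denser one receives a small distance term and is therefore ranked below the true local peak.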

Dataset and evaluation metrics

The proposed algorithm is applied to land cover classification and compared to the original OPF clustering algorithm using a k-nearest neighbours approach. Fig. 2 illustrates the five images used in the experiments. These images were obtained from Landsat-5 and CBERS-2B, covering the Itatinga area of Brazil; from Geoeye and Ikonos-2 MS, covering the Duque de Caxias area of Brazil; and from Landsat-8, covering the area of Furnas in the city of Alfenas, Minas Gerais state, Brazil. The labelled

Conclusion

This paper proposes a new clustering approach based on the optimum-path forest (OPF). The method exploits the principle that cluster centres are characterized by the density and the distance between a centre and higher density samples. A new probability density function is defined for the OPF clustering algorithm. The proposed algorithm is used to segment remote sensing images. The experimental results demonstrate that our algorithm is superior to the original OPF method.

However, the

Acknowledgements

This paper was sponsored by the Jilin Provincial Science and Technology Department of China (Grant No. 20170204002GX), the Jilin Province Development and Reform Commission of China (Grant No. 2014Y056), and the Changchun Science and Technology Bureau of China (Grant No. 14KP009). We would like to thank these organizations for their support.

The authors would like to thank Rodrigo Yuji Mizobe Nakamura for providing the Ikonos-2 MS, Geoeye, CBERS-2B, Landsat-5 and Landsat-8 images.

References (38)

  • D.L. Davies et al.

    A cluster separation measure

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1979)
  • I.M. Filho et al.

    Segmentation of sandstone thin section images with separation of touching grains using optimum path forest operators

    Comput. Geosci.

    (2013)
  • L. Gomez-chova et al.

    Semisupervised image classification with Laplacian support vector machines

    Geosci. Rem. Sens. Lett. IEEE

    (2008)
  • L. Hubert et al.

    Comparing partitions

    J. Classif.

    (1985)
  • L.Z. Huo et al.

    Semisupervised classification of remote sensing images with hierarchical spatial similarity

    Geosci. Rem. Sens. Lett. IEEE

    (2015)
  • N.A.M. Isa et al.

    Adaptive fuzzy moving k-means clustering algorithm for image segmentation

    IEEE Trans. Consum. Electron.

    (2009)
  • J.H. Jia et al.

    Soft spectral clustering ensemble applied to image segmentation

    Front. Comput. Sci. China

    (2011)
  • X.D. Kang et al.

    Extended random walker-based classification of hyperspectral images

    IEEE Trans. Geosci. Rem. Sens.

    (2015)
  • S. Lee

    Efficient multistage approach for unsupervised image classification
