Abstract
Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. The ML classifier relies exclusively on the spectral characteristics of thematic classes, whose statistical distributions (class-conditional probability densities) often overlap. The spectral response distributions of thematic classes depend on many factors, including elevation, soil type, and ecological zone. A second problem with statistical classifiers is the requirement for a large number of accurate training samples (10 to 30 × |dimensions|), which are often costly and time-consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, it is possible to exploit knowledge derived from these ancillary datasets to improve classification accuracy even when the class distributions overlap heavily. Likewise, newer semi-supervised techniques can improve the parameter estimates of the statistical model by utilizing a large number of readily available unlabeled training samples. Unfortunately, there is no convenient multivariate statistical model that can be employed for multisource geospatial databases. In this paper we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We have conducted several experiments on Landsat satellite image datasets, and our new hybrid approach shows an improvement of 24% to 36% in overall classification accuracy over conventional classification schemes.
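The abstract combines two ingredients: semi-supervised estimation of class-conditional Gaussian parameters from labeled plus unlabeled pixels, and use of ancillary geospatial layers during classification. The Python sketch below illustrates both under stated assumptions and is not the authors' hybrid algorithm. It assumes one multivariate Gaussian per class, fits it with a semi-supervised EM loop (labeled pixels keep fixed memberships, unlabeled pixels receive soft posteriors), and then applies the ML decision rule with per-pixel class priors that the caller is assumed to derive from an ancillary layer (for example, a lookup table indexed by soil type or elevation zone). The function names `gaussian_logpdf`, `m_step`, `semi_supervised_em`, and `classify` are hypothetical.

```python
import numpy as np


def gaussian_logpdf(X, mean, cov):
    """Log-density of N(mean, cov) at each row of X (shape n x d)."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)          # whitened residuals, shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))


def m_step(X, R, reg):
    """Weighted ML estimates of class weights, means, and covariances.

    R[i, k] is the membership of sample i in class k (0/1 or soft).
    A small ridge `reg` keeps each covariance positive definite.
    """
    nk = R.sum(axis=0)
    weights = nk / nk.sum()
    means = (R.T @ X) / nk[:, None]
    covs = []
    for k in range(R.shape[1]):
        diff = X - means[k]
        cov = (R[:, k, None] * diff).T @ diff / nk[k]
        covs.append(cov + reg * np.eye(X.shape[1]))
    return weights, means, np.array(covs)


def semi_supervised_em(X_lab, y_lab, X_unl, n_classes, n_iter=50, reg=1e-6):
    """Assumption: one Gaussian per class, estimated from labeled + unlabeled pixels."""
    R_lab = np.eye(n_classes)[y_lab]              # hard memberships, never updated
    X_all = np.vstack([X_lab, X_unl])
    # Start from the purely supervised ML estimate (labeled pixels only).
    weights, means, covs = m_step(X_lab, R_lab, reg)
    for _ in range(n_iter):
        # E-step: posterior class memberships for the unlabeled pixels only.
        log_p = np.column_stack([
            np.log(weights[k]) + gaussian_logpdf(X_unl, means[k], covs[k])
            for k in range(n_classes)
        ])
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        R_unl = np.exp(log_p)
        R_unl /= R_unl.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from labeled and unlabeled pixels together.
        weights, means, covs = m_step(X_all, np.vstack([R_lab, R_unl]), reg)
    return weights, means, covs


def classify(X, means, covs, priors):
    """MAP decision: argmax_k [log p(x | k) + log P(k)].

    `priors` has shape (n_pixels, n_classes); here it is assumed to come from
    an ancillary layer, e.g. stratum_priors[strata] for a per-pixel stratum id.
    """
    log_lik = np.column_stack([
        gaussian_logpdf(X, means[k], covs[k]) for k in range(len(means))
    ])
    return np.argmax(log_lik + np.log(np.clip(priors, 1e-12, None)), axis=1)
```

With uniform priors, `classify` reduces to the plain ML rule. A hypothetical per-stratum table `stratum_priors` of shape `(n_strata, n_classes)` can be expanded to pixels with `stratum_priors[strata]`, so the ancillary layer shifts decisions only where the spectral likelihoods are ambiguous.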
Acknowledgements
We would like to thank our collaborators Prof. Shekhar and Prof. Thomas E. Burk at the University of Minnesota for their contributions and support. We would also like to thank ORNL reviewers Eddie Bright, Phil Coleman, and Veeraraghavan Vijayaraj, as well as the anonymous SSTDM-07 workshop reviewers, whose comments greatly helped us improve the technical quality of this paper. This research was partially supported by the LDRD initiative on “Emerging Science and Technology for Sustainable Bioenergy.”
Additional information
Prepared by Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, Tennessee 37831-6285, managed by UT-Battelle, LLC, for the U.S. Department of Energy under contract no. DE-AC05-00OR22725.
About this article
Cite this article
Vatsavai, R.R., Bhaduri, B. A hybrid classification scheme for mining multisource geospatial data. Geoinformatica 15, 29–47 (2011). https://doi.org/10.1007/s10707-010-0113-4
DOI: https://doi.org/10.1007/s10707-010-0113-4