A hybrid classification scheme for mining multisource geospatial data

GeoInformatica

Abstract

Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. The ML classifier relies exclusively on the spectral characteristics of thematic classes, whose statistical distributions (class-conditional probability densities) often overlap. The spectral response distributions of thematic classes depend on many factors, including elevation, soil types, and ecological zones. A second problem with statistical classifiers is that they require a large number of accurate training samples (10 to 30 × |dimensions|), which are often costly and time-consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, it is possible to exploit the knowledge derived from these ancillary datasets to improve classification accuracy even when the class distributions overlap heavily. Likewise, newer semi-supervised techniques can be adopted to improve the parameter estimates of the statistical model by using a large number of easily available unlabeled training samples. Unfortunately, there is no convenient multivariate statistical model that can be employed for multisource geospatial databases. In this paper we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We conducted several experiments on Landsat satellite image datasets, and our new hybrid approach shows an improvement of 24% to 36% in overall classification accuracy over conventional classification schemes.
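
The semi-supervised idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes one Gaussian per thematic class and uses a generic expectation-maximization (EM) loop in which labeled samples keep their known class and unlabeled samples contribute through their posterior probabilities. This is not the paper's hybrid algorithm: it ignores the ancillary geospatial data entirely, and the function names (`semi_supervised_em`, `posteriors`, `m_step`) and the `reg` regularization term are hypothetical choices made only for illustration.

```python
import numpy as np


def log_gaussian(X, mean, cov):
    """Log density of a multivariate normal at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def posteriors(X, priors, means, covs):
    """Class posterior probabilities for each row of X (the E-step)."""
    log_p = np.stack(
        [np.log(priors[k]) + log_gaussian(X, means[k], covs[k])
         for k in range(len(priors))],
        axis=1,
    )
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def m_step(X, R, reg):
    """Weighted ML estimates of priors, means, and covariances (the M-step)."""
    n_classes, d = R.shape[1], X.shape[1]
    Nk = R.sum(axis=0)                      # effective sample count per class
    priors = Nk / Nk.sum()
    means = (R.T @ X) / Nk[:, None]
    covs = np.empty((n_classes, d, d))
    for k in range(n_classes):
        diff = X - means[k]
        # reg * I is an assumed safeguard against singular covariances
        covs[k] = (R[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)
    return priors, means, covs


def semi_supervised_em(X_lab, y_lab, X_unl, n_classes, n_iter=20, reg=1e-6):
    """Fit one Gaussian per class from labeled plus unlabeled samples.

    Assumes every class has at least one labeled sample.
    """
    X_all = np.vstack([X_lab, X_unl])
    R_lab = np.eye(n_classes)[y_lab]        # labeled responsibilities stay one-hot
    # Initialise from the labeled samples only (unlabeled weights are zero).
    R = np.vstack([R_lab, np.zeros((len(X_unl), n_classes))])
    params = m_step(X_all, R, reg)
    for _ in range(n_iter):
        R = np.vstack([R_lab, posteriors(X_unl, *params)])  # E-step
        params = m_step(X_all, R, reg)                      # M-step
    return params


def classify(X, priors, means, covs):
    """Assign each row of X to the class with the highest posterior."""
    return posteriors(X, priors, means, covs).argmax(axis=1)
```

With only a handful of labeled pixels per class, the initial covariance estimates are poor; the unlabeled pixels then refine the means and covariances over the EM iterations, which is the effect the abstract attributes to semi-supervised techniques.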

Acknowledgements

We would like to thank our collaborators Prof. Shekhar and Prof. Thomas E. Burk at the University of Minnesota for their contributions and support. We would also like to thank ORNL reviewers Eddie Bright, Phil Coleman, and Veeraraghavan Vijayaraj, and the anonymous SSTDM-07 workshop reviewers, whose comments greatly helped us improve the technical quality of this paper. This research was partially supported by the LDRD initiative on “Emerging Science and Technology for Sustainable Bioenergy.”

Author information

Corresponding author

Correspondence to Ranga Raju Vatsavai.

Additional information

Prepared by Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, Tennessee 37831-6285, managed by UT-Battelle, LLC for the U.S. Department of Energy under contract no. DE-AC05-00OR22725.

About this article

Cite this article

Vatsavai, R.R., Bhaduri, B. A hybrid classification scheme for mining multisource geospatial data. Geoinformatica 15, 29–47 (2011). https://doi.org/10.1007/s10707-010-0113-4

  • DOI: https://doi.org/10.1007/s10707-010-0113-4
