Ensemble Synthetic Oversampling with Manhattan Distance for Unbalanced Hyperspectral Data

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2021 (IDEAL 2021)

Abstract

Hyperspectral imaging is a spectroscopic imaging technique that can cover a broad range of electromagnetic wavelengths and subdivide them into spectral bands. As a consequence, it may distinguish specific features more effectively than conventional colour cameras. This technology has been increasingly used in agriculture for applications such as crop leaf area index estimation, plant classification and disease monitoring. However, the abundance of information in hyperspectral imagery may cause a high-dimensionality problem, leading to computational complexity and storage issues. Data availability is another major issue: in agricultural applications it is typically difficult to collect an equal number of samples for every class, as some classes or diseases are rare while others are abundant and easy to collect. This may give rise to an imbalanced data problem that can severely reduce machine learning performance and introduce bias into performance measurement. In this paper, an oversampling method is proposed based on the Safe-Level synthetic minority oversampling technique (Safe-Level SMOTE), whose k-nearest neighbours (KNN) function is modified to better suit high-dimensional data. Using convolutional neural networks (CNN) as the classifier, combined with ensemble bagging with differentiated sampling rates (DSR), the approach demonstrates better performance than other state-of-the-art methods in handling imbalanced data.
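The abstract only outlines the method, so the sketch below is an illustration rather than the authors' implementation: a Safe-Level-SMOTE-style oversampler whose neighbour searches use the Manhattan (L1) distance, plus a small bagging ensemble trained with differentiated sampling rates (DSR). The function names, parameters and the use of scikit-learn utilities are assumptions made here for a self-contained example; the paper itself classifies with a CNN, which is replaced by a user-supplied classifier factory.

```python
# Illustrative sketch only (assumed details, not the authors' code).
import numpy as np
from sklearn.neighbors import NearestNeighbors


def safe_level_smote_l1(X_min, X_all, y_all, k=5, n_new=100, seed=0):
    """Generate n_new synthetic minority samples following Safe-Level SMOTE
    rules, with every k-nearest-neighbour search using the Manhattan metric."""
    rng = np.random.default_rng(seed)
    nn_min = NearestNeighbors(n_neighbors=k + 1, metric="manhattan").fit(X_min)
    nn_all = NearestNeighbors(n_neighbors=k, metric="manhattan").fit(X_all)

    def safe_level(x):
        # Safe level = number of minority samples among x's k nearest neighbours.
        idx = nn_all.kneighbors(x.reshape(1, -1), return_distance=False)[0]
        return int(np.sum(y_all[idx] == 1))

    synth = []
    while len(synth) < n_new:
        p = X_min[rng.integers(len(X_min))]
        # Candidate neighbour n from p's k nearest minority neighbours (skip p itself).
        nbrs = nn_min.kneighbors(p.reshape(1, -1), return_distance=False)[0][1:]
        n = X_min[rng.choice(nbrs)]
        sl_p, sl_n = safe_level(p), safe_level(n)
        if sl_p == 0 and sl_n == 0:
            continue                  # both lie in noisy regions: do not synthesise
        if sl_n == 0:
            gap = 0.0                 # neighbour unsafe: duplicate p
        else:
            ratio = sl_p / sl_n
            # Interpolate closer to whichever point has the higher safe level.
            gap = rng.uniform(0, 1 / ratio) if ratio >= 1 else rng.uniform(1 - ratio, 1)
        synth.append(p + gap * (n - p))
    return np.asarray(synth, dtype=float).reshape(-1, X_min.shape[1])


def dsr_bagging(X, y, base_factory, rates=(0.5, 1.0, 1.5, 2.0), k=5, seed=0):
    """Train one base classifier per sampling rate: each ensemble member sees the
    original data plus a different amount of synthetic minority data (the DSR idea)."""
    rng = np.random.default_rng(seed)
    X_min = X[y == 1]
    models = []
    for rate in rates:
        X_syn = safe_level_smote_l1(X_min, X, y, k=k,
                                    n_new=int(rate * len(X_min)),
                                    seed=int(rng.integers(1_000_000)))
        X_r = np.vstack([X, X_syn])
        y_r = np.hstack([y, np.ones(len(X_syn))])
        models.append(base_factory().fit(X_r, y_r))
    return models


def predict_majority(models, X):
    # Plain majority vote over the ensemble members (binary labels 0/1 assumed).
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Under these assumptions a usage example would be `models = dsr_bagging(X_train, y_train, lambda: LogisticRegression(max_iter=1000))` followed by `predict_majority(models, X_test)`, with `sklearn.linear_model.LogisticRegression` standing in for the paper's CNN purely to keep the sketch lightweight.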



Acknowledgement

Tajul Miftahushudur would like to acknowledge the scholarship provided by the Indonesian Endowment Fund for Education (LPDP).

Author information

Corresponding author

Correspondence to Tajul Miftahushudur.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Miftahushudur, T., Grieve, B., Yin, H. (2021). Ensemble Synthetic Oversampling with Manhattan Distance for Unbalanced Hyperspectral Data. In: Yin, H., et al. (eds.) Intelligent Data Engineering and Automated Learning – IDEAL 2021. IDEAL 2021. Lecture Notes in Computer Science, vol. 13113. Springer, Cham. https://doi.org/10.1007/978-3-030-91608-4_6


  • DOI: https://doi.org/10.1007/978-3-030-91608-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91607-7

  • Online ISBN: 978-3-030-91608-4

  • eBook Packages: Computer Science, Computer Science (R0)
