
SCUT-DS: Learning from Multi-class Imbalanced Canadian Weather Data

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11177)

Abstract

Learning from multi-class imbalanced data streams, with multiple minority classes and varying degrees of skewed class distributions, is an important problem in many real-world applications. To date, however, this setting has received limited attention in the research community. Instead, research has focused on binary-class problems, or has decomposed multi-class scenarios into multiple binary sub-problems that are handled separately. Furthermore, the evolving nature of data streams makes correctly predicting minority instances challenging. In this paper, we introduce the SCUT-DS approach, which combines multi-class synthetic oversampling with cluster-based undersampling. SCUT-DS is a window-based method that directly balances the number of incoming instances of all classes as the stream evolves. We present an experimental evaluation on a stream of Canadian weather data with multiple classes and varying degrees of skew. We demonstrate that our SCUT-DS algorithms consistently improve the recognition rates of minority instances in this multi-class imbalanced setting. Our results are especially promising for difficult-to-learn minority classes, notably for predicting ice storms and glaze events.
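The window-based balancing idea can be illustrated with a small, self-contained sketch. This is not the authors' implementation. It assumes, as in the earlier SCUT method of Agrawal, Viktor and Paquet (2015), that each window is balanced to the mean class size, with SMOTE generating synthetic instances for classes below the mean and cluster-based undersampling for classes above it. Plain k-means stands in for the clustering step, the stream is cut into fixed-size tumbling windows, and every function name, parameter (`size`, `k`, `n_clusters`) and class label below is hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(samples, n_new, k=5, rng=random):
    """SMOTE: interpolate between a random sample and one of its k nearest
    same-class neighbours to create n_new synthetic feature vectors."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        neighbours = sorted((s for j, s in enumerate(samples) if j != i),
                            key=lambda s: sq_dist(base, s))[:k]
        if not neighbours:  # a class with one instance can only be copied
            synthetic.append(list(base))
            continue
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def kmeans(points, k, iters=10, rng=random):
    """Plain k-means (stand-in clusterer); returns the non-empty clusters."""
    centres = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centres[c]))
            clusters[nearest].append(p)
        centres = [[mean(col) for col in zip(*cl)] if cl else centres[c]
                   for c, cl in enumerate(clusters)]
    return [cl for cl in clusters if cl]


def cluster_undersample(samples, target, n_clusters=3, rng=random):
    """Keep `target` original samples, drawn round-robin from the clusters
    so that every region of the class remains represented."""
    clusters = kmeans(samples, min(n_clusters, len(samples)), rng=rng)
    for cl in clusters:
        rng.shuffle(cl)
    kept = []
    while len(kept) < target:
        for cl in clusters:
            if cl and len(kept) < target:
                kept.append(cl.pop())
    return kept


def scut_balance(window, rng=random):
    """Balance one window of (features, label) pairs to the mean class size."""
    by_class = defaultdict(list)
    for x, y in window:
        by_class[y].append(x)
    target = round(mean(len(xs) for xs in by_class.values()))
    balanced = []
    for y, xs in by_class.items():
        if len(xs) < target:      # minority class: oversample with SMOTE
            xs = xs + smote(xs, target - len(xs), rng=rng)
        elif len(xs) > target:    # majority class: cluster-based undersampling
            xs = cluster_undersample(xs, target, rng=rng)
        balanced.extend((x, y) for x in xs)
    return balanced


def tumbling_windows(stream, size):
    """Group an iterable stream into consecutive, non-overlapping windows."""
    window = []
    for item in stream:
        window.append(item)
        if len(window) == size:
            yield window
            window = []
    if window:
        yield window


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy two-feature stream with hypothetical labels; not the paper's data.
    stream = [([rng.gauss(0, 1), rng.gauss(0, 1)], "rain") for _ in range(80)]
    stream += [([rng.gauss(3, 1), rng.gauss(3, 1)], "snow") for _ in range(15)]
    stream += [([rng.gauss(-3, 1), rng.gauss(3, 1)], "glaze") for _ in range(5)]
    rng.shuffle(stream)
    for w in tumbling_windows(stream, 50):
        counts = defaultdict(int)
        for _, y in scut_balance(w, rng):
            counts[y] += 1
        print(dict(counts))  # each class in the window now has the same count
```

Each printed dictionary lists the classes present in that window at an equal count, namely the rounded mean of that window's original class sizes. In a complete stream learner, each balanced window would then be used to train or update a classifier; that step, and the choice of window type and size, are deliberately left out of this sketch.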


Notes

  1. https://github.com/BukkyOlaitan.

  2. http://climate.weather.gc.ca/historical_data/search_historic_data_e.html.


Author information

Correspondence to Herna L. Viktor.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Olaitan, O.M., Viktor, H.L. (2018). SCUT-DS: Learning from Multi-class Imbalanced Canadian Weather Data. In: Ceci, M., Japkowicz, N., Liu, J., Papadopoulos, G., Raś, Z. (eds) Foundations of Intelligent Systems. ISMIS 2018. Lecture Notes in Computer Science, vol 11177. Springer, Cham. https://doi.org/10.1007/978-3-030-01851-1_28

  • DOI: https://doi.org/10.1007/978-3-030-01851-1_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01850-4

  • Online ISBN: 978-3-030-01851-1

  • eBook Packages: Computer Science, Computer Science (R0)
