
On the Impact of Class Imbalance in GP Streaming Classification with Label Budgets

  • Conference paper
Genetic Programming (EuroGP 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9594)

Abstract

Streaming data scenarios introduce a set of requirements that do not exist under the supervised learning paradigms typically employed for classification. Specific examples include anytime operation, non-stationary processes, and limited label budgets. From the perspective of class imbalance, this implies that it is not even possible to guarantee that all classes are present in the samples of data used to construct a model. Moreover, when decisions are made regarding what subset of data to sample, no label information is available; labels are only provided after sampling. This is a more challenging task than the non-streaming (offline) scenario, in which the training partition already contains label information. In this work, we investigate the utility of different protocols for sampling from the stream under the above constraints. A uniform sampling protocol was previously shown to be reasonably effective under both evolutionary and non-evolutionary streaming classifiers. Here, we introduce a scheme that uses the current ‘champion’ classifier to bias the sampling of training instances during the course of the stream. The resulting streaming framework for genetic programming is more effective at sampling minor classes and, therefore, at reacting to changes in the underlying process responsible for generating the data stream.
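The abstract contrasts two ways of spending a label budget: drawing instances uniformly at random from the stream, and letting the current champion bias the draw. The sketch below is a hypothetical Python illustration of that contrast, not the authors' implementation; the paper's actual biasing rule is not described on this page. It assumes the stream is consumed in windows of unlabelled instances, that `budget` is the number of labels that may be requested per window, and that the champion biases sampling by drawing round-robin over its own predicted classes. The function names `uniform_sample` and `champion_biased_sample`, and the threshold "champion", are invented for illustration.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def uniform_sample(window: Sequence[float], budget: int, rng: random.Random) -> List[float]:
    """Uniform protocol: draw up to `budget` instances from the window at random.

    No label (and no model output) is consulted when choosing the subset.
    """
    k = min(budget, len(window))
    return rng.sample(list(window), k)


def champion_biased_sample(
    window: Sequence[float],
    budget: int,
    champion: Callable[[float], int],
    rng: random.Random,
) -> List[float]:
    """Hypothetical champion-biased protocol (an assumption, not the paper's rule).

    Instances are grouped by the champion's predicted class and drawn
    round-robin across groups, so classes the champion rarely predicts are not
    crowded out by the majority. True labels are only requested afterwards.
    """
    groups: Dict[int, List[float]] = defaultdict(list)
    for x in window:
        groups[champion(x)].append(x)
    for members in groups.values():
        rng.shuffle(members)  # random order within each predicted class

    selected: List[float] = []
    classes = sorted(groups)
    # Take one instance per predicted class per round until the budget is
    # spent or every group is exhausted.
    while len(selected) < budget and any(groups[c] for c in classes):
        for c in classes:
            if groups[c] and len(selected) < budget:
                selected.append(groups[c].pop())
    return selected


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy window: 95 majority-class points near 0, 5 minority-class points near 10.
    window = [rng.uniform(0.0, 1.0) for _ in range(95)] + [rng.uniform(9.0, 10.0) for _ in range(5)]

    def champion(x: float) -> int:
        """Stand-in for the current champion classifier (hypothetical threshold rule)."""
        return 1 if x > 5.0 else 0

    for name, picked in [
        ("uniform", uniform_sample(window, 10, rng)),
        ("champion-biased", champion_biased_sample(window, 10, champion, rng)),
    ]:
        minor = sum(1 for x in picked if x > 5.0)
        print(f"{name}: {minor} of {len(picked)} sampled instances are minor-class")
```

On this toy window, the round-robin draw returns all five minority points alongside five majority points, whereas a uniform draw of ten instances picks up only half a minority point on average (10 × 5/100). Because only the champion's predictions are consulted, no true labels are needed until the subset has been chosen. The trade-off is that the bias is only as good as the champion: a class it never predicts cannot be targeted.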

Notes

  1. Earlier work with symbiotic bid-based GP (SBB) under streaming data assumed that label information could be used to ensure that the data subset was always balanced [15].

  2. Valid class labels lie in the range {1, ..., C}.

  3. Given the later benchmarking parameterization, this corresponds to no more than 25 generations.

  4. Previous studies compared StreamGP under the uniform sampling protocol with non-evolutionary streaming algorithms [16, 17].

  5. Shift and Drift datasets are available from: http://web.cs.dal.ca/~mheywood/Code/SBB/Stream/StreamData.html.

  6. Electricity and Cover Type are available from: http://moa.cms.waikato.ac.nz/datasets/.

  7. Any more than five resulted in negligible improvement [16].

  8. Violin plots were used to establish that the distributions did not conform to a normal distribution. Space precludes their inclusion.

References

  1. Bifet, A.: Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. Frontiers in Artificial Intelligence and Applications, vol. 207. IOS Press, Amsterdam (2010)

  2. Bifet, A., Read, J., Žliobaitė, I., Pfahringer, B., Holmes, G.: Pitfalls in benchmarking data stream classification and how to avoid them. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds.) ECML PKDD 2013, Part I. LNCS, vol. 8188, pp. 465–479. Springer, Heidelberg (2013)

  3. Brameier, M., Banzhaf, W.: Evolving teams of predictors with linear genetic programming. Genet. Program. Evolvable Mach. 2(4), 381–408 (2001)

  4. Dempsey, I., O’Neill, M., Brabazon, A.: Grammatical Evolution. In: Dempsey, I., O’Neill, M., Brabazon, A. (eds.) Foundations in Grammatical Evolution for Dynamic Environments. SCI, vol. 194, pp. 9–24. Springer, Heidelberg (2009)

  5. Ditzler, G., Roveri, M., Alippi, C., Polikar, R.: Learning in nonstationary environments: a survey. IEEE Comput. Intell. Mag. 10(4), 12–25 (2015)

  6. Fan, W., Huang, Y., Wang, H., Yu, P.S.: Active mining of data streams. In: SIAM International Conference on Data Mining, pp. 457–461 (2004)

  7. Gama, J.: Knowledge Discovery from Data Streams. CRC Press, Boca Raton (2010)

  8. Gama, J., Sebastião, R., Rodrigues, P.P.: On evaluating stream learning algorithms. Mach. Learn. 90, 317–346 (2013)

  9. Heywood, M.I.: Evolutionary model building under streaming data for classification tasks: opportunities and challenges. Genet. Program. Evolvable Mach. 16(3), 283–326 (2015)

  10. Lichodzijewski, P., Heywood, M.I.: Managing team-based problem solving with symbiotic bid-based genetic programming. In: ACM Genetic and Evolutionary Computation Conference, pp. 363–370 (2008)

  11. Lichodzijewski, P., Heywood, M.I.: Symbiosis, complexification and simplicity under GP. In: ACM Genetic and Evolutionary Computation Conference, pp. 853–860 (2010)

  12. Polikar, R., Alippi, C.: Guest editorial: learning in nonstationary and evolving environments. IEEE Trans. Neural Netw. Learn. Syst. 25(1), 9–11 (2014)

  13. Thomason, R., Soule, T.: Novel ways of improving cooperation and performance in ensemble classifiers. In: ACM Genetic and Evolutionary Computation Conference, pp. 1708–1715 (2007)

  14. Žliobaitė, I., Bifet, A., Pfahringer, B., Holmes, G.: Active learning with drifting streaming data. IEEE Trans. Neural Netw. Learn. Syst. 25(1), 27–54 (2014)

  15. Vahdat, A., Atwater, A., McIntyre, A.R., Heywood, M.I.: On the application of GP to streaming data classification tasks with label budgets. In: ACM GECCO (Companion), pp. 1287–1294 (2014)

  16. Vahdat, A., Morgan, J., McIntyre, A., Heywood, M., Zincir-Heywood, A.: Evolving GP classifiers for streaming data tasks with concept change and label budgets: a benchmarking study. In: Gandomi, A.H., Alavi, A.H., Ryan, C. (eds.) Handbook of Genetic Programming Applications, pp. 451–480. Springer, Switzerland (2015)

  17. Vahdat, A., Morgan, J., McIntyre, A., Heywood, M., Zincir-Heywood, A.: Tapped delay lines for GP streaming data classification with label budgets. In: Machado, P., et al. (eds.) Genetic Programming. LNCS, vol. 9025, pp. 126–138. Springer, Switzerland (2015)

  18. Wagner, N., Michalewicz, Z., Khouja, M., McGregor, R.R.: Time series forecasting for dynamic environments: the DyFor genetic program model. IEEE Trans. Evol. Comput. 11(4), 433–452 (2007)

  19. Wu, S., Banzhaf, W.: Rethinking multilevel selection in genetic programming. In: ACM Genetic and Evolutionary Computation Conference, pp. 1403–1410 (2011)

  20. Zhu, X., Zhang, P., Lin, X., Shi, Y.: Active learning from stream data using optimal weight classifier ensemble. IEEE Trans. Syst. Man Cybern.: Part B 40(6), 1607–1621 (2010)

Acknowledgments

This research is supported by the Canadian Safety and Security Program (CSSP) E-Security grant. The CSSP is led by Defence Research and Development Canada, Centre for Security Science (CSS), on behalf of the Government of Canada and its partners across all levels of government, response and emergency management organizations, non-governmental agencies, industry and academia.

Author information

Corresponding author

Correspondence to Sara Khanchi.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Khanchi, S., Heywood, M.I., Zincir-Heywood, N. (2016). On the Impact of Class Imbalance in GP Streaming Classification with Label Budgets. In: Heywood, M., McDermott, J., Castelli, M., Costa, E., Sim, K. (eds) Genetic Programming. EuroGP 2016. Lecture Notes in Computer Science, vol 9594. Springer, Cham. https://doi.org/10.1007/978-3-319-30668-1_3

  • DOI: https://doi.org/10.1007/978-3-319-30668-1_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30667-4

  • Online ISBN: 978-3-319-30668-1

  • eBook Packages: Computer Science, Computer Science (R0)
