Abstract
Streaming data classification raises several challenges that are not typically encountered in offline supervised learning. First, at any training generation only a small subset of the data is accessible, and the data may be generated by a non-stationary process. Second, requesting labels has a cost, so a label budget is enforced. Finally, an anytime classification requirement means that it must always be possible to identify a ‘champion’ classifier for predicting labels as the stream progresses. In this work, we propose a general framework for deploying genetic programming (GP) to streaming data classification under these constraints. The framework consists of a sampling policy and an archiving policy, which together determine the data that appear in a data subset. Only the exemplars in the data subset are labeled, and training epochs are performed against the content of that subset. Specific recommendations include supporting GP task decomposition/modularity and performing additional training epochs per data subset. Both recommendations significantly improve on the baseline performance of GP under streaming data with label budgets. Benchmarking issues addressed include the identification of datasets and performance measures.
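The interaction between the label budget, the sampling policy, the archiving policy and the data subset can be illustrated with a minimal sketch. This is not the chapter's implementation. The names `run_stream` and `majority_trainer`, the uniform random sampling rule, the random-replacement archive and the majority-class stand-in for a GP training epoch are all assumptions chosen for brevity.

```python
import random
from collections import Counter


def majority_trainer(subset, champion):
    """Hypothetical stand-in for one GP training epoch.

    Returns a classifier that predicts the most common label in the subset.
    """
    label = Counter(y for _, y in subset).most_common(1)[0][0]
    return lambda x: label


def run_stream(stream, oracle, train, budget=0.1, subset_size=10,
               epochs=3, seed=0):
    """Process a stream while labeling at most a `budget` fraction of it."""
    rng = random.Random(seed)
    subset = []        # labeled exemplars that training epochs run against
    labels_used = 0
    champion = None    # current 'anytime' classifier
    for t, x in enumerate(stream, start=1):
        # Sampling policy (assumed): uniform random, gated by the budget.
        within_budget = labels_used + 1 <= budget * t
        if within_budget and rng.random() < 0.5:
            y = oracle(x)          # a label request consumes budget
            labels_used += 1
            # Archiving policy (assumed): evict a random exemplar when full.
            if len(subset) >= subset_size:
                subset.pop(rng.randrange(len(subset)))
            subset.append((x, y))
            # Several training epochs per change to the data subset.
            for _ in range(epochs):
                champion = train(subset, champion)
    return champion, subset, labels_used
```

In this sketch the champion only changes when the data subset changes, so a prediction is available at any point after the first label request. A real sampling or archiving policy would replace the two lines marked as assumed.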
Notes
1. Publicly available at http://web.cs.dal.ca/~mheywood/Code/SBB/Stream/StreamData.html.
2. Gabor Melli. The ‘datgen’ Dataset Generator. http://www.datsetgenerator.com/.
3. MOA prerelease 2014.03; http://moa.cms.waikato.ac.nz/overview/.
4. This implies that 10% of the label budget is consumed in pre-training.
5. Similar effects were observed for the electricity demand and forest cover type datasets, so those results are not explicitly reported.
6. Similar effects were observed for the electricity and forest cover type datasets.
7. The two implementations are the same, except that the monolithic formulation of SBB is subject to the constraint that only one program may represent each class.
8. Results are not shown for brevity.
Acknowledgements
The authors gratefully acknowledge funding provided by the NSERC CRD grant program (Canada).
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Vahdat, A., Morgan, J., McIntyre, A.R., Heywood, M.I., Zincir-Heywood, N. (2015). Evolving GP Classifiers for Streaming Data Tasks with Concept Change and Label Budgets: A Benchmarking Study. In: Gandomi, A., Alavi, A., Ryan, C. (eds) Handbook of Genetic Programming Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-20883-1_18
DOI: https://doi.org/10.1007/978-3-319-20883-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20882-4
Online ISBN: 978-3-319-20883-1
eBook Packages: Computer Science, Computer Science (R0)