Abstract
Using Genetic Programming (GP) to classify data streams is problematic because GP is slow compared with traditional single-solution techniques. However, the availability of cheaper and better-performing distributed and parallel architectures makes it possible to tackle complex problems that were previously intractable owing to the computation time required. This work presents a general framework, based on a distributed GP ensemble algorithm, for coping with different types of concept drift in the classification of large data streams. The framework detects changes efficiently using only a detection function computed on the incoming unclassified data; a distributed GP algorithm is run only when a change is detected, in order to improve classification accuracy, and this limits the overhead associated with using a population-based method. Real-world data streams may present drifts of different types. The introduced detection function, based on the self-similarity (fractal) dimension, copes in a very short time with the main types of drift, as demonstrated by first experiments on artificial datasets. Furthermore, given an adequate number of resources, distributed GP can handle very frequent concept drifts.
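The detection step outlined above (estimating the self-similarity fractal dimension of incoming, unlabelled data and flagging a drift when it changes) can be sketched with a simple box-counting estimator. This is a minimal illustration, not the authors' actual algorithm: the function names, the per-feature normalisation, the number of scales, and the fixed threshold are all assumptions introduced here.

```python
import numpy as np

def box_counting_dimension(points, n_scales=6):
    """Estimate the box-counting fractal dimension of a point set:
    count occupied grid cells at geometrically shrinking box sizes and
    fit the slope of log(count) versus log(1/size)."""
    pts = np.asarray(points, dtype=float)
    # Normalise each feature to [0, 1] so a single grid fits all axes.
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)
    pts = (pts - mins) / span

    sizes = 0.5 ** np.arange(1, n_scales + 1)  # box side lengths
    counts = []
    for s in sizes:
        n_boxes = int(round(1.0 / s))
        # Clip so points at the upper boundary fall in the last box.
        cells = np.minimum((pts / s).astype(int), n_boxes - 1)
        counts.append(len({tuple(c) for c in cells}))  # occupied boxes
    # The slope of the log-log fit is the dimension estimate.
    slope, _ = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
    return slope

def drift_detected(reference_block, new_block, threshold=0.3):
    """Flag a concept drift when appending the new (unlabelled) block
    shifts the fractal dimension beyond an illustrative threshold."""
    d_ref = box_counting_dimension(reference_block)
    d_new = box_counting_dimension(np.vstack([reference_block, new_block]))
    return abs(d_new - d_ref) > threshold
```

Because the check uses only the unlabelled incoming data, it can run on every block at low cost, and the expensive population-based learner is invoked only when the dimension shifts.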
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Folino, G., Papuzzo, G. (2010). Handling Different Categories of Concept Drifts in Data Streams Using Distributed GP. In: Esparcia-Alcázar, A.I., Ekárt, A., Silva, S., Dignum, S., Uyar, A.Ş. (eds) Genetic Programming. EuroGP 2010. Lecture Notes in Computer Science, vol 6021. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12148-7_7
Print ISBN: 978-3-642-12147-0
Online ISBN: 978-3-642-12148-7