Abstract
Using Genetic Programming (GP) to classify data streams is problematic because GP is slow compared with traditional single-solution techniques. However, the availability of cheaper and better-performing distributed and parallel architectures makes it possible to tackle complex problems that were previously intractable owing to the computation time required. This work presents a general framework, based on a distributed GP ensemble algorithm, for coping with different types of concept drift in the classification of large data streams. The framework detects changes efficiently using only a detection function computed on the incoming unclassified data; a distributed GP algorithm is run only when a change is detected, in order to improve classification accuracy, and this limits the overhead associated with using a population-based method. Real-world data streams may present drifts of different types. The introduced detection function, based on the self-similarity (fractal) dimension, copes in a very short time with the main types of drift, as demonstrated by first experiments on artificial datasets. Furthermore, given an adequate number of resources, distributed GP can handle very frequent concept drifts.
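The detection step outlined above (estimating the self-similarity fractal dimension of incoming, unlabelled data and flagging a drift when it changes) can be sketched with a simple box-counting estimator. This is a minimal illustration, not the authors' actual algorithm: the function names, the per-feature normalisation, the number of scales, and the fixed threshold are all assumptions introduced here.

```python
import numpy as np

def box_counting_dimension(points, n_scales=6):
    """Estimate the box-counting fractal dimension of a point set:
    count occupied grid cells at geometrically shrinking box sizes and
    fit the slope of log(count) versus log(1/size)."""
    pts = np.asarray(points, dtype=float)
    # Normalise each feature to [0, 1] so a single grid fits all axes.
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)
    pts = (pts - mins) / span

    sizes = 0.5 ** np.arange(1, n_scales + 1)  # box side lengths
    counts = []
    for s in sizes:
        n_boxes = int(round(1.0 / s))
        # Clip so points at the upper boundary fall in the last box.
        cells = np.minimum((pts / s).astype(int), n_boxes - 1)
        counts.append(len({tuple(c) for c in cells}))  # occupied boxes
    # The slope of the log-log fit is the dimension estimate.
    slope, _ = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
    return slope

def drift_detected(reference_block, new_block, threshold=0.3):
    """Flag a concept drift when appending the new (unlabelled) block
    shifts the fractal dimension beyond an illustrative threshold."""
    d_ref = box_counting_dimension(reference_block)
    d_new = box_counting_dimension(np.vstack([reference_block, new_block]))
    return abs(d_new - d_ref) > threshold
```

Because the check uses only the unlabelled incoming data, it can run on every block at low cost, and the expensive population-based learner is invoked only when the dimension shifts.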
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Folino, G., Papuzzo, G. (2010). Handling Different Categories of Concept Drifts in Data Streams Using Distributed GP. In: Esparcia-Alcázar, A.I., Ekárt, A., Silva, S., Dignum, S., Uyar, A.Ş. (eds) Genetic Programming. EuroGP 2010. Lecture Notes in Computer Science, vol 6021. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12148-7_7
Print ISBN: 978-3-642-12147-0
Online ISBN: 978-3-642-12148-7