Abstract
A Genetic Programming based boosting ensemble method for the classification of distributed streaming data is proposed. The approach handles data flows arriving from multiple locations by building a global model that aggregates the local models generated at each node. A main characteristic of the algorithm is its adaptability in the presence of concept drift: changes in the data can seriously deteriorate the performance of the ensemble. Our approach detects such changes through a strategy based on the self-similarity of the ensemble's behavior, measured by its fractal dimension, and revises itself to promptly restore classification accuracy. Experimental results on a synthetic data set show the validity of the approach in maintaining an accurate and up-to-date GP ensemble.
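The fractal dimension invoked in the abstract is commonly estimated by box counting: cover the point set with grid cells of side ε, count the occupied cells N(ε), and take the slope of log N(ε) against log(1/ε). The sketch below illustrates that estimator and a drift check based on a shift in the estimated dimension; it is a minimal illustration, not the authors' implementation, and the function names and the drift threshold are assumptions made for the example.

```python
import numpy as np

def box_counting_dimension(points, eps_values):
    """Estimate the box-counting (fractal) dimension of a point set.

    points: (n, d) array of nonnegative coordinates.
    eps_values: sequence of box sizes.
    Returns the slope of log N(eps) versus log(1/eps).
    """
    counts = []
    for eps in eps_values:
        # Assign each point to a grid cell of side eps, count occupied cells.
        cells = np.floor(points / eps).astype(int)
        counts.append(len({tuple(c) for c in cells}))
    # Linear fit: log N(eps) ~ D * log(1/eps) + c; the slope D is the estimate.
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(eps_values)),
                          np.log(counts), 1)
    return slope

def drift_detected(dim_history, new_dim, threshold=0.1):
    """Flag concept drift when the new dimension estimate deviates
    from the recent mean by more than a (hypothetical) threshold."""
    if not dim_history:
        return False
    return abs(new_dim - np.mean(dim_history)) > threshold
```

As a sanity check, points sampled along a line yield a dimension near 1, while points filling the unit square yield a dimension near 2, so an abrupt change in the estimate signals a structural change in the monitored behavior.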
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Folino, G., Pizzuti, C., Spezzano, G. (2007). Mining Distributed Evolving Data Streams Using Fractal GP Ensembles. In: Ebner, M., O’Neill, M., Ekárt, A., Vanneschi, L., Esparcia-Alcázar, A.I. (eds) Genetic Programming. EuroGP 2007. Lecture Notes in Computer Science, vol 4445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71605-1_15
Print ISBN: 978-3-540-71602-0
Online ISBN: 978-3-540-71605-1