
Handling Different Categories of Concept Drifts in Data Streams Using Distributed GP

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 6021)

Abstract

Using Genetic Programming (GP) to classify data streams is problematic because GP is slow compared with traditional single-solution techniques. However, the availability of cheaper, better-performing distributed and parallel architectures makes it possible to tackle complex problems that were previously out of reach owing to the computation time required. This work presents a general framework, based on a distributed GP ensemble algorithm, for coping with different types of concept drift in the classification of large data streams. The framework detects changes very efficiently using only a detection function computed on the incoming unclassified data. A distributed GP algorithm is run to improve classification accuracy only when a change is detected, which limits the overhead associated with using a population-based method. Real-world data streams may present drifts of different types. The proposed detection function, based on the self-similarity fractal dimension, can cope in a very short time with the main types of drift, as demonstrated by initial experiments on artificial datasets. Furthermore, given an adequate number of resources, distributed GP can handle very frequent concept drifts.
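The detection scheme described above can be illustrated with a minimal sketch: estimate the fractal dimension of a reference block and of each incoming (unlabelled) block, and flag a drift when the two dimensions diverge. This is not the paper's actual detection function (the authors rely on the self-similarity fractal dimension and the FD3 tool); a simple box-counting estimate and the names `box_counting_dimension` and `drift_detected` are illustrative assumptions.

```python
import numpy as np

def box_counting_dimension(points, scales=(2, 4, 8, 16, 32)):
    """Estimate the box-counting fractal dimension of a point cloud.

    points: (n, d) array with coordinates normalised to [0, 1].
    Returns the slope of log N(eps) vs log(1/eps), where N(eps) is the
    number of occupied boxes of side 1/scale.
    """
    points = np.asarray(points, dtype=float)
    counts = []
    for s in scales:
        # Assign each point to a grid cell of side 1/s, count distinct cells.
        boxes = np.clip(np.floor(points * s).astype(int), 0, s - 1)
        counts.append(len({tuple(b) for b in boxes}))
    # Linear fit of log(counts) against log(scale) gives the dimension.
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return slope

def drift_detected(ref_block, new_block, threshold=0.2):
    """Flag a concept drift when the fractal dimension of the incoming
    unlabelled block deviates from the reference block's dimension."""
    return abs(box_counting_dimension(new_block)
               - box_counting_dimension(ref_block)) > threshold

rng = np.random.default_rng(0)
# Reference data: points on a line segment in the plane (dimension ~ 1).
t = rng.random((2000, 1))
line = np.hstack([t, t])
# Post-drift data: points filling the unit square (dimension ~ 2).
square = rng.random((2000, 2))

drift_detected(line, line[:1000], threshold=0.3)  # same concept: no drift
drift_detected(line, square, threshold=0.3)       # changed concept: drift
```

Because the dimension is computed on unclassified data alone, no labels are needed to trigger retraining; only when the test fires would the (expensive) distributed GP ensemble be updated, matching the overhead argument in the abstract.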





Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Folino, G., Papuzzo, G. (2010). Handling Different Categories of Concept Drifts in Data Streams Using Distributed GP. In: Esparcia-Alcázar, A.I., Ekárt, A., Silva, S., Dignum, S., Uyar, A.Ş. (eds) Genetic Programming. EuroGP 2010. Lecture Notes in Computer Science, vol 6021. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12148-7_7


  • DOI: https://doi.org/10.1007/978-3-642-12148-7_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12147-0

  • Online ISBN: 978-3-642-12148-7

