Skip to main content

Efficient Decision Tree Re-alignment for Clustering Time-Changing Data Streams

  • Chapter
  • 2297 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6462))

Abstract

Mining streaming data has been an active research area to address requirements of applications, such as financial marketing, telecommunication, network monitoring, and so on. A popular technique for mining these continuous and fast-arriving data streams is decision trees. The accuracy of decision trees can deteriorate if the distribution of values in the stream changes over time. In this paper, we propose an approach based on decision trees that can detect distribution changes and re-align the decision tree quickly to reflect the change. The technique exploits a set of synopses on the leaf nodes, which are also used to prune the decision tree. Experimental results demonstrate that the proposed approach can detect the distribution changes in real-time with high accuracy, and re-aligning a decision tree can improve its performance in clustering the subsequent data stream tuples.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aggarwal, C.: A framework for diagnosing changes in evolving data streams. In: Proc. ACM SIGMOD Int. Conf. on Management of Data, pp. 575–586 (2003)

    Google Scholar 

  2. Aggarwal, C., Han, J., Wang, J., Yu, P.: A framework for clustering evolving data streams. In: Proc. 29th Int. Conf. on Very Large Data Bases, pp. 81–92 (2003)

    Google Scholar 

  3. Chakravarti, I., Laha, R., Roy, J.: Handbook of Methods of Applied Statistics. John Wiley and Sons, Chichester (1967)

    MATH  Google Scholar 

  4. Charikar, M., Chen, K., Motwani, R.: Incremental clustering and dynamic information retrieval. In: Proc. ACM Symp. on Theory of Computing, pp. 626–635 (1997)

    Google Scholar 

  5. Charikar, M., O’Callaghan, L., Panigrahy, R.: Better streaming algorithms for clustering problems. In: Proc. ACM Symp. on Theory of Computing, pp. 30–39 (2003)

    Google Scholar 

  6. Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proc. 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 71–80 (2000)

    Google Scholar 

  7. Fan, W., Huang, Y., Yu, P.: Decision tree evolution using limited number of labeled data items from drifting data streams. In: Proc. 2004 IEEE Int. Conf. on Data Mining, pp. 379–382 (2004)

    Google Scholar 

  8. Fredman, M.: Two applications of a probabilistic search technique: Sorting x + y and building balanced search tree. In: Proc. ACM Symp. on Theory of Computing, pp. 240–244 (1975)

    Google Scholar 

  9. Gaber, M., Zaslavsky, A., Krishnaswamy, S.: Mining data streams: A review. ACM SIGMOD Record 34(2), 18–26 (2005)

    Article  MATH  Google Scholar 

  10. Gama, J., Medas, P., Rodrigues, P.: Learning decision trees from dynamic data streams. In: Proc. 2005 ACM Symp. on Applied Computing, pp. 573–577 (2005)

    Google Scholar 

  11. Gama, J., Rocha, R., Medas, P.: Accurate decision tree for mining high-speed data streams. In: Proc. 9th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 523–528 (2003)

    Google Scholar 

  12. Guha, S., Meyerson, A., Mishra, N., Motwani, R.: Clustering data streams: Theory and practice. IEEE Trans. Knowledge and Data Eng. 15(3), 515–528 (2003)

    Article  Google Scholar 

  13. Hoeffding, W.: Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58, 18–30 (1963)

    Article  MathSciNet  MATH  Google Scholar 

  14. Hulten, G., Spencer, L., Domingos, P.: Mining time-chaning data streams. In: Proc. 7th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 97–106 (2001)

    Google Scholar 

  15. Jin, R., Aggrawal, G.: Efficient decision tree constructions on streaming data. In: Proc. 9th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 571–576 (2003)

    Google Scholar 

  16. Kaufman, L., Rousseeuw, P.: Finding groups in data: An introduction to cluster analysis. Addison-Wesley, Reading (1990)

    Book  MATH  Google Scholar 

  17. Kifer, D., Ben-David, S., Gehrke, J.: Detecting change in data streams. In: Proc. 30th Int. Conf. on Very Large Data Bases, pp. 180–191 (2004)

    Google Scholar 

  18. Knuth, D.: Optimum binary search trees. Acta Informatica 1, 14–25 (1971)

    Article  MATH  Google Scholar 

  19. Knuth, D.: The art of computer programming 3: Sorting and searching. Addison-Wesley, Reading (1973)

    MATH  Google Scholar 

  20. Babock, B., et al.: Models and issues in data stream systems. In: Proc. 21st ACM SIGACT-SIGMOD-SIGART Symp. Principles of Database Systems, pp. 1–16 (2002)

    Google Scholar 

  21. Abadi, D., et al.: The design of the borealis stream processing engine. In: Proc. 2nd Biennial Conf. on Innovative Data Systems Research (2005)

    Google Scholar 

  22. Li, J., et al.: Semantics and evaluation techniques for window aggregates in data streams. In: Proc. ACM SIGMOD Int. Conf. on Management of Data, pp. 311–322 (2005)

    Google Scholar 

  23. Chen, M., et al.: Path-based failure and evolution management. In: 1st Symposium on Network Systems Design and Implementation, pp. 309–322 (2004)

    Google Scholar 

  24. Wang, H., Fan, W., Yu, P.S., Han, J.: Mining concept-drifting data streams using ensemble classifiers. In: Proc. 9th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 226–235 (2003)

    Google Scholar 

  25. Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics Bulletin 1, 80–83 (1945)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Tao, Y., Özsu, M.T. (2010). Efficient Decision Tree Re-alignment for Clustering Time-Changing Data Streams. In: Sachs, K., Petrov, I., Guerrero, P. (eds) From Active Data Management to Event-Based Systems and More. Lecture Notes in Computer Science, vol 6462. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17226-7_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-17226-7_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17225-0

  • Online ISBN: 978-3-642-17226-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics