A New Fuzzy Decision Tree Classification Method for Mining High-Speed Data Streams Based on Binary Search Trees

  • Conference paper
Frontiers in Algorithmics (FAW 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4613)

Abstract

Decision tree construction is a well-studied problem in data mining. Recently, there has been much interest in mining data streams. Domingos and Hulten presented VFDT, a one-pass algorithm for decision tree construction; their system uses the Hoeffding inequality to obtain a probabilistic bound on the accuracy of the constructed tree. Gama et al. extended VFDT in two directions: their system, VFDTc, can handle continuous data and uses more powerful classification techniques at the tree leaves. Peng and Flach presented a soft discretization method for handling continuous attributes in data mining. In this paper, we revisit these problems and implement a system, sVFDT, for data stream mining. We make the following contributions: 1) We present a binary search tree (BST) approach for efficiently handling continuous attributes. Its processing time for inserting values is $O(n \log n)$, whereas VFDT's is $O(n^2)$. 2) We improve the method of finding the best split-test point of a given continuous attribute; compared with the method used in VFDTc, it reduces the processing time from $O(n \log n)$ to $O(n)$. 3) Compared with VFDTc, sVFDT reduces the number of candidate split tests from $O(n)$ to $O(\log n)$. 4) We improve the soft discretization method to increase classification accuracy in data stream mining.
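To make the complexity claims above more concrete, here is a minimal Python sketch built on our own assumptions; it is not the authors' sVFDT code. We assume each leaf keeps, for every continuous attribute, an unbalanced binary search tree keyed by observed value, with per-class counts stored at each node (the names `ValueBST`, `best_split`, and `hoeffding_bound` are hypothetical). Because an in-order traversal visits values in ascending order, accumulating class counts along the way scores every candidate threshold $x \le v$ by information gain in a single pass, in time linear in the number of distinct values (times the number of classes) and without re-sorting. The Hoeffding bound is the standard VFDT formula $\epsilon = \sqrt{R^2 \ln(1/\delta) / (2n)}$.

```python
import math
from collections import Counter


class Node:
    """One distinct attribute value plus per-class counts of the examples carrying it."""

    def __init__(self, value):
        self.value = value
        self.counts = Counter()
        self.left = None
        self.right = None


class ValueBST:
    """Hypothetical per-leaf, per-attribute BST of observed numeric values (unbalanced)."""

    def __init__(self):
        self.root = None
        self.totals = Counter()  # class counts over all examples seen at this leaf

    def insert(self, value, label):
        """Record one (value, class) pair; costs O(depth of the tree)."""
        self.totals[label] += 1
        if self.root is None:
            self.root = Node(value)
            self.root.counts[label] += 1
            return
        node = self.root
        while True:
            if value == node.value:  # repeated value: only bump its class count
                node.counts[label] += 1
                return
            side = "left" if value < node.value else "right"
            child = getattr(node, side)
            if child is None:
                child = Node(value)
                child.counts[label] += 1
                setattr(node, side, child)
                return
            node = child

    def in_order(self):
        """Yield nodes in ascending value order without sorting (iterative traversal)."""
        stack, node = [], self.root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node
            node = node.right


def entropy(counts):
    """Shannon entropy (bits) of a class-count distribution."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c > 0)


def best_split(tree):
    """Score every threshold 'x <= v' in one in-order pass; return (v, gain)."""
    n = sum(tree.totals.values())
    base = entropy(tree.totals)
    left = Counter()
    best = (None, 0.0)
    nodes = list(tree.in_order())
    for node in nodes[:-1]:  # splitting at the largest value separates nothing
        left.update(node.counts)  # running class counts for x <= node.value
        right = tree.totals - left
        n_left = sum(left.values())
        gain = base - (n_left / n) * entropy(left) - ((n - n_left) / n) * entropy(right)
        if gain > best[1]:
            best = (node.value, gain)
    return best


def hoeffding_bound(value_range, delta, n):
    """VFDT's epsilon: with probability 1 - delta, true mean >= sample mean - epsilon."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


tree = ValueBST()
for v, y in [(3.0, "b"), (1.0, "a"), (4.0, "b"), (2.0, "a")]:
    tree.insert(v, y)
print(best_split(tree))  # (2.0, 1.0)
```

Insertion into this unbalanced tree costs $O(\log m)$ per value on average for randomly ordered input but degrades to $O(m)$ on sorted input; a self-balancing tree would guarantee the logarithmic bound. In VFDT, a leaf is split when the gain of the best attribute exceeds that of the second best by more than $\epsilon$, with $R = \log_2 c$ for information gain over $c$ classes.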

This work was supported by the National Science Foundation of China under Grants No. 60573057, 60473057 and 90604007.

References

  1. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and Issues in Data Stream Systems. In: PODS (2002)

  2. Domingos, P., Hulten, G.: Mining High-Speed Data Streams. In: Proceedings of the Association for Computing Machinery Sixth International Conference on Knowledge Discovery and Data Mining, pp. 71–80 (2000)

  3. Mehta, M., Agrawal, R., Rissanen, J.: SLIQ: A Fast Scalable Classifier for Data Mining. In: Proceedings of the Fifth International Conference on Extending Database Technology, Avignon, France, pp. 18–32 (1996)

  4. Fan, W.: StreamMiner: A Classifier Ensemble-based Engine to Mine Concept-Drifting Data Streams. In: VLDB 2004 (2004)

  5. Gama, J., Rocha, R., Medas, P.: Accurate Decision Trees for Mining High-Speed Data Streams. In: Domingos, P., Faloutsos, C. (eds.) Proceedings of the Ninth International Conference on Knowledge Discovery and Data Mining, ACM Press, New York (2003)

  6. Hulten, G., Spencer, L., Domingos, P.: Mining Time-Changing Data Streams. In: ACM SIGKDD, ACM Press, New York (2001)

  7. Jin, R., Agrawal, G.: Efficient Decision Tree Construction on Streaming Data. In: Proceedings of ACM SIGKDD, ACM Press, New York (2003)

  8. Last, M.: Online Classification of Nonstationary Data Streams. Intelligent Data Analysis 6(2), 129–147 (2002)

  9. Muthukrishnan, S.: Data Streams: Algorithms and Applications. In: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, ACM Press, New York (2003)

  10. Wang, H., Fan, W., Yu, P., Han, J.: Mining Concept-Drifting Data Streams Using Ensemble Classifiers. In: The 9th ACM International Conference on Knowledge Discovery and Data Mining, Washington DC, USA. SIGKDD, ACM Press, New York (2003)

  11. Arasu, A., Babcock, B., Babu, S., Datar, M., Ito, K., Nishizawa, I., Rosenstein, J., Widom, J.: STREAM: The Stanford Stream Data Manager Demonstration Description – Short Overview of System Status and Plans. In: SIGMOD 2003. Proc. of the ACM Intl. Conf. on Management of Data, ACM Press, New York (2003)

  12. Aggarwal, C., Han, J., Wang, J., Yu, P.S.: On Demand Classification of Data Streams. In: Proc. 2004 Int. Conf. on Knowledge Discovery and Data Mining (KDD 2004), Seattle, WA (2004)

  13. Guetova, M., Hölldobler, S., Störr, H.-P.: Incremental Fuzzy Decision Trees. In: 25th German Conference on Artificial Intelligence (2002)

  14. Ben-David, S., Gehrke, J., Kifer, D.: Detecting Change in Data Streams. In: Proceedings of VLDB 2004 (2004)

  15. Aggarwal, C.: A Framework for Diagnosing Changes in Evolving Data Streams. In: Proceedings of the ACM SIGMOD Conference, ACM Press, New York (2003)

  16. Gaber, M.M., Zaslavsky, A., Krishnaswamy, S.: Mining Data Streams: A Review. SIGMOD Record 34(2) (June 2005)

  17. Janikow, C.Z.: Fuzzy Decision Trees: Issues and Methods. IEEE Transactions on Systems, Man, and Cybernetics 28(1), 1–14 (1998)

  18. Utgoff, P.E.: Incremental Induction of Decision Trees. Machine Learning 4(2), 161–186 (1989)

  19. Xie, Q.H.: An Efficient Approach for Mining Concept-Drifting Data Streams. Master's Thesis

  20. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA (1993)

  21. Hoeffding, W.: Probability Inequalities for Sums of Bounded Random Variables. Journal of the American Statistical Association 58, 13–30 (1963)

  22. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth, Belmont, CA (1984)

  23. Maron, O., Moore, A.: Hoeffding Races: Accelerating Model Selection Search for Classification and Function Approximation. In: Cowan, J.D., Tesauro, G., Alspector, J. (eds.) Advances in Neural Information Processing Systems (1994)

  24. Kelly, M.G., Hand, D.J., Adams, N.M.: The Impact of Changing Populations on Classifier Performance. In: Proc. of KDD-99, pp. 367–371 (1999)

  25. Black, M., Hickey, R.J.: Maintaining the Performance of a Learned Classifier under Concept Drift. Intelligent Data Analysis 3, 453–474 (1999)

  26. Maimon, O., Last, M.: Knowledge Discovery and Data Mining: The Info-Fuzzy Network (IFN) Methodology. Kluwer Academic Publishers, Dordrecht (2000)

  27. Fayyad, U.M., Irani, K.B.: On the Handling of Continuous-valued Attributes in Decision Tree Generation. Machine Learning 8, 87–102 (1992)

  28. Wang, T., Li, Z., Yan, Y., Chen, H.: An Efficient Classification System Based on Binary Search Trees for Data Streams Mining. In: ICONS (2007)

  29. Wang, T., Li, Z., Hu, X., Yan, Y., Chen, H.: A New Decision Tree Classification Method for Mining High-Speed Data Streams Based on Threaded Binary Search Trees. In: Workshop on High Performance Data Mining and Application, PAKDD (2007)

  30. Peng, Y.H., Flach, P.A.: Soft Discretization to Enhance the Continuous Decision Tree Induction. In: Proceedings of ECML/PKDD-2001 Workshop IDDM-2001, Freiburg, Germany (2001)

Editor information

Franco P. Preparata, Qizhi Fang

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, Z., Wang, T., Wang, R., Yan, Y., Chen, H. (2007). A New Fuzzy Decision Tree Classification Method for Mining High-Speed Data Streams Based on Binary Search Trees. In: Preparata, F.P., Fang, Q. (eds) Frontiers in Algorithmics. FAW 2007. Lecture Notes in Computer Science, vol 4613. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73814-5_20

  • DOI: https://doi.org/10.1007/978-3-540-73814-5_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73813-8

  • Online ISBN: 978-3-540-73814-5

  • eBook Packages: Computer Science, Computer Science (R0)
