Skip to main content

Maintaining Optimal Multi-way Splits for Numerical Attributes in Data Streams

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2008)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 5012))

Included in the following conference series:

Abstract

In the batch learning setting it suffices to take into account only a reduced number of threshold candidates in discretizing the value range of a numerical attribute for many commonly-used attribute evaluation functions. We show that the same techniques are also efficiently applicable in the on-line learning scheme. Only constant time per example is needed for determining the changes on data grouping. Hence, one can apply multi-way splits, e.g., in the standard approach to decision tree learning from data streams. We also briefly consider modifications needed to cope with drifting concepts. Our empirical evaluation demonstrates that often the reduction in threshold candidates obtained is high for the important attributes. In a data stream logarithmic growth in the number of potential cut points and the reduced number of threshold candidates is observed.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proc. 6th ACM SIGKDD Conf. on Data Mining and Knowl. Discovery, pp. 71–80. ACM Press, New York (2000)

    Chapter  Google Scholar 

  2. Gama, J., Rocha, R., Medas, P.: Accurate decision trees for mining high-speed data streams. In: Proc. 9th ACM SIGKDD Conf. on Data Mining and Knowledge Discovery, pp. 523–528. ACM Press, New York (2003)

    Google Scholar 

  3. Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: Proc. 7th ACM SIGKDD Conf. on Data Mining and Knowledge Discovery, pp. 97–106. ACM Press, New York (2001)

    Google Scholar 

  4. Jin, R., Agrawal, G.: Efficient decision tree construction for streaming data. In: Proc. 9th ACM SIGKDD Conf. on Data Mining and Knowledge Discovery, pp. 571–576. ACM Press, New York (2003)

    Google Scholar 

  5. Gama, J., Medas, P., Rodrigues, P.: Learning decision trees from dynamic data streams. In: Proc. 2005 ACM Symp. on Appl. Comput., pp. 573–577. ACM Press, New York (2005)

    Chapter  Google Scholar 

  6. Gama, J., Pinto, C.: Dizcretization from data streams: Applications to histograms and data mining. In: Proc. 2006 ACM Symp. on Applied Computing, pp. 662–667. ACM Press, New York (2006)

    Chapter  Google Scholar 

  7. Fayyad, U.M., Irani, K.B.: On the handling of continuous-valued attributes in decision tree generation. Mach. Learn. 8, 87–102 (1992)

    MATH  Google Scholar 

  8. Fulton, T., Kasif, S., Salzberg, S.: Efficient algorithms for finding multi-way splits for decision trees. In: Proc. 12th ICML, pp. 244–251. Morgan Kaufmann, San Francisco (1995)

    Google Scholar 

  9. Zighed, D., Rakotomalala, R., Feschet, F.: Optimal multiple intervals discretization of continuous attributes for supervised learning. In: Proc. 3rd Intl. Conf. on Knowledge Discovery and Data Mining, pp. 295–298. AAAI Press, Menlo Park (1997)

    Google Scholar 

  10. Elomaa, T., Rousu, J.: General and efficient multisplitting of numerical attributes. Mach. Learn. 36, 201–244 (1999)

    Article  MATH  Google Scholar 

  11. Elomaa, T., Rousu, J.: Efficient multisplitting revisited: Optima-preserving elimination of partition candidates. Data Mining Knowl. Discovery 8, 97–126 (2004)

    Article  MathSciNet  Google Scholar 

  12. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth (1984)

    Google Scholar 

  13. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)

    Google Scholar 

  14. Catlett, J.: On changing continuous attributes into ordered discrete attributes. In: Kodratoff, Y. (ed.) EWSL 1991. LNCS, vol. 482, pp. 164–178. Springer, Heidelberg (1991)

    Chapter  Google Scholar 

  15. Fayyad, U.M., Irani, K.B.: Multi-interval discretization of continuous-valued attributes for classification learning. In: Proc. 13th Intl. Joint Conf. on Artificial Intelligence, pp. 1022–1027. Morgan Kaufmann, San Francisco (1993)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Takashi Washio Einoshin Suzuki Kai Ming Ting Akihiro Inokuchi

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Elomaa, T., Lehtinen, P. (2008). Maintaining Optimal Multi-way Splits for Numerical Attributes in Data Streams. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science(), vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_49

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-68125-0_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68124-3

  • Online ISBN: 978-3-540-68125-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics