A Concept-Drifting Detection Algorithm for Categorical Evolving Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7819)

Abstract

In data stream analysis, detecting concept drift is an important problem for real-time decision making. In this paper, we propose a new method for detecting concept drift by measuring the difference between the distributions of two concepts. The difference is defined using the approximation accuracy of rough set theory, which can also be used to measure how quickly concepts change. We propose a concept-drift detection algorithm and analyze its complexity. Experimental results on a real data set with half a million records show that the proposed algorithm is both effective in discovering concept changes and efficient in processing large data sets.
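The abstract does not give the exact definition of the difference measure, so the following is only a minimal sketch of the general idea, assuming the standard Pawlak approximation accuracy (size of the lower approximation divided by size of the upper approximation). The function names `approximation_accuracy` and `window_difference`, and the choice to pool two windows and use the first window as the target set, are illustrative assumptions and not the authors' algorithm.

```python
from collections import defaultdict


def approximation_accuracy(records, target):
    """Pawlak approximation accuracy |lower(X)| / |upper(X)|.

    records: list of tuples of categorical attribute values.
    target:  set of record indices (the set X to approximate).
    Records with identical attribute values form one equivalence class.
    """
    classes = defaultdict(set)
    for index, record in enumerate(records):
        classes[tuple(record)].add(index)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside X
            lower |= block
        if block & target:    # class overlaps X
            upper |= block

    # Convention: an empty target set is treated as perfectly approximated.
    return len(lower) / len(upper) if upper else 1.0


def window_difference(window_a, window_b):
    """Hypothetical drift score (an assumption, not the paper's definition).

    Pools both windows and asks how well window_a can be told apart from
    window_b using attribute values alone. A score of 0.0 means every
    record of window_a also occurs in window_b; 1.0 means no record of
    window_a occurs in window_b.
    """
    records = list(window_a) + list(window_b)
    target = set(range(len(window_a)))
    return approximation_accuracy(records, target)


if __name__ == "__main__":
    old = [("a", "x"), ("a", "y")]
    new = [("a", "x"), ("b", "z")]
    print(window_difference(old, new))
```

Under this assumed scoring, a drift detector would compare the score of consecutive windows against a threshold; how the paper sets such a threshold, or measures the speed of change, is not stated in the abstract.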

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cao, F., Huang, J.Z. (2013). A Concept-Drifting Detection Algorithm for Categorical Evolving Data. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37456-2_41

  • DOI: https://doi.org/10.1007/978-3-642-37456-2_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37455-5

  • Online ISBN: 978-3-642-37456-2

  • eBook Packages: Computer Science, Computer Science (R0)
