Mining Uncertain Data Streams Using Clustering Feature Decision Trees

  • Conference paper
Advanced Data Mining and Applications (ADMA 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7121)

Abstract

During the last decade, classification from data streams has been based on deterministic learning algorithms that learn from precise and complete data. However, many practical applications supply only approximate measurements, usually together with estimates of the measurement errors. Developing highly efficient algorithms that handle uncertain examples has therefore emerged as a new direction. In this paper, we build a CFDTu model from data streams with uncertain attribute values. CFDTu applies an uncertain clustering algorithm that scans the data stream only once to obtain sufficient statistical summaries. The statistics are stored in Clustering Feature vectors and are used for incremental decision tree induction. The vectors also serve as classifiers at the leaves to further refine the classification and reinforce the any-time property. Experiments show that CFDTu outperforms a purely deterministic method in terms of accuracy and is highly scalable on uncertain data streams.
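
To make the idea of one-pass sufficient statistics concrete, the following is a minimal sketch of a BIRCH-style clustering feature (count, linear sum, squared sum) that is updated incrementally as examples arrive. It is not the CFDTu definition, which the abstract does not give: the class name `ClusteringFeature`, the extra `err_ss` field that accumulates squared measurement errors, and all method names are assumptions made for illustration only.

```python
from typing import List


class ClusteringFeature:
    """BIRCH-style summary: count, per-attribute linear sum and squared sum.

    `err_ss` (sum of squared reported errors) is a hypothetical extension
    for uncertain attributes, not the paper's definition.
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.n = 0
        self.ls = [0.0] * dim      # linear sum of attribute values
        self.ss = [0.0] * dim      # sum of squared attribute values
        self.err_ss = [0.0] * dim  # assumed: sum of squared error estimates

    def add(self, x: List[float], err: List[float]) -> None:
        """Absorb one example; each example is seen exactly once."""
        self.n += 1
        for j in range(self.dim):
            self.ls[j] += x[j]
            self.ss[j] += x[j] * x[j]
            self.err_ss[j] += err[j] * err[j]

    def merge(self, other: "ClusteringFeature") -> None:
        """Summaries are additive, so two clusters merge in O(dim)."""
        self.n += other.n
        for j in range(self.dim):
            self.ls[j] += other.ls[j]
            self.ss[j] += other.ss[j]
            self.err_ss[j] += other.err_ss[j]

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def variance(self) -> List[float]:
        """Per-attribute variance of observed values: SS/n - (LS/n)^2."""
        return [
            max(self.ss[j] / self.n - (self.ls[j] / self.n) ** 2, 0.0)
            for j in range(self.dim)
        ]


cf = ClusteringFeature(dim=2)
cf.add([1.0, 2.0], err=[0.1, 0.2])
cf.add([3.0, 4.0], err=[0.1, 0.2])
print(cf.n, cf.centroid(), cf.variance())
```

Because every field is a running sum, the summary needs constant memory per cluster and never revisits earlier examples, which is the property a single-scan stream learner relies on.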


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xu, W., Qin, Z., Hu, H., Zhao, N. (2011). Mining Uncertain Data Streams Using Clustering Feature Decision Trees. In: Tang, J., King, I., Chen, L., Wang, J. (eds) Advanced Data Mining and Applications. ADMA 2011. Lecture Notes in Computer Science (LNAI), vol 7121. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25856-5_15

  • DOI: https://doi.org/10.1007/978-3-642-25856-5_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25855-8

  • Online ISBN: 978-3-642-25856-5

  • eBook Packages: Computer Science, Computer Science (R0)
