Heterogeneous Ensemble for Feature Drifts in Data Streams

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7302)

Abstract

The nature of data streams requires classification algorithms to be real-time, efficient, and able to cope with high-dimensional data that arrive continuously. In high-dimensional datasets, not all features are critical for training a classifier. To improve the performance of data stream classification, we propose an algorithm called HEFT-Stream (Heterogeneous Ensemble with Feature drifT for Data Streams) that incorporates feature selection into a heterogeneous ensemble to adapt to different types of concept drift. As an example of the proposed framework, we first modify the FCBF [13] algorithm so that it dynamically updates the relevant feature subsets for data streams. Next, a heterogeneous ensemble is constructed from different online classifiers, including Online Naive Bayes and CVFDT [12]. Empirical results show that our ensemble classifier outperforms state-of-the-art ensemble classifiers (AWE [21] and OnlineBagging [15]) in terms of accuracy, speed, and scalability. The success of HEFT-Stream opens new research directions in understanding the relationship between feature selection techniques and ensemble learning to achieve better classification performance.
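FCBF ranks features by their symmetrical uncertainty (SU) with the class label, so the feature-selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' modified FCBF or any part of HEFT-Stream: it only scores the discrete features of one window of instances and keeps those above a cutoff. The function names, the `window` argument, and the `threshold` value are assumptions made for illustration, and the redundancy-removal step of full FCBF is omitted.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        # Both variables are constant, so neither carries information.
        return 0.0
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy      # mutual information IG(X; Y)
    return 2.0 * info_gain / (h_x + h_y)


def relevant_features(window, labels, threshold=0.1):
    """Indices of features whose SU with the label exceeds `threshold`.

    `window` is a list of equal-length tuples of discrete feature values
    (a hypothetical buffer of recent stream instances). Results are
    ordered by decreasing SU, ties broken by feature index.
    """
    scores = []
    for j in range(len(window[0])):
        column = [row[j] for row in window]
        su = symmetrical_uncertainty(column, labels)
        if su > threshold:
            scores.append((su, j))
    return [j for su, j in sorted(scores, key=lambda t: (-t[0], t[1]))]


if __name__ == "__main__":
    # Feature 0 copies the label; feature 1 is independent of it.
    window = [(0, 0), (0, 1), (1, 0), (1, 1)]
    labels = [0, 0, 1, 1]
    print(relevant_features(window, labels))
```

Recomputing this ranking on each new window and comparing it with the previous one is one simple way to notice that the relevant feature subset has changed; how HEFT-Stream actually detects and reacts to such a change is described in the paper itself, not here.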

References

  1. Bifet, A., Holmes, G., Kirkby, R.: MOA: Massive online analysis. The Journal of Machine Learning Research 11, 1601–1604 (2010)

  2. Bifet, A., Holmes, G., Pfahringer, B., Kirkby, R., Gavaldà, R.: New ensemble methods for evolving data streams. In: 15th ACM SIGKDD, pp. 139–148. ACM (2009)

  3. Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996)

  4. Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)

  5. Domingos, P., Hulten, G.: Mining high-speed data streams. In: The Sixth ACM SIGKDD, pp. 71–80. ACM (2000)

  6. Domingos, P., Pazzani, M.: On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning 29(2-3), 103–130 (1997)

  7. Eibl, G., Pfeiffer, K.-P.: Multiclass boosting for weak classifiers. The Journal of Machine Learning Research 6, 189–210 (2005)

  8. Freund, Y., Schapire, R.: Experiments with a new boosting algorithm. In: The 13th ICML, pp. 148–156 (1996)

  9. Friedman, J.H.: Stochastic gradient boosting. Computational Statistics & Data Analysis 38(4), 367–378 (2002)

  10. Fumera, G., Roli, F.: A theoretical and experimental analysis of linear combiners for multiple classifier systems. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(6), 942–956 (2005)

  11. Hsu, K.-W., Srivastava, J.: Diversity in Combinations of Heterogeneous Classifiers. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS, vol. 5476, pp. 923–932. Springer, Heidelberg (2009)

  12. Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: ACM SIGKDD, pp. 97–106. ACM (2001)

  13. Yu, L., Liu, H.: Feature selection for high-dimensional data: A fast correlation-based filter solution. In: The 20th ICML, pp. 856–863 (2003)

  14. Liu, H., Yu, L.: Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering 17(4), 491–502 (2005)

  15. Oza, N.C.: Online bagging and boosting. In: 2005 IEEE International Conference on Systems, Man and Cybernetics, vol. 3, pp. 2340–2345. IEEE (2005)

  16. Hashemi, S., Yang, Y., Mirzamomen, Z., Kangavari, M.: Adapted one-vs-all decision trees for data stream classification. IEEE Transactions on Knowledge and Data Engineering 21, 624–637 (2009)

  17. Shen, C., Li, H.: On the dual formulation of boosting algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(12), 2216–2231 (2010)

  18. Street, W.N., Kim, Y.: A streaming ensemble algorithm (SEA) for large-scale classification. In: The 7th ACM SIGKDD, pp. 377–382. ACM (2001)

  19. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8), 832–844 (1998)

  20. Tumer, K., Ghosh, J.: Linear and order statistics combiners for pattern classification. Springer (1999)

  21. Wang, H., Fan, W., Yu, P.S., Han, J.: Mining concept-drifting data streams using ensemble classifiers. In: ACM SIGKDD, pp. 226–235. ACM (2003)

  22. Woods, K., Kegelmeyer, W.P., Bowyer, K.: Combination of multiple classifiers using local accuracy estimates. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(4), 405–410 (1997)

  23. Lu, Z., Wu, X., Bongard, J.: Active learning with adaptive heterogeneous ensembles. In: The 9th IEEE ICDM, pp. 327–336 (2009)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nguyen, HL., Woon, YK., Ng, WK., Wan, L. (2012). Heterogeneous Ensemble for Feature Drifts in Data Streams. In: Tan, PN., Chawla, S., Ho, C.K., Bailey, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30220-6_1

  • DOI: https://doi.org/10.1007/978-3-642-30220-6_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30219-0

  • Online ISBN: 978-3-642-30220-6

  • eBook Packages: Computer Science, Computer Science (R0)