Abstract
Mining single-label data streams has been studied extensively in recent years, but mining multi-label data streams has received far less attention. In this paper, we propose an improved binary relevance method that exploits dependence information among class labels, together with a dynamic classifier ensemble approach for classifying multi-label concept-drifting data streams. The classification algorithm combines the ensemble members by weighted majority voting. Our empirical study on both synthetic and real-life data sets shows that the proposed dynamic classifier ensemble with improved binary relevance outperforms both the dynamic classifier ensemble with binary relevance and the static classifier ensemble with binary relevance.
This work is supported by Young Cadreman Supporting Program of Northwest A&F University (01140301). Corresponding author: Zhang Yang.
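As a rough illustration of the two ingredients named in the abstract, the sketch below combines plain binary relevance (one independent binary model per label) with weighted majority voting over an ensemble. It is a minimal sketch under stated assumptions, not the authors' algorithm: it does not implement the improved binary relevance variant, and weighting each member by its Hamming accuracy on the most recent data chunk is an assumption made here for illustration. All names (`BinaryRelevance`, `hamming_accuracy`, `weighted_majority_vote`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Instance = Sequence[float]
Labels = List[int]
BinaryModel = Callable[[Instance], int]  # predicts 0 or 1 for a single label


class BinaryRelevance:
    """Plain binary relevance: one independent binary model per label."""

    def __init__(self, per_label_models: List[BinaryModel]):
        self.per_label_models = per_label_models

    def predict(self, x: Instance) -> Labels:
        return [model(x) for model in self.per_label_models]


def hamming_accuracy(model: BinaryRelevance,
                     chunk: List[Tuple[Instance, Labels]]) -> float:
    """Fraction of (instance, label) pairs predicted correctly on a chunk.

    Assumption: used here as the ensemble weight; the paper's actual
    weighting scheme is not given in the abstract.
    """
    correct = total = 0
    for x, y in chunk:
        pred = model.predict(x)
        correct += sum(int(p == t) for p, t in zip(pred, y))
        total += len(y)
    return correct / total if total else 0.0


def weighted_majority_vote(ensemble: List[BinaryRelevance],
                           weights: List[float],
                           x: Instance) -> Labels:
    """Per label, predict 1 if the members voting 1 hold more than half
    of the total weight; ties and zero total weight resolve to 0."""
    preds = [member.predict(x) for member in ensemble]
    half = sum(weights) / 2
    result = []
    for j in range(len(preds[0])):
        support = sum(w for p, w in zip(preds, weights) if p[j] == 1)
        result.append(1 if support > half else 0)
    return result


# Toy example: an older member that never predicts the second label,
# and a newer member that tracks both labels.
old = BinaryRelevance([lambda x: int(x[0] > 0.5), lambda x: 0])
new = BinaryRelevance([lambda x: int(x[0] > 0.5), lambda x: int(x[1] > 0.5)])
recent_chunk = [([0.9, 0.9], [1, 1]), ([0.1, 0.8], [0, 1]), ([0.2, 0.1], [0, 0])]

weights = [hamming_accuracy(m, recent_chunk) for m in (old, new)]
print(weighted_majority_vote([old, new], weights, [0.7, 0.6]))  # prints [1, 1]
```

Recomputing the weights on each new chunk lets a member that fits the current concept outvote members trained on an outdated one, which is the intuition behind using a dynamic rather than a static ensemble.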
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qu, W., Zhang, Y., Zhu, J., Qiu, Q. (2009). Mining Multi-label Concept-Drifting Data Streams Using Dynamic Classifier Ensemble. In: Zhou, ZH., Washio, T. (eds) Advances in Machine Learning. ACML 2009. Lecture Notes in Computer Science, vol 5828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05224-8_24
DOI: https://doi.org/10.1007/978-3-642-05224-8_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05223-1
Online ISBN: 978-3-642-05224-8
eBook Packages: Computer Science (R0)