Abstract
Classification is an important and practical tool that uses a model built on historical data to predict class labels for newly arriving data. In the last few years, there have been many interesting studies on classification in data streams. However, most of these studies assume that the data streams are relatively balanced and stable. In practice, skewed data streams (e.g., few positive examples but many negative ones) are both important and typical, and they appear in many real-world applications. Concept drift and skewed distributions, two common properties of data streams, make learning in streams particularly difficult, and traditional data mining algorithms no longer work. In this paper, we propose a method, the Selectively Re-train Approach Based on Clustering, which deals with concept drift and skewed distributions simultaneously. We evaluate our algorithm on both synthetic and real data sets that simulate skewed data streams. Empirical results show that the proposed method yields better performance than previous work.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, D., Shen, H., Hui, T., Li, Y., Wu, J., Sang, Y. (2014). A Selectively Re-train Approach Based on Clustering to Classify Concept-Drifting Data Streams with Skewed Distribution. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8444. Springer, Cham. https://doi.org/10.1007/978-3-319-06605-9_34
DOI: https://doi.org/10.1007/978-3-319-06605-9_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06604-2
Online ISBN: 978-3-319-06605-9
eBook Packages: Computer Science (R0)