Abstract
Concept drift has become an important issue while analyzing data streams. Further, data streams can also have skewed class distributions, known as class imbalance. Actually, in the real world, it is likely that a data stream simultaneously has multiple concept drifts and an imbalanced class distribution. However, since most research approaches do not consider class imbalance and the concept drift problem at the same time, they probably have a good performance on the overall average accuracy, while the accuracy of the minority class is very poor. To deal with these challenges, this paper proposes a new weighting method which can further improve the accuracy of the minority class on the imbalanced data streams with concept drifts. The experimental results confirm that our method not only achieves an impressive performance on the average accuracy but also improves the accuracy of the minority class on the imbalanced data streams.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Tsymbal, A.: The problem of concept drift: definitions and related work. Technical report TCD-CS-2004-15, Computer Science Department, Trinity College, Dublin (2004)
Kelly, M.G., Hand, D.J., Adams, N.M.: The Impact of changing populations on classifier performance. In: Knowledge Discovery and Data Mining, pp. 367–371 (1999)
João, G., Indrė, Ž., Albert, B., Mykola, P., Abdelhamid, B.: A survey on concept drift adaptation. ACM Comput. Surv. 46, 1–37 (2014)
Haibo, H., Edwardo, A.G.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21, 1263–1284 (2009)
Barua, S., Islam, M.M., Yao, X., Murase, K.: MWMOTE-majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans. Knowl. Data Eng. 26, 405–425 (2014)
Kolter, J.Z., Maloof, M.A.: Dynamic weighted majority: an ensemble method for drifting concepts. J. Mach. Learn. Res. 8, 2755–2790 (2007)
Blum, A.: Empirical support for winnow and weighted-majority algorithms: resultson a calendar scheduling domain. Mach. Learn. 26(1), 5–23 (1997)
Santos, S., Gonçalves Jr., P.M., Silva, G., de Barros, R.S.M.: Speeding up recovery from concept drifts. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014, Part III. LNCS, vol. 8726, pp. 179–194. Springer, Heidelberg (2014)
Brzeziński, D., Stefanowski, J.: Accuracy updated ensemble for data streams with concept drift. In: Corchado, E., Kurzyński, M., Woźniak, M. (eds.) HAIS 2011, Part II. LNCS, vol. 6679, pp. 155–163. Springer, Heidelberg (2011)
Minku, L.L., Yao, X.: DDD: a new ensemble approach for dealing with concept drift. IEEE Trans. Knowl. Data Eng. 24(4), 619–633 (2012)
Brzezinski, D., Stefanowski, J.: Reacting to different types of concept drift: the accuracy updated ensemble algorithm. IEEE Trans. Neural Netw. Learn. Syst. 10(10), 1–13 (2013)
Shuo, W., Leandro, L.M., Xin, Y.: A learning framework for online class imbalance learning. In: Computational Intelligence and Ensemble Learning (CIEL), pp. 36–45 (2013)
Ghazikhani, A., Reza, M., Hadi, S.Y.: Recursive least square perceptron model for non-stationary and imbalanced data stream classification. Evolving Syst. 4, 119–131 (2013)
Mirza, B., Zhiping, L., Kar-Ann, T.: Weighted online sequential extreme learning machine for class imbalance learning. Neural Process. Lett. 38, 465–486 (2013)
Shuo, W., Leandro, L.M., Xin, Y.: Resampling-based ensemble methods for online class imbalance learning. IEEE Trans. Knowl. Data Eng. 27(5), 1356–1368 (2015)
Oza, N.C., Russell, S.: Experimental comparisons of online and batch versions of bagging and boosting. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 359–364. ACM (2001)
Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: massive online analysis. J. Mach. Learn. Res. 11, 1601–1604 (2010)
Street, W.N., Kim, Y.: A streaming ensemble algorithm SEA for large-scale classification. In: Lee, D., Schkolnick, M., Provost, F.J., Srikant, R. (eds.) KDD, pp. 377–382. ACM (2001)
Oza, N.C., Russell, S.: Online bagging and boosting. In: Artificial Intelligence and Statistics 2001, pp. 105–112. Morgan Kaufmann (2001)
Kelly, M.G., Hand, D.J., Adams, N.M.: The impact of changing populations on classifier performance. In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 367–371. ACM (1999)
Harries, M., Wales, N.S.: SPLICE-2 Comparative Evaluation: Electricity Pricing (1999)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, SK., Dai, BR. (2016). A G-Means Update Ensemble Learning Approach for the Imbalanced Data Stream with Concept Drifts. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2016. Lecture Notes in Computer Science(), vol 9829. Springer, Cham. https://doi.org/10.1007/978-3-319-43946-4_17
Download citation
DOI: https://doi.org/10.1007/978-3-319-43946-4_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43945-7
Online ISBN: 978-3-319-43946-4
eBook Packages: Computer ScienceComputer Science (R0)