Data stream classification: a review

  • Review Article
  • Iran Journal of Computer Science

Abstract

A tremendous amount of data is generated regularly in areas such as networking, telecommunications, stock markets, satellite systems, and weather forecasting. Classification is therefore important for extracting knowledge from such large volumes of data. Handling concept-drifting data streams and classifying skewed data are the major challenges in the data stream mining field. In the presence of concept drift, the performance of a learning algorithm degrades. In skewed data problems, majority-class accuracy dominates minority-class accuracy, so overall accuracy gives a misleading picture of the classifier. This paper discusses implemented methods for skewed data and for concept-drifting data, together with their merits and demerits. It also covers the metrics used in the classification process and the issues that arise when learning from skewed and concept-drifting data.
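The point about majority-class accuracy masking minority-class errors can be illustrated with a small sketch. The data, the function names (`confusion_counts`, `imbalance_report`), and the choice of per-class recall and G-mean as the comparison metrics are assumptions made here for illustration; they are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def imbalance_report(y_true, y_pred, positive=1):
    """Return overall accuracy next to per-class recall and their G-mean."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    minority_recall = tp / (tp + fn) if (tp + fn) else 0.0
    majority_recall = tn / (tn + fp) if (tn + fp) else 0.0
    # G-mean is the geometric mean of the two per-class recalls.
    g_mean = math.sqrt(minority_recall * majority_recall)
    return {
        "accuracy": accuracy,
        "minority_recall": minority_recall,
        "majority_recall": majority_recall,
        "g_mean": g_mean,
    }


# Hypothetical skewed sample: 95 majority (0) and 5 minority (1) instances,
# scored against a classifier that always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
report = imbalance_report(y_true, y_pred)
print(report)
```

In this made-up sample the classifier never identifies a minority instance, yet its overall accuracy remains high because the majority class dominates the count; the per-class recall and G-mean expose the failure.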

Author information

Correspondence to Kapil K. Wankhade.

Cite this article

Wankhade, K.K., Dongre, S.S. & Jondhale, K.C. Data stream classification: a review. Iran J Comput Sci 3, 239–260 (2020). https://doi.org/10.1007/s42044-020-00061-3
