Abstract
A tremendous amount of data is generated continuously in areas such as networking, telecommunication, stock markets, satellite systems, and weather forecasting. Classification therefore becomes important for extracting knowledge from such large volumes of data. Handling concept-drifting data streams and classifying skewed data are major challenges in the data stream mining field. In the presence of concept drift, the performance of a learning algorithm typically degrades. In skewed data problems, majority-class accuracy dominates minority-class accuracy, which produces misleading results. This paper discusses methods that have been implemented for skewed data and concept-drifting data, along with their merits and demerits. It also focuses on the metrics used in the classification process and on the issues that arise when learning from skewed and concept-drifting data.
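The point about majority-class accuracy dominating minority-class accuracy can be illustrated with a small sketch. It is not taken from the paper: the function names, the 1% minority ratio, and the choice of recall, specificity, and G-mean as skew-aware metrics are assumptions made for illustration only.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def skew_aware_metrics(y_true, y_pred, positive=1):
    """Return accuracy next to metrics that expose minority-class errors."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0            # minority-class accuracy
    specificity = tn / (tn + fp) if (tn + fp) else 0.0       # majority-class accuracy
    g_mean = sqrt(recall * specificity)                      # balances both classes
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "g_mean": g_mean,
    }


# Hypothetical skewed sample: 10 minority (1) and 990 majority (0) instances.
y_true = [1] * 10 + [0] * 990
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 1000

metrics = skew_aware_metrics(y_true, y_pred)
print(metrics)
```

In this hypothetical sample the always-majority classifier scores high on overall accuracy while never detecting a minority instance, which is why a review of skewed-data learning looks beyond plain accuracy.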
Cite this article
Wankhade, K.K., Dongre, S.S. & Jondhale, K.C. Data stream classification: a review. Iran J Comput Sci 3, 239–260 (2020). https://doi.org/10.1007/s42044-020-00061-3
DOI: https://doi.org/10.1007/s42044-020-00061-3