Abstract
Many researchers have applied clustering to semi-supervised classification of data streams with concept drift. However, with these approaches the generalization ability for each specific concept cannot be steadily improved, and concept drift detection methods that ignore the local structural information of the data cannot detect concept drifts accurately. This paper proposes to solve these problems with a BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies) ensemble and local structure mapping. The local structure mapping strategy computes the local similarity around each sample and is combined with a semi-supervised Bayesian method to perform concept detection. If a recurrent concept is detected, a historical BIRCH ensemble classifier is selected and incrementally updated; otherwise, a new BIRCH ensemble classifier is constructed and added to the classifier pool. Extensive experiments on several synthetic and real datasets demonstrate the advantages of the proposed algorithm.
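The pool-update rule described in the abstract (update a historical classifier when a concept recurs, otherwise add a new one) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `ConceptModel` is a hypothetical stand-in that keeps only a running centroid rather than a BIRCH ensemble, and the concept test is a plain centroid-distance threshold, not the paper's local structure mapping combined with a semi-supervised Bayesian method.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class ConceptModel:
    """Hypothetical stand-in for one BIRCH ensemble classifier (centroid only)."""
    n: int = 0
    linear_sum: list = field(default_factory=list)

    def update(self, chunk):
        # Incremental update: accumulate the sample count and per-dimension sums.
        if not self.linear_sum:
            self.linear_sum = [0.0] * len(chunk[0])
        for x in chunk:
            self.n += 1
            self.linear_sum = [s + v for s, v in zip(self.linear_sum, x)]

    def centroid(self):
        return [s / self.n for s in self.linear_sum]


def chunk_mean(chunk):
    # Per-dimension mean of the samples in one data chunk.
    return [sum(col) / len(chunk) for col in zip(*chunk)]


def process_chunk(pool, chunk, threshold=1.0):
    """Route a chunk to a pool model and return that model's index."""
    center = chunk_mean(chunk)
    # Assumed concept test: the nearest stored centroid must lie within `threshold`.
    best = min(
        range(len(pool)),
        key=lambda i: dist(pool[i].centroid(), center),
        default=None,
    )
    if best is not None and dist(pool[best].centroid(), center) <= threshold:
        pool[best].update(chunk)  # recurrent concept: update the historical model
        return best
    model = ConceptModel()
    model.update(chunk)           # new concept: build a model and add it to the pool
    pool.append(model)
    return len(pool) - 1


# Usage: the third chunk resembles the first, so it is treated as a recurrence.
pool = []
a = process_chunk(pool, [(0.0, 0.0), (0.2, 0.1)])
b = process_chunk(pool, [(5.0, 5.0), (5.1, 4.9)])
c = process_chunk(pool, [(0.1, 0.0), (0.0, 0.2)])
```

In this toy run the pool ends with two models, and the first one has absorbed both chunks drawn near the origin. The `threshold` parameter is likewise hypothetical; the paper's detector would take the place of the distance check.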
Cite this article
Wen, YM., Liu, S. Semi-Supervised Classification of Data Streams by BIRCH Ensemble and Local Structure Mapping. J. Comput. Sci. Technol. 35, 295–304 (2020). https://doi.org/10.1007/s11390-020-9999-y
DOI: https://doi.org/10.1007/s11390-020-9999-y