Abstract
Many existing methods for streaming environments are sensitive to large and high-dimensional data. The distribution of streaming data may change over time, a phenomenon known as concept drift. Several drift detectors have been built to identify drift close to the point where it occurs, but they pay little attention to changes in feature relevance over time, known as feature drift. Feature drift arises when the distribution of the relevant feature subset changes, or when the relevant feature subset itself changes. This paper proposes an adaptive principal component analysis based feature drift detection method (PCA-FDD) that uses a statistical measure to detect feature drift. The proposed work presents a framework for identifying the most important feature subset, detecting feature drift, and incrementally adapting the prediction model. The method finds the relevant feature subset using incremental PCA and detects feature drift by observing how the percentage similarity among the most important feature subsets changes over time. It also helps to forecast the prediction error of the base learning model. The proposed method is compared with state-of-the-art methods on synthetic and real-time datasets. The evaluation results show that the proposed work achieves higher classification accuracy than the compared methods.
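The idea of detecting feature drift from the percentage similarity between successive top-ranked feature subsets can be illustrated with a minimal sketch. This is not the PCA-FDD algorithm itself: the per-window feature scores are assumed to be given (for example, derived from incremental PCA loadings), and the function names, the overlap-based similarity, the subset size `k`, and the 50% threshold are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Set


def top_k_features(scores: Sequence[float], k: int) -> Set[int]:
    """Return the indices of the k features with the largest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def percentage_similarity(a: Set[int], b: Set[int]) -> float:
    """Percentage of the features in subset a that also appear in subset b."""
    if not a:
        return 100.0
    return 100.0 * len(a & b) / len(a)


def detect_feature_drift(
    window_scores: Sequence[Sequence[float]], k: int, threshold: float = 50.0
) -> List[int]:
    """Return the window indices at which feature drift is flagged.

    Assumption: drift is flagged when the overlap between the reference
    top-k subset and the current top-k subset falls below `threshold`
    percent. The reference subset is then replaced by the current one.
    """
    drift_points: List[int] = []
    reference = top_k_features(window_scores[0], k)
    for t in range(1, len(window_scores)):
        current = top_k_features(window_scores[t], k)
        if percentage_similarity(reference, current) < threshold:
            drift_points.append(t)
            reference = current  # adapt to the new relevant subset
    return drift_points


# Hypothetical importance scores for four features over four windows.
windows = [
    [0.90, 0.80, 0.10, 0.05],
    [0.85, 0.90, 0.20, 0.10],
    [0.10, 0.20, 0.90, 0.80],
    [0.05, 0.10, 0.95, 0.70],
]
drift_windows = detect_feature_drift(windows, k=2)
print(drift_windows)
```

In this toy stream the two most important features switch from indices 0 and 1 to indices 2 and 3 at the third window, so that window is the only one flagged. A real detector would also need a principled way to set the threshold, which this sketch does not address.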
Data availability
Not applicable
Code availability
Not applicable
Funding
Not applicable
Contributions
Not applicable
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Ethical statement
Not applicable
About this article
Cite this article
Agrahari, S., Singh, A.K. Adaptive PCA-based feature drift detection using statistical measure. Cluster Comput 25, 4481–4494 (2022). https://doi.org/10.1007/s10586-022-03695-z
DOI: https://doi.org/10.1007/s10586-022-03695-z