Adaptive PCA-based feature drift detection using statistical measure

Cluster Computing

Abstract

Many existing methods for streaming environments are sensitive to large, high-dimensional data. The distribution of streaming data may change over time, a phenomenon known as concept drift. Several drift detectors have been built to identify drift near its point of occurrence, but they pay little attention to changes in feature relevance over time, known as feature drift. Feature drift may arise when the distribution of the relevant feature subset changes, or when the relevant feature subset itself changes. This paper proposes an adaptive principal component analysis-based feature drift detection method (PCA-FDD) that uses a statistical measure to detect feature drift. It presents a framework for identifying the most important feature subset, detecting feature drift, and incrementally adapting the prediction model. The method finds the relevant feature subset using incremental PCA and detects feature drift by observing how the percentage similarity among the most important feature subsets changes over time. It also helps to forecast the prediction error of the base learning model. The proposed method is compared with state-of-the-art methods using synthetic and real-time datasets. The evaluation results show that the proposed method achieves higher classification accuracy than the compared methods.
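The detection idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' PCA-FDD implementation: the abstract does not specify how features are ranked, how similarity is computed, or how drift is thresholded, so every one of those choices here is an assumption. In particular, the sketch recomputes a plain PCA on each window (via power iteration on the covariance matrix) instead of using incremental PCA, ranks features by the magnitude of their loading on the first principal component, and flags drift when the overlap between consecutive top-k subsets falls below a hypothetical threshold. All function names and parameters (`top_features`, `percent_similarity`, `detect_feature_drift`, `k`, `threshold`) are illustrative.

```python
import random

def covariance(window):
    """Sample covariance matrix of a list of equal-length rows."""
    n, d = len(window), len(window[0])
    mean = [sum(row[j] for row in window) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for row in window:
        c = [row[j] - mean[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += c[a] * c[b] / (n - 1)
    return cov

def first_component(cov, iters=200):
    """Leading eigenvector of cov, estimated by power iteration."""
    d = len(cov)
    v = [1.0 / d ** 0.5] * d  # fixed start vector keeps the result deterministic
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def top_features(window, k):
    """Indices of the k features with the largest absolute loading
    on the first principal component (an assumed ranking rule)."""
    load = first_component(covariance(window))
    ranked = sorted(range(len(load)), key=lambda j: abs(load[j]), reverse=True)
    return set(ranked[:k])

def percent_similarity(a, b):
    """Share of subset a that also appears in subset b, in percent."""
    return 100.0 * len(a & b) / len(a)

def detect_feature_drift(windows, k=2, threshold=50.0):
    """Return indices of windows whose top-k subset overlaps the previous
    window's subset by less than `threshold` percent (assumed rule)."""
    flagged = []
    previous = top_features(windows[0], k)
    for i, window in enumerate(windows[1:], start=1):
        current = top_features(window, k)
        if percent_similarity(previous, current) < threshold:
            flagged.append(i)
        previous = current
    return flagged

def make_window(rng, relevant, n=200, d=4):
    """Synthetic window: features in `relevant` share a strong latent
    signal, the remaining features are unit-variance noise."""
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 3.0)
        rows.append([(z if j in relevant else 0.0) + rng.gauss(0.0, 1.0)
                     for j in range(d)])
    return rows

rng = random.Random(0)
stream = [make_window(rng, {0, 1}), make_window(rng, {0, 1}),
          make_window(rng, {2, 3}), make_window(rng, {2, 3})]
drift_points = detect_feature_drift(stream)
print(drift_points)
```

In this synthetic stream the relevant subset switches from features {0, 1} to features {2, 3} at the third window, so the sketch should flag window index 2. A streaming implementation would replace the per-window covariance with an incrementally updated decomposition, as the abstract describes.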

Data availability

Not applicable

Code availability

Not applicable

Funding

Not applicable

Author information

Contributions

Not applicable

Corresponding author

Correspondence to Supriya Agrahari.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethical statement

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Agrahari, S., Singh, A.K. Adaptive PCA-based feature drift detection using statistical measure. Cluster Comput 25, 4481–4494 (2022). https://doi.org/10.1007/s10586-022-03695-z

  • DOI: https://doi.org/10.1007/s10586-022-03695-z
