
UWFP-Outlier: an efficient frequent-pattern-based outlier detection method for uncertain weighted data streams

Applied Intelligence

Abstract

We propose UWFP-Outlier, an efficient frequent-pattern-based method for detecting implicit outliers in uncertain weighted data streams. To reduce the time cost, the weighted frequent pattern mining phase introduces the concepts of maximal weight and maximal probability, which together yield a compact anti-monotonic property and thereby shrink the set of potentially extensible patterns. To detect outliers accurately, the outlier detection phase uses two deviation indices that measure the deviation degree of each transaction in the stream, taking into account more of the factors that may influence it; transactions with large deviation degrees are then judged to be outliers. Experimental results indicate that UWFP-Outlier accurately detects outliers in uncertain weighted data streams at a lower time cost.
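The pruning idea in the mining phase can be illustrated with a small sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: a pattern's weight is taken as the mean weight of its items, its expected support as the sum over transactions of the product of its items' existence probabilities, and the bound `max_w * max_p * expected_support(X)` as one possible way to combine a maximal weight and a maximal probability into an anti-monotonic test. The function names and the toy data are hypothetical.

```python
from math import prod


def expected_support(pattern, window):
    """Sum, over transactions containing the pattern, of the product of
    the existence probabilities of the pattern's items."""
    return sum(
        prod(t[i] for i in pattern)
        for t in window
        if all(i in t for i in pattern)
    )


def weighted_expected_support(pattern, window, weights):
    """Assumed definition: mean item weight times expected support."""
    mean_w = sum(weights[i] for i in pattern) / len(pattern)
    return mean_w * expected_support(pattern, window)


def mine(window, weights, min_sup):
    """Level-wise search with an upper-bound pruning test.

    For any proper superset Y of X, mean weight(Y) <= max_w and
    expected_support(Y) <= max_p * expected_support(X), so
    max_w * max_p * expected_support(X) bounds every extension of X.
    If that bound is below min_sup, X is not extended.
    """
    max_w = max(weights.values())
    max_p = max(p for t in window for p in t.values())
    items = sorted({i for t in window for i in t})

    frequent = {}
    level = [(i,) for i in items]
    while level:
        extensible = []
        for pat in level:
            es = expected_support(pat, window)
            wes = sum(weights[i] for i in pat) / len(pat) * es
            if wes >= min_sup:
                frequent[pat] = wes
            if max_w * max_p * es >= min_sup:
                extensible.append(pat)
        # Extend each surviving pattern with items that sort after its last item.
        level = [pat + (i,) for pat in extensible for i in items if i > pat[-1]]
    return frequent


# Toy window: each transaction maps an item to its existence probability.
window = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.7, "b": 0.6},
    {"b": 0.9, "c": 0.4},
    {"a": 0.8, "c": 0.3},
]
weights = {"a": 0.6, "b": 0.9, "c": 0.3}

frequent = mine(window, weights, min_sup=0.8)
print(sorted(frequent))  # [('a',), ('a', 'b'), ('b',)]
```

In this toy run the patterns `('a', 'c')` and `('b', 'c')` fail the bound, so `('a', 'b', 'c')` is reached only through `('a', 'b')`. Note that `('c',)` is infrequent yet still passes the bound, which shows why the pruning test has to be separate from the frequency test when pattern weights are not anti-monotonic by themselves.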

Acknowledgements

This work was supported in part by the Fundamental Research Funds for the Central Universities under Grant No. 2018XD004.

Author information

Corresponding author

Correspondence to Ruizhi Sun.

About this article

Cite this article

Cai, S., Li, L., Li, Q. et al. UWFP-Outlier: an efficient frequent-pattern-based outlier detection method for uncertain weighted data streams. Appl Intell 50, 3452–3470 (2020). https://doi.org/10.1007/s10489-020-01718-z

  • DOI: https://doi.org/10.1007/s10489-020-01718-z
