Abstract
We propose UWFP-Outlier, an efficient frequent-pattern-based outlier detection method for identifying implicit outliers in uncertain weighted data streams. To reduce the time cost of the weighted frequent pattern mining phase, we introduce the concepts of maximal weight and maximal probability to form a compact anti-monotonic property, which reduces the number of potentially extensible patterns. To detect outliers accurately in the outlier detection phase, we design two deviation indices that measure the deviation degree of each transaction in the stream by taking into account more of the factors that may influence it; transactions with large deviation degrees are then judged to be outliers. The experimental results indicate that UWFP-Outlier accurately detects outliers in uncertain weighted data streams at a lower time cost.
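To make the pruning idea concrete, here is a minimal Python sketch of how a maximal weight and a maximal probability could yield an anti-monotonic bound. It is an illustration under stated assumptions, not the paper's definitions: the abstract does not give the formulas, so the sketch assumes the conventions common in weighted uncertain itemset mining (item probabilities are independent, a pattern's weight is the mean of its item weights, and weighted expected support is weight times expected support). All function names and the threshold `min_wsup` are hypothetical.

```python
from math import prod


def expected_support(pattern, transactions):
    """Expected support: for each transaction containing every item of the
    pattern, multiply the items' existence probabilities, then sum.
    (Assumes independent item probabilities; an assumption of this sketch.)"""
    return sum(
        prod(t[i] for i in pattern)
        for t in transactions
        if all(i in t for i in pattern)
    )


def weighted_expected_support(pattern, transactions, weights):
    """Mean item weight times expected support (assumed definition).
    This measure is not anti-monotonic by itself, because adding a
    heavy item can raise the mean weight."""
    mean_weight = sum(weights[i] for i in pattern) / len(pattern)
    return mean_weight * expected_support(pattern, transactions)


def extension_upper_bound(pattern, transactions, weights):
    """Upper bound on the weighted expected support of every proper
    superset of `pattern`: no superset can weigh more than the maximal
    item weight, and each added item scales a transaction's contribution
    by at most the maximal probability."""
    max_weight = max(weights.values())
    max_prob = max(p for t in transactions for p in t.values())
    return max_weight * max_prob * expected_support(pattern, transactions)


def can_prune_extensions(pattern, transactions, weights, min_wsup):
    """True when no proper superset of `pattern` can reach `min_wsup`,
    so the pattern need not be extended further."""
    return extension_upper_bound(pattern, transactions, weights) < min_wsup
```

Because expected support can only shrink as a pattern grows, while the maximal weight and maximal probability are constants of the data, the bound itself never increases along an extension path. That is what lets a miner discard a whole branch of candidate patterns with one comparison.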
Acknowledgements
This work was supported in part by the Fundamental Research Funds for the Central Universities under Grant No. 2018XD004.
About this article
Cite this article
Cai, S., Li, L., Li, Q. et al. UWFP-Outlier: an efficient frequent-pattern-based outlier detection method for uncertain weighted data streams. Appl Intell 50, 3452–3470 (2020). https://doi.org/10.1007/s10489-020-01718-z
DOI: https://doi.org/10.1007/s10489-020-01718-z