Abstract
The emergence and proliferation of Internet of Things (IoT) devices have generated large volumes of uncertain data, owing to the varied accuracy and decay of sensors and their different sensitivity ranges. Because uncertainty plays an important role in IoT data, mining useful information from uncertain datasets has become an important issue in recent decades. Past works focus on mining high-utility sequential patterns from uncertain databases. However, the utility of a derived sequence increases with the size of the sequence, which makes utility an unfair measure for evaluating a sequence: any combination that contains a high-utility sequence is also a high-utility sequence, even when the utility it adds is low. In this paper, we address this limitation of previous potential high-utility sequential pattern mining and present a potentially high average-utility sequential pattern mining framework for discovering the set of potentially high average-utility sequential patterns (PHAUSPs) from uncertain datasets. By taking the size of a sequence into account, the framework provides a fairer measure of patterns than previous works. First, a baseline potentially high average-utility sequential pattern algorithm and three pruning strategies are introduced to mine the complete set of the desired PHAUSPs. To reduce the computational cost and accelerate the mining process, a projection algorithm called PHAUP is then designed, which reduces the number of candidates for the desired patterns. Experiments evaluating runtime, number of candidates, memory overhead, number of discovered patterns, and scalability are conducted on both real-life and artificial datasets, and the results show that the proposed algorithms achieve promising performance, especially the PHAUP approach.
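The difference between utility and average utility described above can be illustrated with a minimal sketch. It assumes a simplified definition (utility as the sum of per-item utilities, average utility as that sum divided by the number of items in the pattern) and uses hypothetical item utilities and thresholds; it omits the uncertainty (expected support) component and is not the PHAUB or PHAUP algorithm.

```python
from typing import Dict, List

# Hypothetical per-item utilities (e.g., profit); illustrative values only.
UNIT_UTILITY: Dict[str, float] = {"a": 5.0, "b": 1.0, "c": 1.0}


def utility(pattern: List[str]) -> float:
    """Total utility: sum of the utilities of the items in the pattern."""
    return sum(UNIT_UTILITY[item] for item in pattern)


def average_utility(pattern: List[str]) -> float:
    """Total utility divided by the number of items in the pattern."""
    return utility(pattern) / len(pattern)


short = ["a"]            # one valuable item
long_ = ["a", "b", "c"]  # the same item extended with low-utility items

# Hypothetical thresholds, chosen only to make the contrast visible.
MIN_UTILITY = 5.0
MIN_AVERAGE_UTILITY = 3.0

for pattern in (short, long_):
    u = utility(pattern)
    au = average_utility(pattern)
    print(
        pattern,
        f"utility={u:.2f} (high: {u >= MIN_UTILITY})",
        f"average utility={au:.2f} (high: {au >= MIN_AVERAGE_UTILITY})",
    )
```

Under these assumed values, both patterns pass the plain utility threshold, because adding low-utility items can only increase the total. Only the short pattern passes the average-utility threshold, which is the length bias the average-utility measure is meant to remove.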









Abbreviations
- ARM: Association rule mining
- auub: Average-utility upper-bound value
- AU list: Average-utility list
- FIM: Frequent itemset mining
- HAUIM: High average-utility itemset mining
- HAUIs: High average-utility itemsets
- HTWUIs: High transaction-weighted utilization itemsets
- HUIM: High-utility itemset mining
- HUIs: High-utility itemsets
- HUSPs: High-utility sequential patterns
- HUSPM: High-utility sequential pattern mining
- PHAUSPM: Potential high average-utility sequential pattern mining
- PHAUSPs: Potential high average-utility sequential patterns
- PHAUB: The designed baseline algorithm
- PHAUP: The designed projection-based algorithm
- PHAUUBDC: Potential high average-utility upper-bound downward closure
- PHAUUBSPs: Potential high average-utility upper-bound sequential patterns
- PHUSPs: Potential high-utility sequential patterns
- SPM: Sequential pattern mining
- SWDC: Sequential weighted downward closure
- suub: Sequence utility upper-bound value
- TWU: Transaction-weighted utility
- TWDC: Transaction-weighted down closure
- UFIM: Frequent itemset mining on uncertain databases
- UFIs: Frequent itemsets in uncertain databases
- \( \mu \): Minimum expected support threshold
- \( \delta \): Minimum high average-utility threshold
Cite this article
Lin, J.CW., Li, T., Pirouz, M. et al. High average-utility sequential pattern mining based on uncertain databases. Knowl Inf Syst 62, 1199–1228 (2020). https://doi.org/10.1007/s10115-019-01385-8
DOI: https://doi.org/10.1007/s10115-019-01385-8