Abstract
Within the general area of data mining, high-utility itemset mining (HUIM) can be seen as an extension of frequent itemset mining (FIM). It takes more factors into account than frequency alone, which gives HUIM its intrinsic edge. With the flourishing development of IoT technologies, mining patterns from uncertain data has also attracted attention. Potential high-utility itemset mining (PHUIM) was introduced to reveal valuable patterns in an uncertain database. Unfortunately, although previous methods mine potential high-utility itemsets quickly and effectively, they are not specifically designed for databases with an enormous number of records. In those methods, the uncertain transaction dataset is ultimately loaded into memory, and several pre-defined operators are usually applied to modify the original dataset in order to reduce the time spent scanning the data. However, it is impractical to apply the same approach to a big-data dataset. In this work, the dataset is assumed to be too big to be loaded directly into memory, duplicated, or modified, and a MapReduce framework is proposed to handle this situation. One of our main objectives is to reduce the number of dataset scans while still maximizing the parallelization of all processes. Experimental results show that the proposed Hadoop-based algorithm mines all of the potential high-utility itemsets in a big-data dataset and performs well on a Hadoop computing cluster.
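As a rough illustration of the MapReduce idea described above, the following Python sketch aggregates per-item utility and probability in a single streaming pass over each partition and then merges the partial results. It is not the authors' algorithm: the record layout, the function names, and the two threshold tests (utility at least a ratio of the total utility, summed probability at least a ratio of the transaction count) are assumptions based on a common PHUIM formulation, and only single items are handled.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical record layout (an assumption, not from the paper):
#   (transaction_probability, {item: utility_in_this_transaction})


def map_partition(records):
    """Map step: one streaming pass over a partition, returning partial sums."""
    partial = defaultdict(lambda: [0.0, 0.0])  # item -> [utility, probability]
    count = 0
    total_utility = 0.0
    for prob, items in records:
        count += 1
        for item, utility in items.items():
            partial[item][0] += utility
            partial[item][1] += prob
            total_utility += utility
    return dict(partial), count, total_utility


def merge(a, b):
    """Reduce step: combine the partial sums of two partitions."""
    stats_a, n_a, tu_a = a
    stats_b, n_b, tu_b = b
    merged = {item: list(sums) for item, sums in stats_a.items()}
    for item, (utility, prob) in stats_b.items():
        if item in merged:
            merged[item][0] += utility
            merged[item][1] += prob
        else:
            merged[item] = [utility, prob]
    return merged, n_a + n_b, tu_a + tu_b


def potential_high_utility_items(partitions, min_util_ratio, min_prob_ratio):
    """Return single items passing both (assumed) thresholds, sorted by name."""
    stats, n, total = reduce(merge, map(map_partition, partitions))
    return sorted(
        item
        for item, (utility, prob) in stats.items()
        if utility >= min_util_ratio * total and prob >= min_prob_ratio * n
    )


# Toy data: two partitions standing in for two input splits.
partitions = [
    [(0.9, {"a": 10, "b": 2})],
    [(0.5, {"a": 5, "c": 1}), (0.8, {"b": 3})],
]
result = potential_high_utility_items(partitions, 0.2, 0.4)
print(result)
```

Each partition is read exactly once and never copied or modified, which mirrors the constraint stated in the abstract. Extending the sketch to larger itemsets would need a candidate-generation and pruning strategy that is not covered here.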
Cite this article
Wu, J.MT., Srivastava, G., Lin, J.CW. et al. Mining of High-Utility Patterns in Big IoT-based Databases. Mobile Netw Appl 26, 216–233 (2021). https://doi.org/10.1007/s11036-020-01701-5
DOI: https://doi.org/10.1007/s11036-020-01701-5