Abstract
Discovering high utility itemsets in transaction databases is a key task for studying customer behavior. It consists of finding groups of items bought together that yield a high profit. Several algorithms have been proposed to mine high utility itemsets, using various approaches and data structures of varying complexity. Among existing algorithms, one-phase algorithms employing the utility-list structure have been shown to be the most efficient. In recent years, the simplicity of the utility-list structure has led to the development of numerous utility-list based algorithms for various utility mining tasks. However, a major limitation of utility-list based algorithms is that creating and maintaining utility-lists is time-consuming and can require a huge amount of memory. This is because numerous utility-lists are built, and the intersection/join operation used to construct each utility-list is costly. This paper addresses this issue by proposing an improved utility-list structure, called utility-list buffer, that reduces memory consumption and speeds up the join operation. This structure is integrated into a novel algorithm named ULB-Miner (Utility-List Buffer for high utility itemset Miner), which introduces several new ideas to discover high utility itemsets more efficiently. ULB-Miner uses the utility-list buffer to efficiently store and retrieve utility-lists and to reuse memory during the mining process. The paper also introduces a linear time method for constructing utility-list segments in a utility-list buffer. An extensive experimental study on various datasets shows that the proposed algorithm is highly efficient in terms of both execution time and memory consumption: ULB-Miner is up to 10 times faster than the FHM and HUI-Miner algorithms and consumes up to 6 times less memory. Moreover, it performs well on both dense and sparse datasets.
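The abstract names three mechanisms: utility-lists, the join that builds the utility-list of a larger itemset, and a buffer that stores lists compactly and reuses memory. The sketch below illustrates them under stated assumptions. It uses the utility-list definition introduced by HUI-Miner (one `(tid, iutil, rutil)` entry per transaction containing the itemset), stores every list as a segment of three shared arrays, and joins two tid-sorted segments with a two-pointer merge so that each join runs in time linear in the lengths of its inputs. The names `UtilityListBuffer`, `Segment`, `add_list`, `join`, `mark` and `release` are hypothetical, and this layout is not claimed to be ULB-Miner's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """One utility-list, stored as the slice [start, end) of the shared buffer."""
    start: int
    end: int
    sum_iutil: int = 0  # utility of the itemset, u(X)
    sum_rutil: int = 0  # sum of remaining utilities, used for pruning


class UtilityListBuffer:
    """Hypothetical sketch: all (tid, iutil, rutil) entries live in three
    parallel arrays, and each utility-list is a segment of them."""

    def __init__(self) -> None:
        self.tids: List[int] = []
        self.iutils: List[int] = []
        self.rutils: List[int] = []
        self.end = 0  # first free slot; slots beyond it may be reused

    def _append(self, tid: int, iutil: int, rutil: int) -> None:
        if self.end == len(self.tids):  # no released slot left: grow the arrays
            self.tids.append(tid)
            self.iutils.append(iutil)
            self.rutils.append(rutil)
        else:  # overwrite a slot released earlier instead of allocating
            self.tids[self.end] = tid
            self.iutils[self.end] = iutil
            self.rutils[self.end] = rutil
        self.end += 1

    def add_list(self, entries: List[Tuple[int, int, int]]) -> Segment:
        """Store the utility-list of a single item; entries sorted by tid."""
        seg = Segment(self.end, self.end)
        for tid, iutil, rutil in entries:
            self._append(tid, iutil, rutil)
            seg.sum_iutil += iutil
            seg.sum_rutil += rutil
        seg.end = self.end
        return seg

    def join(self, px: Segment, py: Segment, p: Optional[Segment] = None) -> Segment:
        """Build the list of Pxy from Px, Py and prefix P (None if P is empty)
        with one forward scan over each input segment."""
        seg = Segment(self.end, self.end)
        i, j = px.start, py.start
        k = p.start if p is not None else 0
        while i < px.end and j < py.end:
            ti, tj = self.tids[i], self.tids[j]
            if ti < tj:
                i += 1
            elif ti > tj:
                j += 1
            else:
                iutil = self.iutils[i] + self.iutils[j]
                if p is not None:
                    # Every tid of Px also occurs in P, so this scan stays inside P.
                    while self.tids[k] < ti:
                        k += 1
                    iutil -= self.iutils[k]  # prefix utility was counted twice
                rutil = self.rutils[j]
                self._append(ti, iutil, rutil)
                seg.sum_iutil += iutil
                seg.sum_rutil += rutil
                i += 1
                j += 1
        seg.end = self.end
        return seg

    def mark(self) -> int:
        return self.end

    def release(self, mark: int) -> None:
        """Discard every segment created after `mark`, keeping the storage."""
        self.end = mark


# Usage: items ordered a < b < c; T1 = {a:5, b:2, c:1}, T2 = {a:3, c:4}, T3 = {b:6, c:2}.
buf = UtilityListBuffer()
a = buf.add_list([(1, 5, 3), (2, 3, 4)])
b = buf.add_list([(1, 2, 1), (3, 6, 2)])
c = buf.add_list([(1, 1, 0), (2, 4, 0), (3, 2, 0)])
m = buf.mark()
ab = buf.join(a, b)        # u({a, b}) = 7
ac = buf.join(a, c)        # u({a, c}) = 13
abc = buf.join(ab, ac, a)  # u({a, b, c}) = 8
buf.release(m)             # backtrack: later lists overwrite ab, ac, abc
bc = buf.join(b, c)        # written into slots freed by release
```

Calling `release` after exploring the extensions of an itemset only resets the end pointer, so the lists built for the next branch overwrite the discarded ones instead of allocating new storage. This is safe only when segments are released in last-in, first-out order, as happens in a depth-first search over itemsets.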
References
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases (VLDB 1994). Morgan Kaufmann, pp 487–499
Agrawal R, Srikant R (1994) Quest synthetic data generator. Available at: http://www.almaden.ibm.com/cs/quest/syndata.html
Ahmed C, Tanbeer S, Jeong BS, Lee YK (2009) Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans Knowl Data Eng 21(12):1708–1721
Ahmed CF, Tanbeer SK, Jeong BS, Lee YK (2009) Efficient mining of utility-based web path traversal patterns. In: Proceedings of the 11th international conference on advanced communication technology - vol 3, ICACT’09, pp 2215–2218
Chan R, Yang Q, Shen YD (2003) Mining high utility itemsets. In: Proceedings of the 3rd IEEE international conference on data mining, pp 19–26
Dam TL, Li K, Fournier-Viger P, Duong QH (2016) An efficient algorithm for mining top-rank-k frequent patterns. Appl Intell 45(1):96–111
Dam TL, Li K, Fournier-Viger P, Duong QH (2017) CLS-Miner: efficient and effective closed high utility itemset mining. Frontiers of Computer Science, pp 1–27
Dam TL, Li K, Fournier-Viger P, Duong QH (2017) An efficient algorithm for mining top-k on-shelf high utility itemsets. Knowl Inf Syst 52(3):621–655
Duong QH, Liao B, Fournier-Viger P, Dam TL (2016) An efficient algorithm for mining the top-k high utility itemsets, using novel threshold raising and pruning strategies. Knowl-Based Syst 104:106–122
Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu CW, Tseng V (2014) SPMF: A java open-source pattern mining library. J Mach Learn Res 15:3569–3573
Fournier-Viger P, Lin JCW, Duong QH, Dam TL (2016) FHM+: Faster high-utility itemset mining using length upper-bound reduction. In: Proceedings of the 29th international conference on industrial engineering and other applications of applied intelligent systems, pp 115–127
Fournier-Viger P, Lin JCW, Duong QH, Dam TL (2016) PHM: Mining periodic high-utility itemsets. In: Proceedings of the 16th industrial conference on data mining. Springer, pp 64–79
Fournier-Viger P, Wu CW, Zida S, Tseng V (2014) FHM: Faster high-utility itemset mining using estimated utility co-occurrence pruning. In: Proceedings of the 21st international symposium on methodologies for intelligent systems, pp 83–92
Fournier-Viger P, Zida S (2015) FOSHU: Faster on-shelf high utility itemset mining – with or without negative unit profit. In: Proceedings of the 30th annual ACM symposium on applied computing, SAC ’15, pp 857–864
Grahne G, Zhu J (2005) Fast algorithms for frequent itemset mining using FP-trees. IEEE Trans Knowl Data Eng 17(10):1347–1362
Han J, Wang J, Lu Y, Tzvetkov P (2002) Mining top-k frequent closed patterns without minimum support. In: Proceedings of the IEEE international conference on data mining, pp 211–218
Han JW, Pei J, Yin YW (2004) Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min Knowl Disc 8(1):53–87
Joshi M, Bhalodia D (2016) Mining high utility itemset using graphics processor. In: Proceedings of the international symposium on intelligent systems technologies and applications, pp 665–674
Krishnamoorthy S (2015) Pruning strategies for mining high utility itemsets. Expert Syst Appl 42(5):2371–2381
Lan GC, Hong TP, Tseng V (2014) An efficient projection-based indexing approach for mining high utility itemsets. Knowl Inf Syst 38(1):85–107
Lee S, Park JS (2016) Top-k high utility itemset mining based on utility-list structures. In: Proceedings of the international conference on big data and smart computing, pp 101–108
Lin JCW, Gan W, Fournier-Viger P, Hong TP, Tseng V (2016) Efficient algorithms for mining high-utility itemsets in uncertain databases. Knowl-Based Syst 96:171–187
Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: Proceedings of the 21st ACM international conference on information and knowledge management, CIKM ’12, pp 55–64
Liu Y, Liao WK, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Proceedings of the 9th pacific-asia conference on advances in knowledge discovery and data mining, PAKDD’05, pp 689–695
Sahoo J, Das AK, Goswami A (2016) An efficient fast algorithm for discovering closed+ high utility itemsets. Appl Intell 45(1):44–74
Song W, Liu Y, Li J (2014) BAHUI: Fast and memory efficient mining of high utility itemsets based on bitmap. Int J Data Warehouse Min 10(1):1–15
Song W, Liu Y, Li J (2014) Mining high utility itemsets by dynamically pruning the tree structure. Appl Intell 40(1):29–43
Song W, Zhang Z, Li J (2016) A high utility itemset mining algorithm based on subsume index. Knowl Inf Syst 49(1):315–340
Thilagu M, Nadarajan R (2012) Efficiently mining of effective web traversal patterns with average utility. Procedia Technol 6:444–451
Tseng V, Shie BE, Wu CW, Yu P (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25(8):1772–1786
Tseng V, Wu CW, Fournier-Viger P, Yu P (2016) Efficient algorithms for mining top-k high utility itemsets. IEEE Trans Knowl Data Eng 28(1):54–67
Wang JZ, Huang JL, Chen YC (2016) On efficiently mining high utility sequential patterns. Knowl Inf Syst 49(2):597–627
Wu CW, Fournier-Viger P, Gu JY, Tseng V (2015) Mining closed+ high utility itemsets without candidate generation. In: 2015 conference on technologies and applications of artificial intelligence (TAAI), pp 187–194
Wu CW, Shie BE, Tseng V, Yu PS (2012) Mining top-k high utility itemsets. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’12, pp 78–86
Liu YC, Cheng CP, Tseng V (2013) Mining differential top-k co-expression patterns from time course comparative gene expression datasets. BMC Bioinformatics 14:230
Yun U, Ryang H, Lee G, Fujita H (2017) An efficient algorithm for mining high utility patterns from incremental databases with one database scan. Knowl-Based Syst 124:188–206
Yun U, Ryang H, Ryu KH (2014) High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Syst Appl 41(8):3861–3878
Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining, pp 326–335
Acknowledgments
This research was partly funded by the Norwegian University of Science and Technology (NTNU) through the MUSED project and partly supported by the Youth 1000 funding of Prof. Philippe Fournier-Viger. The work of Mrs. Dam was carried out during the tenure of an ERCIM “Alain Bensoussan” Fellowship Programme.
About this article
Cite this article
Duong, QH., Fournier-Viger, P., Ramampiaro, H. et al. Efficient high utility itemset mining using buffered utility-lists. Appl Intell 48, 1859–1877 (2018). https://doi.org/10.1007/s10489-017-1057-2
DOI: https://doi.org/10.1007/s10489-017-1057-2