A hybrid framework for mining high-utility itemsets in a sparse transaction database

Dawar, Siddharth; Goyal, Vikram; Bera, Debajyoti

doi:10.1007/s10489-017-0932-1


Published: 25 April 2017

Volume 47, pages 809–827, (2017)

Applied Intelligence


Abstract

High-utility itemset mining aims to find the itemsets whose utility is no less than a user-defined threshold in a transaction database. It is an emerging research area in data mining and has important applications in inventory management, query recommendation, systems operation research, bio-medical analysis, etc. Known algorithms for this problem can be classified as either 1-phase or 2-phase algorithms. The 2-phase algorithms are typically tree-based: they generate candidate high-utility itemsets and verify them later. A tree data structure generates candidate high-utility itemsets quickly by storing an upper-bound utility estimate at each node. The 1-phase algorithms are typically based on inverted lists or transaction projection and avoid generating candidate high-utility itemsets; these structures allow exact utility values to be computed. We propose a novel hybrid framework that combines a tree-based and an inverted-list-based algorithm to mine high-utility itemsets efficiently. Algorithms based on the framework can harness the benefits of both types of algorithms. We report experimental results on real and synthetic datasets to demonstrate the effectiveness of our framework.
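To make the terms in the abstract concrete, the following is a minimal sketch of the problem and of the generate-then-verify idea behind 2-phase algorithms. It is not the hybrid framework proposed in the article. It assumes the definitions commonly used in the literature (utility as purchased quantity times unit profit, and transaction-weighted utility, TWU, as the upper bound); the abstract itself does not spell these out. The toy database, the profit table, and all function names are hypothetical, and candidates are enumerated by brute force rather than with a tree.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "d": 1},
    {"a": 1, "b": 1, "d": 2},
]
# Hypothetical per-unit profit of each item (assumed values).
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 4}


def utility_in(itemset, transaction):
    """Utility of `itemset` in one transaction; 0 if it is not contained."""
    if not all(item in transaction for item in itemset):
        return 0
    return sum(transaction[item] * PROFIT[item] for item in itemset)


def utility(itemset):
    """Exact utility of `itemset`, summed over the whole database."""
    return sum(utility_in(itemset, t) for t in TRANSACTIONS)


def transaction_utility(transaction):
    """Total utility of every item in one transaction."""
    return sum(qty * PROFIT[item] for item, qty in transaction.items())


def twu(itemset):
    """Transaction-weighted utility: an upper bound on utility(itemset)."""
    return sum(
        transaction_utility(t)
        for t in TRANSACTIONS
        if all(item in t for item in itemset)
    )


def mine_two_phase(min_util):
    """Brute-force illustration of candidate generation plus verification."""
    items = sorted(PROFIT)
    # Phase 1: keep every itemset whose upper bound reaches the threshold.
    candidates = [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
        if twu(combo) >= min_util
    ]
    # Phase 2: verify candidates against their exact utility.
    return {c: utility(c) for c in candidates if utility(c) >= min_util}


if __name__ == "__main__":
    for itemset, value in mine_two_phase(16).items():
        print(sorted(itemset), value)
```

On this toy data with a threshold of 16, the upper bound admits seven candidates but only four of them are high-utility itemsets. The item b alone has utility 8, yet {a, b} and {b, d} both reach 16, which shows why utility cannot be used directly for pruning and an upper bound such as TWU is used instead.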


Notes

In an earlier work [5], we designed a similar hybrid algorithm for the related problem of mining high-utility itemsets with discounts, in which UP-Hist Growth [7] and FHM [10] were combined.

References

Agrawal R, Srikant R et al (1994) Fast algorithms for mining association rules Proceeding 20th international conference on very large data bases, VLDB, vol 1215, pp 487–499
Ahmed C F, Tanbeer S K, Jeong B S, Lee Y K (2009) Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans Knowl Data Eng 21(12):1708–1721. doi:10.1109/TKDE.2009.46
Ahmed C F, Tanbeer S K, Jeong B S, Lee Y K (2011) Huc-prune: an efficient candidate pruning technique to mine high utility patterns. Appl Intell 34(2):181–198. doi:10.1007/s10489-009-0188-5
Ahmed CF, Tanbeer SK, Jeong BS, Choi HJ (2012) Interactive mining of high utility patterns over data streams. Expert Syst Appl 39(15):11,979–11,991. doi:10.1016/j.eswa.2012.03.062. http://www.sciencedirect.com/science/article/pii/S0957417412005854
Bansal R, Dawar S, Goyal V (2015) An efficient algorithm for mining high-utility itemsets with discount notion. Springer International Publishing, Cham, pp 84–98. doi:10.1007/978-3-319-27057-9_6
Chan R, Yang Q, Shen YD (2003) Mining high utility itemsets Third IEEE international conference on data mining, 2003. ICDM 2003. doi:10.1109/ICDM.2003.1250893, pp 19–26
Dawar S, Goyal V (2014) Up-hist tree: an efficient data structure for mining high utility patterns from transaction databases Proceedings of the 19th international database engineering & applications symposium, ACM, New York, NY, USA, IDEAS ’15. doi:10.1145/2790755.2790771, pp 56–61
Erwin A, Gopalan RP, Achuthan NR (2008) Efficient mining of high utility itemsets from large datasets. Springer, Berlin, pp 554–561. doi:10.1007/978-3-540-68125-0_50
Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu C W, Tseng V S (2014) Spmf: a java open-source pattern mining library. J Mach Learn Res 15(1):3389–3393
Fournier-Viger P, Wu CW, Zida S, Tseng VS (2014) FHM: Faster High-utility itemset mining using estimated utility co-occurrence pruning. Springer International Publishing, Cham, pp 83–92. doi:10.1007/978-3-319-08326-1_9
Goethals B, Zaki M (2003) The frequent itemset mining implementations repository. http://fimi.ua.ac.be/
Goyal V, Dawar S, Sureka A (2015) High utility rare itemset mining over transaction databases. Springer International Publishing, Cham, pp 27–40. doi:10.1007/978-3-319-16313-0_3
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation Proceedings of the 2000 ACM SIGMOD international conference on management of data, ACM, New York, NY, USA, SIGMOD ’00. doi:10.1145/342009.335372, pp 1–12
Lan G C, Hong T P, Tseng V S (2014) An efficient projection-based indexing approach for mining high utility itemsets. Knowl Inf Syst 38(1):85–107. doi:10.1007/s10115-012-0492-y
Leung C K S, Khan Q I, Li Z, Hoque T (2007) Cantree: a canonical-order tree for incremental frequent-pattern mining. Knowl Inf Syst 11(3):287–311. doi:10.1007/s10115-006-0032-8
Li HF, Huang HY, Chen YC, Liu YJ, Lee SY (2008) Fast and memory efficient mining of high utility itemsets in data streams 2008 8th IEEE international conference on data mining. doi:10.1109/ICDM.2008.107, pp 881–886
Li YC, Yeh JS, Chang CC (2008) Isolated items discarding strategy for discovering high utility itemsets. Data & Knowledge Engineering 64(1):198–217. doi:10.1016/j.datak.2007.06.009. http://www.sciencedirect.com/science/article/pii/S0169023X07001218
Li YC, Yeh JS, Chang CC (2008) Isolated items discarding strategy for discovering high utility itemsets. Data Knowl Eng 64(1):198–217. doi:10.1016/j.datak.2007.06.009. http://www.sciencedirect.com/science/article/pii/S0169023X07001218
Liu M, Qu J (2012) Mining high utility itemsets without candidate generation Proceedings of the 21st ACM international conference on information and knowledge management, ACM, New York, NY, USA, CIKM ’12. doi:10.1145/2396761.2396773, pp 55–64
Liu Y, Liao Wk, Choudhary A (2005) A fast high utility itemsets mining algorithm Proceedings of the 1st international workshop on utility-based data mining, ACM, New York, NY, USA, UBDM ’05. doi:10.1145/1089827.1089839, pp 90–99
Liu Y, Liao Wk, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. Springer, Berlin, pp 689–695. doi:10.1007/11430919_79
Park JS, Chen MS, Yu PS (1995) An effective hash-based algorithm for mining association rules Proceedings of the 1995 ACM SIGMOD international conference on management of data, ACM, New York, NY, USA, SIGMOD ’95. doi:10.1145/223784.223813, pp 175–186
Pisharath J, Liu Y, Wk Liao, Choudhary A, Memik G, Parhi J (2005) Nu-minebench 2.0. Department of Electrical and Computer Engineering, Northwestern University, Tech Rep
Rathore S, Dawar S, Goyal V, Patel D (2016) Top-k high utility episode mining from a complex event sequence 21st international conference on management of data, COMAD 2016, Pune, India, March 11–13, 2016. http://comad.in/comad2016/proceedings/paper_19.pdf, pp 56–63
Shie BE, Tseng VS, Yu PS (2010) Online mining of temporal maximal utility itemsets from data streams Proceedings of the 2010 ACM symposium on applied computing, ACM, New York, NY, USA, SAC ’10. doi:10.1145/1774088.1774436, pp 1622–1626
Shie BE, Hsiao HF, Tseng VS, Yu PS (2011) Mining high utility mobile sequential patterns in mobile commerce environments. Springer, Berlin, pp 224–238. doi:10.1007/978-3-642-20149-3_18
Shie BE, Yu PS, Tseng VS (2012) Efficient algorithms for mining maximal high utility itemsets from data streams with different models. Expert Syst Appl 39(17):12,947–12,960. doi:10.1016/j.eswa.2012.05.035. http://www.sciencedirect.com/science/article/pii/S095741741200749X
Tseng VS, Wu CW, Shie BE, Yu PS (2010) Up-growth: an efficient algorithm for high utility itemset mining Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, USA, KDD ’10. doi:10.1145/1835804.1835839, pp 253–262
Tseng V S, Shie B E, Wu C W, Yu P S (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25(8):1772–1786. doi:10.1109/TKDE.2012.59
Vu L, Alaghband G (2011) A fast algorithm combining fp-tree and tid-list for frequent pattern mining Proceedings of information and knowledge engineering, pp 472–477
Wu CW, Lin YF, Yu PS, Tseng VS (2013) Mining high utility episodes in complex event sequences Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, USA, KDD ’13. doi:10.1145/2487575.2487654, pp 536–544
Yin J, Zheng Z, Cao L (2012) Uspan: an efficient algorithm for mining high utility sequential patterns Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, USA, KDD ’12. doi:10.1145/2339530.2339636, pp 660–668
Yin J, Zheng Z, Cao L, Song Y, Wei W (2013) Efficiently mining top-k high utility sequential patterns 2013 IEEE 13th international conference on data mining. doi:10.1109/ICDM.2013.148, pp 1259–1264
Yun U, Ryang H, Ryu KH (2014) High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Syst Appl 41(8):3861–3878. doi:10.1016/j.eswa.2013.11.038. http://www.sciencedirect.com/science/article/pii/S0957417413009585
Zaki M J, Parthasarathy S, Ogihara M, Li W, et al. (1997) New algorithms for fast discovery of association rules KDD, vol 97, pp 283–286
Zida S, Fournier-Viger P, Lin JCW, Wu CW, Tseng VS (2015) EFIM: A highly efficient algorithm for high-utility itemset mining. Springer International Publishing, Cham, pp 530–546. doi:10.1007/978-3-319-27060-9_44

Acknowledgments

This work was supported in part by the Infosys Centre for Artificial Intelligence, IIIT-Delhi, and the Visvesvaraya PhD Scheme for Electronics and IT.

Author information

Authors and Affiliations

Department of Computer Science, Indraprastha Institute of Information Technology, Delhi, India
Siddharth Dawar, Vikram Goyal & Debajyoti Bera


Corresponding author

Correspondence to Vikram Goyal.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.


About this article

Cite this article

Dawar, S., Goyal, V. & Bera, D. A hybrid framework for mining high-utility itemsets in a sparse transaction database. Appl Intell 47, 809–827 (2017). https://doi.org/10.1007/s10489-017-0932-1

Issue Date: October 2017
