More Efficient Algorithms for Mining High-Utility Itemsets with Multiple Minimum Utility Thresholds

Gan, Wensheng; Lin, Jerry Chun-Wei; Fournier-Viger, Philippe; Chao, Han-Chieh

doi:10.1007/978-3-319-44403-1_5

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9827)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Mining high-utility itemsets (HUIs) is a popular data mining task that consists of discovering sets of items that yield a high profit in a transaction database. Although HUI mining has numerous applications, a key limitation is that a single minimum utility threshold (minutil) is used to assess the utility of all items. This simplifying assumption is unrealistic because, in real life, items do not all have the same unit profit and thus do not have an equal chance of generating a high profit. As a result, if minutil is set high, patterns containing items with a low unit profit are often missed, whereas if minutil is set low, the number of patterns becomes unmanageable. To address this issue, this paper presents an efficient tree-based algorithm named HIMU for mining HUIs using multiple minimum utility thresholds. A novel tree structure called the multiple item utility set-enumeration tree (MIU-tree) is proposed, together with the global and conditional downward closure (GDC and CDC) properties of HUIs in the MIU-tree. Moreover, a compact vertical utility-list structure is adopted to store the information required for discovering HUIs without performing additional database scans or generating candidates. An extensive experimental study on real-world and synthetic datasets shows that these techniques greatly improve the efficiency of the algorithm in terms of runtime and scalability.
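
To make the problem setting concrete, the following is a minimal brute-force sketch of HUI mining with one minimum utility threshold per item. It is not the HIMU algorithm and uses neither the MIU-tree nor utility-lists. The toy transactions, unit profits, per-item thresholds, and all function names are hypothetical. The rule that an itemset's threshold is the smallest threshold among its items is an assumption borrowed from the multiple-minimum-support convention; the abstract itself does not state it.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 2, "d": 1},
    {"a": 1, "d": 2},
]
# Hypothetical unit profits and per-item minimum utility thresholds.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 10}
item_minutil = {"a": 20, "b": 8, "c": 8, "d": 30}


def utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over transactions containing every item."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * unit_profit[i] for i in itemset)
    return total


def itemset_threshold(itemset, item_minutil):
    """Assumed rule: the smallest per-item threshold among the itemset's items."""
    return min(item_minutil[i] for i in itemset)


def mine_huis(transactions, unit_profit, item_minutil):
    """Exhaustively enumerate itemsets (no pruning; for illustration only)."""
    items = sorted(unit_profit)
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = utility(itemset, transactions, unit_profit)
            # Skip itemsets that never occur, then apply the itemset's threshold.
            if u > 0 and u >= itemset_threshold(itemset, item_minutil):
                result[itemset] = u
    return result


huis = mine_huis(transactions, unit_profit, item_minutil)
for itemset, u in sorted(huis.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, u)
```

On this toy data, the single item `b` falls below its threshold while its superset `{a, b}` qualifies, so utility is not downward closed. This is why exhaustive enumeration cannot be pruned naively and why dedicated properties such as the GDC and CDC mentioned above are needed.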

Acknowledgment

This research was partially supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61503092, and by the Tencent Project under Grant CCF-TencentRAGR20140114.

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
Wensheng Gan, Jerry Chun-Wei Lin & Han-Chieh Chao
School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
Philippe Fournier-Viger
Department of Computer Science and Information Engineering, National Dong Hwa University, Hualien, Taiwan
Han-Chieh Chao

Corresponding author

Correspondence to Jerry Chun-Wei Lin.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Victoria University of Wellington, Wellington, New Zealand
Hui Ma

About this paper

Cite this paper

Gan, W., Lin, J.CW., Fournier-Viger, P., Chao, HC. (2016). More Efficient Algorithms for Mining High-Utility Itemsets with Multiple Minimum Utility Thresholds. In: Hartmann, S., Ma, H. (eds) Database and Expert Systems Applications. DEXA 2016. Lecture Notes in Computer Science, vol 9827. Springer, Cham. https://doi.org/10.1007/978-3-319-44403-1_5

DOI: https://doi.org/10.1007/978-3-319-44403-1_5
Published: 06 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44402-4
Online ISBN: 978-3-319-44403-1
eBook Packages: Computer Science, Computer Science (R0)
