Efficient algorithms to mine concise representations of frequent high utility occupancy patterns

Duong, Hai; Pham, Huy; Truong, Tin; Fournier-Viger, Philippe

doi:10.1007/s10489-024-05296-2

Published: 18 March 2024

Volume 54, pages 4012–4042, (2024)

Applied Intelligence

Hai Duong¹,
Huy Pham¹,
Tin Truong¹ &
…
Philippe Fournier-Viger ORCID: orcid.org/0000-0002-7680-9899²

Abstract

Identifying all frequent high utility occupancy itemsets (FHUOIs) in a quantitative transaction dataset is a recent trend in data mining. By combining frequency and utility occupancy, these patterns suit several real-world applications: they not only reflect the interests of most users but also contribute a high proportion of the utility in their supporting transactions. Nonetheless, the set of all discovered FHUOIs may be very large, especially for large and dense datasets or for low values of the predefined minimum thresholds, which makes it challenging for users to analyze and use the obtained patterns. To address this issue, this paper proposes two novel algorithms, MaxCloFHUOIM and CloFHUOIM, to extract compact representations of FHUOIs. The former simultaneously mines two concise representations of FHUOIs, namely all closed FHUOIs and all maximal FHUOIs, whereas the latter discovers only the closed FHUOIs, which provide a lossless summary of all FHUOIs. The proposed algorithms rely on a novel weak upper bound on utility occupancy to reduce the search space by quickly pruning itemsets with low utility occupancy. In addition, the algorithms integrate two new efficient strategies to prune non-closed FHUOI candidate branches early in the prefix search tree. Results from an in-depth experimental evaluation conducted on several benchmark real-life and synthetic quantitative datasets demonstrate that MaxCloFHUOIM and CloFHUOIM perform well in terms of runtime, memory usage, and scalability. In particular, the proposed algorithms are up to two orders of magnitude faster than a baseline algorithm.

Data availability

The datasets used to evaluate the algorithms and the synthetic dataset generator used in this study are public and were obtained from [48]. The generated synthetic datasets will be shared on request.

Notes

iff means “if and only if”.

References

1. Agrawal R, Srikant R (1994) Fast Algorithms for Mining Association Rules in Large Databases. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB’94). pp 487–499
2. Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12:372–390
3. Nguyen LTT, Vo B, Mai T, Nguyen TL (2018) A weighted approach for class association rules. In: Sieminski A, Kozierkiewicz A, Nunez M, Ha Q (eds) Modern approaches for intelligent information and database systems. Studies in computational intelligence, vol 769. Springer, Cham, pp 213–222
4. Djenouri Y, Belhadi A, Fournier-Viger P, Fujita H (2018) Mining diversified association rules in big datasets: A cluster/GPU/genetic approach. Inf Sci (N Y) 459:117–134
5. Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree. Data Min Knowl Discov 8:53–87
6. Vo B, Le T, Coenen F, Hong TP (2016) Mining frequent itemsets using the N-list and subsume concepts. Int J Mach Learn Cybern 7:253–265
7. Moturi S, Tirumalarao SN, Vemuru S (2018) Frequent itemset mining algorithm: a survey. J Theor Appl Inf Technol 96:744–755
8. Tang L, Zhang L, Luo P, Wang M (2012) Incorporating occupancy into frequent pattern mining for high quality pattern recommendation. In: ACM International Conference Proceeding Series. pp 75–84
9. Zhang L, Luo P, Tang L et al (2015) Occupancy-based frequent pattern mining. ACM Trans Knowl Discov Data 10:1–33
10. Deng ZH (2020) Mining high occupancy itemsets. Future Gener Comput Syst 102:222–229
11. Zhang K, Zhang Y, Wang Z (2020) Frequent Pattern Mining Based on Occupation and Correlation. In: ICEICT 2020 - IEEE 3rd International Conference on Electronic Information and Communication Technology. pp 161–166
12. Kim H, Ryu T, Lee C et al (2022) Mining high occupancy patterns to analyze incremental data in intelligent systems. ISA Trans 131:460–475
13. Nguyen LTT, Mai T, Pham GH et al (2023) An efficient method for mining high occupancy itemsets based on equivalence class and early pruning. Knowl Based Syst 267:110441
14. Tseng VS, Wu C, Shie B, Yu PS (2010) UP-Growth: An Efficient Algorithm for High Utility Itemset Mining. In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. pp 253–262
15. Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: Proceedings of ACM International Conference on Information and Knowledge Management. pp 55–64
16. Liu Y, Wang L, Feng L, Jin B (2021) Mining high utility itemsets based on pattern growth without candidate generation. Mathematics 9:1–22
17. Shen B, Wen Z, Zhao Y, Zhou D, Zheng W (2016) OCEAN: Fast discovery of high utility occupancy itemsets. In: Bailey J, Khan L, Washio T, Dobbie G, Huang J, Wang R (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9651. Springer, Cham, pp 354–365
18. He J, Han X, Wang J, Zhang K (2022) Efficient high-utility occupancy itemset mining algorithm on massive data. Expert Syst Appl 210:118329
19. Gan W, Lin JCW, Fournier-Viger P et al (2020) HUOPM: High-Utility Occupancy Pattern Mining. IEEE Trans Cybern 50:1195–1208
20. Agrawal R, Imieliński T, Swami A (2005) Mining association rules between sets of items in large databases. ACM SIGMOD Rec 22:207–216
21. Yao H, Hamilton HJ, Butz CJ (2004) A Foundational Approach to Mining Itemset Utilities from Databases. In: Proceedings of the Fourth SIAM International Conference on Data Mining. pp 482–486
22. Liu Y, Liao W, Choudhary A (2005) A fast high utility itemsets mining algorithm. In: Proceedings of the 1st international workshop on Utility-based data mining. pp 90–99
23. Zida S, Fournier-Viger P, Lin JC-W, et al (2015) EFIM: A Highly Efficient Algorithm for High-Utility Itemset Mining. In: Proceedings of Mexican International Conference on Artificial Intelligence (MICAI 2015). pp 530–546
24. Krishnamoorthy S (2017) HMiner: efficiently mining high utility itemsets. Expert Syst Appl 90:168–183
25. Qu J-F, Fournier-Viger P, Liu M et al (2023) Mining high utility itemsets using prefix trees and utility vectors. IEEE Trans Knowl Data Eng 35:10224–10236
26. Yin J, Zheng Z, Cao L (2012) USpan: An efficient algorithm for mining high utility sequential patterns. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp 660–668
27. Truong-Chi T, Fournier-Viger P (2019) A survey of high utility sequential pattern mining. In: Fournier-Viger P, Lin JW, Nkambou R, Vo B, Tseng V (eds) High-Utility Pattern Mining. Studies in Big Data, vol 51. Springer, Cham, pp 97–129
28. Truong T, Duong H, Le B et al (2021) Efficient algorithms for mining frequent high utility sequences with constraints. Inf Sci (N Y) 568:239–264
29. Nguyen A, Nguyen NT, Nguyen LTT, Vo B (2023) Mining inter-sequence patterns with Itemset constraints. Appl Intell 53:19827–19842
30. Zhang C, Yang Y, Du Z et al (2024) HUSP-SP: faster utility mining on sequence data. ACM Trans Knowl Discov Data 18:1–21
31. Hong T-P, Lee CH, Wang SL (2009) Mining high average-utility itemsets. In: Proceedings of IEEE International Conference on Systems, Man and Cybernetics. pp 2526–2530
32. Wu JMT, Lin JCW, Pirouz M, Fournier-Viger P (2018) New tighter upper bounds for mining high average-utility itemsets. In: ACM International Conference Proceeding Series. pp 27–32
33. Truong T, Duong H, Le B et al (2019) Efficient high average-utility itemset mining using novel vertical weak upper-bounds. Knowl Based Syst 183:104847
34. Kim H, Yun U, Baek Y et al (2021) Efficient list-based mining of high average utility patterns with maximum average pruning strategies. Inf Sci (N Y) 543:85–105
35. Li G, Shang T, Zhang Y (2023) Efficient mining high average-utility itemsets with effective pruning strategies and novel list structure. Appl Intell 53:6099–6118
36. Truong T, Duong H, Le B, Fournier-Viger P (2020) EHAUSM: An efficient algorithm for high average utility sequence mining. Inf Sci (N Y) 515:302–323
37. Singh K, Kumar R, Biswas B (2022) High average-utility itemsets mining: a survey. Appl Intell 52:3901–3938
38. Tseng VS, Wu C, Fournier-Viger P, Yu PS (2015) Efficient algorithms for mining the concise and lossless representation of high utility itemsets. IEEE Trans Knowl Data Eng 27:726–739
39. Chen CM, Chen L, Gan W et al (2021) Discovering high utility-occupancy patterns from uncertain data. Inf Sci (N Y) 546:1208–1229
40. Vemulapalli S, Mogalla S (2021) High utility-occupancy sequential pattern mining algorithm based on utility-occupancy framework. Int J Eng Trends Technol 69:228–235
41. Wu CW, Fournier-Viger P, Gu JY, Tseng VS (2019) Mining compact high utility itemsets without candidate generation. In: Fournier-Viger P, Lin JW, Nkambou R, Vo B, Tseng V (eds) High-utility pattern mining. Studies in big data, vol 51. Springer, Cham, pp 279–302
42. Duong H, Hoang T, Tran T et al (2022) Efficient algorithms for mining closed and maximal high utility itemsets. Knowl Based Syst 257:109921
43. Wu C-W, Fournier-Viger P, Gu J-Y, Tseng VS (2015) Mining Closed + High Utility Itemsets without Candidate Generation. In: Conference on Technologies and Applications of Artificial Intelligence. pp 187–194
44. Fournier-Viger P, Zida S, Lin JC-W, et al (2016) EFIM-Closed: Fast and Memory Efficient Discovery of Closed High-Utility Itemsets. In: International Conference on Machine Learning and Data Mining in Pattern Recognition. pp 199–213
45. Nguyen LTT, Vu VV, Lam MTH et al (2019) An efficient method for mining high utility closed itemsets. Inf Sci (N Y) 495:78–99
46. Fournier-Viger P, Wu C-W, Tseng VS (2014) Novel Concise Representations of High Utility Itemsets Using Generator Patterns. In: International Conference on Advanced Data Mining and Applications. pp 30–43
47. Fournier-Viger P, Gomariz A, Campos M (2014) Fast Vertical Mining of Sequential Patterns Using Co-occurrence Information. In: Proceedings of 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD ’2014. pp 40–52
48. Fournier-Viger P, Gomariz A, Soltani A et al (2014) SPMF: A java open-source pattern mining library. J Mach Learn Res 15:3569–3573

Funding

This work was supported by Dalat University, Vietnam [Decision number: 1574/QĐ-ĐHĐL, 2023].

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Dalat University, Dalat, Vietnam
Hai Duong, Huy Pham & Tin Truong
Shenzhen University, Shenzhen, China
Philippe Fournier-Viger

Contributions

Hai Duong, Huy Pham and Tin Truong contributed equally to the study conception and design. Material preparation, data collection and analysis were performed by Hai Duong, Huy Pham, and Tin Truong. The first draft of the manuscript was written by Hai Duong, Huy Pham, Tin Truong and Philippe Fournier-Viger. All authors read and approved the content of the manuscript.

Corresponding author

Correspondence to Philippe Fournier-Viger.

Ethics declarations

Compliance with ethical standards

The authors have no conflict of interest. This research was carried out using public data. No experiments were conducted with humans or animals.

Competing interests

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Proof of Proposition 1

a. Let \(B, C \in [A]\) with \(B \subseteq C\). Since \(\rho(B) = \rho(C)\), we have \(supp(B) = supp(C)\), and \(u_{rel}(B, t) \le u_{rel}(C, t)\) for every transaction \(t \supseteq C\). Hence \(occ(B) = \frac{1}{supp(B)} \sum_{t \in \rho(B)} u_{rel}(B, t) \le \frac{1}{supp(C)} \sum_{t \in \rho(C)} u_{rel}(C, t) = occ(C)\), i.e. \(\mathcal{M}_{\sim}(occ)\) holds.

b. For any \(A \in \mathcal{FHUOI}\), we first prove that \(\mathcal{CFHUOI}(A) \ne \varnothing\). If \(A \in \mathcal{CFHUOI}\), then \(C_0 \stackrel{\text{def}}{=} A \in \mathcal{CFHUOI}(A)\). Otherwise, \(A \notin \mathcal{CFHUOI}\), so \(\exists C_1 \in \mathcal{FHUOI}\): \(C_1 \supset A\) and \(supp(C_1) = supp(A)\). By a similar argument, if \(C_1 \in \mathcal{CFHUOI}\), then \(C_1 \in \mathcal{CFHUOI}(A)\). Otherwise, \(\exists C_2 \in \mathcal{FHUOI}\): \(C_2 \supset C_1 \supset A\) and \(supp(C_2) = supp(C_1) = supp(A)\), and so on. Since \(\mathcal{A}\) is finite, this process must terminate, i.e. there always exists \(n \in \mathbb{N}\) such that \(C_n \in \mathcal{CFHUOI}\) with \(C_n \supset \dots \supset C_1 \supset A\) and \(supp(C_n) = \dots = supp(C_1) = supp(A)\). Thus, \(C_n \in \mathcal{CFHUOI}(A)\), i.e. \(\mathcal{CFHUOI}(A) \ne \varnothing\). Furthermore, assume by contradiction that there exist two different and non-nested closed FHUO itemsets \(B\) and \(C\) in \(\mathcal{CFHUOI}(A)\) (two nested ones with equal support would directly contradict the closedness of the smaller one). Let \(D \stackrel{\text{def}}{=} B \cup C \supset B \supseteq A\). Since \(B, C \in \mathcal{CFHUOI}(A) \subseteq \mathcal{CFHUOI} \subseteq \mathcal{FHUOI}\) and \(supp(C) = supp(B) = supp(A)\), we have \(\rho(C) = \rho(B) = \rho(A)\) and \(\rho(D) = \rho(B) \cap \rho(C) = \rho(A)\). Therefore, \(D \in [A]\), so \(supp(D) = supp(A) \ge ms\), i.e. \(D \in \mathcal{FI}\). Because \(\mathcal{M}_{\sim}(occ)\) holds, \(occ(D) \ge occ(B) \ge muo\), i.e. \(\exists D \in \mathcal{FHUOI}\): \(D \supset B\) and \(supp(D) = supp(A) = supp(B)\). But this contradicts the fact that \(B \in \mathcal{CFHUOI}\). Hence, \(|\mathcal{CFHUOI}(A)| = 1\). Let \(C \stackrel{\text{def}}{=} closure(A)\); then, by the previous argument, \(C\) is the maximum itemset and has the highest utility occupancy in \([A]\).

c. Let \(\mathcal{FHUOCI} \stackrel{\text{def}}{=} \{A \in \mathcal{FHUOI} \mid \nexists B \in \mathcal{IS}: B \supset A \wedge supp(B) = supp(A)\}\). Firstly, \(A \in \mathcal{FHUOCI} \iff\) [\(A \in \mathcal{FHUOI}\) and \(\nexists B \in \mathcal{IS}: B \supset A \wedge supp(B) = supp(A)\)] \(\iff A \in \mathcal{FHUOI} \cap \mathcal{CI}\), i.e. \(\mathcal{FHUOCI} = \mathcal{FHUOI} \cap \mathcal{CI}\). Secondly, we prove that \(\mathcal{FHUOCI} = \mathcal{CFHUOI}\). The first inclusion \(\mathcal{FHUOCI} \subseteq \mathcal{CFHUOI}\) is evident. Conversely, for any \(A \in \mathcal{CFHUOI}\), i.e. \(A \in \mathcal{FHUOI}\) and \((\nexists B \in \mathcal{FHUOI}: B \supset A \wedge supp(B) = supp(A))\)^(*), assume by contradiction that \(A \notin \mathcal{FHUOCI}\). Then \(\exists B \in \mathcal{IS}\): \(B \supset A\) and \(supp(B) = supp(A)\), so \(\rho(B) = \rho(A)\), i.e. \(A, B \in [A]\). Since \(\mathcal{M}_{\sim}(occ)\) holds and \(A \in \mathcal{FHUOI}\), \(occ(B) \ge occ(A) \ge muo\) and \(supp(B) = supp(A) \ge ms\), so \(B \in \mathcal{FHUOI}\). This contradicts ^(*). Hence, \(\mathcal{CFHUOI} \subseteq \mathcal{FHUOCI}\). Thus, \(\mathcal{CFHUOI} = \mathcal{FHUOCI}\). Furthermore, since \(supp\) is anti-monotonic, for any \(B = A + C \supset A\) with \(x \in C \ne \varnothing\) and \(A \subset B^{\prime} = A + \{x\} \subseteq B\), we have \(\rho(A) \supseteq \rho(B^{\prime}) \supseteq \rho(B)\), so \(supp(A) \ge supp(B^{\prime}) \ge supp(B)\). Additionally, if \(supp(A) = supp(B)\), then \(supp(A) = supp(B^{\prime}) = supp(B)\). Hence, \(A\) is closed \(\iff\) [\(A\) has no \(1\text{-}forward\) and no \(1\text{-}backward\)]. Thus, the last assertion is proven.
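
Parts a and c can be checked by brute force on a small example. The following Python sketch is only an illustration, not the authors' algorithm: the toy dataset `DB`, the thresholds `MS` and `MUO`, every function name, and the reading \(u_{rel}(X, t) = u(X, t)/tu(t)\) (utility of \(X\) in \(t\) divided by the total utility of \(t\)) are assumptions introduced here.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy dataset: each transaction maps item -> utility.
DB = [
    {"a": 4, "b": 2, "c": 4},
    {"a": 2, "b": 2, "c": 6},
    {"a": 5, "d": 5},
    {"b": 3, "d": 7},
]
MS = 2                 # assumed minimum support (absolute count)
MUO = Fraction(1, 2)   # assumed minimum utility occupancy
ITEMS = sorted(set().union(*DB))


def rho(itemset):
    """Indices of the transactions containing every item of `itemset`."""
    return frozenset(i for i, t in enumerate(DB) if itemset.issubset(t))


def occ(itemset):
    """Mean of u(itemset, t) / tu(t) over the transactions in rho(itemset)."""
    tids = rho(itemset)
    total = sum(Fraction(sum(DB[i][x] for x in itemset), sum(DB[i].values()))
                for i in tids)
    return total / len(tids) if tids else Fraction(0)


def closed_within(family, candidates):
    """Candidates with no proper superset in `family` of equal support."""
    return {a for a in candidates
            if not any(a < b and len(rho(b)) == len(rho(a)) for b in family)}


# IS: all non-empty itemsets; FHUOI: frequent and high utility occupancy.
IS = [frozenset(c) for k in range(1, len(ITEMS) + 1)
      for c in combinations(ITEMS, k)]
FHUOI = {x for x in IS if len(rho(x)) >= MS and occ(x) >= MUO}

# Part a: occ is monotone inside each equivalence class (same rho).
for b in IS:
    for c in IS:
        if b <= c and rho(b) and rho(b) == rho(c):
            assert occ(b) <= occ(c)

# Part c: closedness w.r.t. FHUOI supersets equals closedness w.r.t. all itemsets.
CFHUOI = closed_within(FHUOI, FHUOI)
FHUOCI = closed_within(IS, FHUOI)
assert CFHUOI == FHUOCI
print(sorted("".join(sorted(x)) for x in CFHUOI))
```

On this toy data, six itemsets are FHUOIs, and the two closed ones are {d} and {a, b, c}. Exact fractions are used so that comparisons against `MUO` are not affected by floating-point rounding.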

Appendix 2: Proof of Theorem 1

a. Let \(B\) be any proper forward extension (FE) of \(A\), i.e. \(A \subset B = A \oplus E \in \mathcal{FI}\) with \(E \ne \varnothing\), and let \(t_i \in \rho(B) \subseteq TS(A) \subseteq \rho(A)\) be a transaction. Then \(m \stackrel{\text{def}}{=} supp(A) \ge n \stackrel{\text{def}}{=} |TS(A)| \ge p \stackrel{\text{def}}{=} supp(B) \ge ms\) and \(u(B, t_i) = u(A, t_i) + u(E, t_i) \le u(A, t_i) + u(rem(A, t_i)) = ub_{rem}(A, t_i)\), so \(u_{rel}(B, t_i) \le ub_{rem\_rel}(A, t_i)\). Since \(\mathcal{X} \stackrel{\text{def}}{=} \{wubocc_i, i = 1..n\} \subseteq \mathcal{Y} \stackrel{\text{def}}{=} \{\widehat{\Phi}_i, i = 1..m\}\), \(\mathcal{X}\downarrow\), \(\mathcal{Y}\downarrow\) and \(\rho(B) \subseteq TS(A)\), the series \(\{S_k \stackrel{\text{def}}{=} \frac{1}{k} \sum_{1 \le i \le k} wubocc_i, 1 \le k \le n\}\) is descending, so \(occ(B) \stackrel{\text{def}}{=} \frac{1}{p} \sum_{t_i \in \rho(B)} u_{rel}(B, t_i) \le \frac{1}{p} \sum_{t_i \in \rho(B)} ub_{rem\_rel}(A, t_i) \le S_p \le S_{ms} = wubocc(A) \le \frac{1}{ms} \sum_{1 \le i \le ms} \widehat{\Phi}_i = \widehat{\Phi}(A)\).

b. If \(wubocc(A) < muo\), then \(occ(B) \le wubocc(A) < muo\) for any proper FE \(B\) of \(A\), so every such \(B\) is a low utility occupancy itemset.
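
The chain of inequalities in part a can be illustrated numerically. The sketch below is written under stated assumptions and is not the paper's implementation: items are ordered alphabetically, \(rem(A, t)\) is taken to be the items of \(t\) that follow the last item of \(A\), \(TS(A)\) is taken to be the transactions of \(\rho(A)\) in which \(rem(A, t)\) is non-empty, and \(u_{rel}(X, t) = u(X, t)/tu(t)\). The dataset `DB`, the threshold `MS`, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy dataset: each transaction maps item -> utility.
DB = [
    {"a": 4, "b": 2, "c": 4},
    {"a": 2, "b": 2, "c": 6},
    {"a": 5, "d": 5},
    {"a": 1, "b": 3, "d": 6},
]
MS = 2  # assumed minimum support (absolute count)
ITEMS = sorted(set().union(*DB))


def rho(itemset):
    """Indices of the transactions containing every item of `itemset`."""
    return [i for i, t in enumerate(DB) if set(itemset).issubset(t)]


def u_rel(itemset, t):
    """Utility of `itemset` in t divided by the total utility of t."""
    return Fraction(sum(t[x] for x in itemset), sum(t.values()))


def occ(itemset):
    tids = rho(itemset)
    return sum(u_rel(itemset, DB[i]) for i in tids) / len(tids) if tids else Fraction(0)


def rem(itemset, t):
    """Items of t that come after the last item of `itemset` (assumed order)."""
    last = max(itemset)
    return {x for x in t if x > last}


def wubocc(itemset, ms):
    """Mean of the ms largest ub_rem_rel values over TS(itemset)."""
    bounds = sorted(
        (u_rel(set(itemset) | rem(itemset, DB[i]), DB[i])
         for i in rho(itemset) if rem(itemset, DB[i])),
        reverse=True,
    )
    # Fewer than ms usable transactions: no frequent proper FE can exist.
    return sum(bounds[:ms]) / ms if len(bounds) >= ms else Fraction(0)


def forward_extensions(itemset):
    """All proper forward extensions: add items that follow the last item."""
    tail = [x for x in ITEMS if x > max(itemset)]
    for k in range(1, len(tail) + 1):
        for extra in combinations(tail, k):
            yield frozenset(itemset) | frozenset(extra)


def bound_holds():
    """Check occ(B) <= wubocc(A) for every frequent proper FE B of every A."""
    for k in range(1, len(ITEMS) + 1):
        for a in combinations(ITEMS, k):
            for b in forward_extensions(a):
                if len(rho(b)) >= MS and occ(b) > wubocc(a, MS):
                    return False
    return True


print(wubocc(frozenset("b"), MS), bound_holds())
```

For \(A = \{b\}\) in this toy data, the three per-transaction bounds are 9/10, 4/5 and 3/5, so the bound with `MS = 2` is 17/20, while the only frequent proper FE, \(\{b, c\}\), has utility occupancy 7/10. With a minimum utility occupancy above 17/20, part b would allow the whole branch of \(\{b\}\) to be skipped.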

Appendix 3: Proof of Theorem 2

We will prove that

\(C\) is a CFHUOI \(\iff\) [\(C \in \mathcal{FHUOI}\), \(C\) has no \(1\text{-}forward\) and \((\nexists D \in \mathcal{CFHUOI}: D \supset C \text{ and } supp(D) = supp(C))\)].

+ “⇒”: For any CFHUOI \(C = A \oplus y\), i.e. \(C \in \mathcal{FHUOI}\) and \((\nexists D \in \mathcal{FHUOI}: D \supset C \text{ and } supp(D) = supp(C))\)^(*), assume on the contrary that \(C\) has a \(1\text{-}forward\) \(D = A \oplus y \oplus z\) or \((\exists D \in \mathcal{CFHUOI}: D \supset C \text{ and } supp(D) = supp(C))\). Then \(D \supset C\) and \(supp(D) = supp(C) \ge ms\), so \(D \in [C]\). Since \(\mathcal{M}_{\sim}(occ)\) holds and \(C \in \mathcal{FHUOI}\), \(occ(D) \ge occ(C) \ge muo\), i.e. \(D \in \mathcal{FHUOI}\). But this contradicts ^(*).

+ “⇐”: Suppose that [\(C \in \mathcal{FHUOI}\), \(C\) has no \(1\text{-}forward\) and \((\nexists D \in \mathcal{CFHUOI}: D \supset C \text{ and } supp(D) = supp(C))\)] holds, and assume on the contrary that \(C = A \oplus y\) is not a CFHUOI, i.e. \(\exists E \in \mathcal{FHUOI}\): \(E = F \oplus y \oplus G \supset C\) with \(y \succ F \supseteq A\) and \(G \succ y\) (such that \(F \supset A\) or \(G \ne \varnothing\)) and \(supp(E) = supp(C)\). Since such \(E \notin \mathcal{CFHUOI}\), i.e. \(E\) is not yet considered, then \(F = A\) and \(G \ne \varnothing\), i.e. \(\exists z \in G\): \(z \succ y\). For the special forward extension \(D\) of \(C\) with \(E \supseteq D = A \oplus y \oplus z \supset C\), we have \(supp(E) = supp(D) = supp(C)\), so \(D\) is a \(1\text{-}forward\) of \(C\), which is a contradiction.

Appendix 4: Proof of Proposition 3

Assume on the contrary that \(wubocc(C) < muo\) and \(occ(C) \ge muo\) but \(C\) has a \(1\text{-}forward\), i.e. \(\exists D = C \oplus y \supset C\): \(supp(D) = supp(C)\) and \(\rho(D) = \rho(C)\). Then \(D \in [C]\), so \(muo > wubocc(C) \ge occ(D) \ge occ(C) \ge muo\) since \(\mathcal{M}_{\sim}(occ)\) holds. This is a contradiction. The remaining assertions are derived from Theorems 1 and 2.

Appendix 5: Proof of Proposition 4

a. Assume on the contrary that \(\exists A \in \mathcal{MFHUOI} \subseteq \mathcal{FHUOI}\) but \(A \notin \mathcal{CFHUOI}\), i.e. \(\exists B \in \mathcal{FHUOI}: B \supset A \wedge supp(B) = supp(A)\), so \(A \notin \mathcal{MFHUOI}\) and a contradiction occurs. The remaining inclusion, \(\mathcal{CFHUOI} \subseteq \mathcal{FHUOI}\), is evident.

b. We first prove that \(\mathcal{MFHUOI} \subseteq \{A \in \mathcal{CFHUOI} \mid \nexists B \in \mathcal{CFHUOI}: B \supset A\}\). For any \(A \in \mathcal{MFHUOI} \subseteq \mathcal{CFHUOI}\), i.e. \((\nexists B \in \mathcal{FHUOI}: B \supset A)\)⁽⁺⁾, assume \(A \notin \{A \in \mathcal{CFHUOI} \mid \nexists B \in \mathcal{CFHUOI}: B \supset A\}\), i.e. \(\exists B \in \mathcal{CFHUOI} \subseteq \mathcal{FHUOI}: B \supset A\), and this contradicts ⁽⁺⁾. Second, we prove the reverse inclusion. Assume that \(A \in \mathcal{CFHUOI} \subseteq \mathcal{FHUOI}\) and \((\nexists B \in \mathcal{CFHUOI}: B \supset A)\)⁽⁺⁺⁾, but \(A \notin \mathcal{MFHUOI}\), i.e. \(\exists B^{\prime} \in \mathcal{FHUOI}: B^{\prime} \supset A\). Then, by Proposition 1b, \(\exists B \stackrel{\text{def}}{=} closure(B^{\prime}) \in \mathcal{CFHUOI}(B^{\prime}) \subseteq \mathcal{CFHUOI}: B \supseteq B^{\prime} \supset A\), but this contradicts ⁽⁺⁺⁾.
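
Part b says that the maximal FHUOIs can be obtained by filtering the closed FHUOIs alone, without revisiting the full set of FHUOIs. A minimal Python sketch of that filtering step follows. The family of itemsets and their supports in `SUPP` are hypothetical values chosen for illustration, and the function names are not taken from the paper.

```python
# Hypothetical family of FHUOIs with their supports (illustrative values only).
SUPP = {
    frozenset("a"): 4,
    frozenset("ab"): 3,
    frozenset("abc"): 3,
    frozenset("d"): 2,
}
FHUOI = set(SUPP)


def closed_itemsets(family):
    """Members with no proper superset in `family` having the same support."""
    return {a for a in family
            if not any(a < b and SUPP[b] == SUPP[a] for b in family)}


def maximal_itemsets(family):
    """Members with no proper superset in `family` at all."""
    return {a for a in family if not any(a < b for b in family)}


closed = closed_itemsets(FHUOI)
maximal_direct = maximal_itemsets(FHUOI)        # definition applied to FHUOI
maximal_from_closed = maximal_itemsets(closed)  # Proposition 4b: filter CFHUOI only

assert maximal_direct <= closed                 # part a: MFHUOI is a subset of CFHUOI
assert maximal_direct == maximal_from_closed    # part b
print(sorted("".join(sorted(x)) for x in maximal_from_closed))
```

In this example {a, b} is not closed because {a, b, c} has the same support, and {a} is closed but not maximal, so the maximal itemsets are {a, b, c} and {d}. Both routes give the same result, and the second one only scans the usually much smaller closed set.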

Appendix 6: Proof of Theorem 3

Consider any FE \(G\) of \(B^{\prime}\) with \(F \succ y\), i.e. \(G = B^{\prime} \oplus F = A^{\prime} \oplus y \oplus F \in Br(B^{\prime})\). Since \(A \subseteq A^{\prime} \subset C\) and \(D \succ y\), then for \(H \stackrel{\text{def}}{=} S \cup F = C \oplus y \oplus (D \cup F)\) (when \(F = \varnothing\), \(H = S\) and \(G = B^{\prime}\)), we have \(G \subset H\). Due to \(supp(B) = supp(S)\) and \(B \subseteq B^{\prime} \subset S\), \(\rho(B) = \rho(B^{\prime}) = \rho(S)\), so every transaction \(T \supseteq B^{\prime}\) also satisfies \(T \supseteq S\). Then, for any transaction \(T \supseteq G = B^{\prime} \oplus F\), we have \(T \supseteq S\) and \(T \supseteq F\), so \(T \supseteq H\), i.e. \(\rho(G) \subseteq \rho(H)\). Because the reverse inclusion is evident, we obtain \(\rho(G) = \rho(H)\). Therefore, \(H \in [G]\). If \(occ(G) < muo\) or \(supp(G) < ms\), i.e. \(G \notin \mathcal{FHUOI}\), then \(G \notin \mathcal{CFHUOI}\). Otherwise, if \(occ(G) \ge muo\) and \(supp(G) \ge ms\), then \(supp(H) = supp(G) \ge ms\) and \(occ(H) \ge occ(G) \ge muo\) since \(\mathcal{M}_{\sim}(occ)\) holds. Hence, \(\exists H \in \mathcal{FHUOI}: H \supset G\) and \(supp(H) = supp(G)\), i.e. \(G \notin \mathcal{CFHUOI}\). Thus, the \(Br(B^{\prime})\) branch is \(Non\text{-}CloFHUO\).

Appendix 7: Proof of Corollary 1

Let \(C = A \oplus y\) be the itemset being considered. Since ^(*) in (1) does not hold, i.e. \(\exists S \in \mathcal{CFHUOI}: S \supset C \text{ and } supp(S) = supp(C)\), such an \(S\) must have already been considered and \(S = C^{\prime} \oplus y \oplus D\) with \(C^{\prime} \supset A\). Thus, by Theorem 3, the \(Br(C)\) branch is \(Non\text{-}CloFHUO\).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Duong, H., Pham, H., Truong, T. et al. Efficient algorithms to mine concise representations of frequent high utility occupancy patterns. Appl Intell 54, 4012–4042 (2024). https://doi.org/10.1007/s10489-024-05296-2

Accepted: 28 January 2024
Published: 18 March 2024
Issue Date: March 2024
DOI: https://doi.org/10.1007/s10489-024-05296-2
