Frequent high minimum average utility sequence mining with constraints in dynamic databases using efficient pruning strategies

Applied Intelligence

Abstract

High utility sequence mining is a popular data mining task that aims to find sequences having a high utility (importance) in a quantitative sequence database. Although it has several applications, state-of-the-art algorithms have one or more of the following limitations: (1) they rely on a utility function that tends to be biased toward long patterns; (2) some algorithms do take pattern length into account using an average-utility function, but they adopt an optimistic perspective that can be risky or misleading for some applications; (3) they do not let the user specify additional constraints on the patterns to be found. To address these three limitations, this paper defines a novel task of mining frequent high minimum average-utility sequences (FHAUS) with constraints in a quantitative sequence database. This task has the following benefits. First, it uses an average-utility function au based on the minimum utility, which takes the length of a pattern into account when calculating its utility. This helps find short patterns missed by traditional algorithms, and it relies on safer, pessimistic utility calculations. Second, the user can specify a set C of monotonic and anti-monotonic constraints on patterns to filter out irrelevant patterns and improve the performance of the mining process. To efficiently find all FHAUSs with constraints, this paper first proposes novel upper bounds (UBs) and weak upper bounds (WUBs) on the average utility, which satisfy downward-closure (DC) or DC-like properties. Then, to effectively reduce the search space, the paper designs novel width pruning, depth pruning, reducing, and tightening strategies based on the proposed bounds. These theoretical results are integrated into an algorithm named C-FHAUSPM (Constrained Frequent High minimum Average-Utility Sequential Pattern Mining) for efficiently discovering all FHAUSs with constraints. Results from extensive experiments on both real-life and synthetic quantitative sequence databases show that C-FHAUSPM is highly efficient in terms of runtime and memory usage.

Author information

Corresponding author

Correspondence to Philippe Fournier-Viger.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

1.1 Appendix 1 (Proof of Theorem 1)

To prove Theorem 1, the following lemma is introduced.

Lemma 1 (Properties of taub)

For each \( {\varPsi}^{\prime}\in {\mathcal{D}}^{\prime} \), consider any sequence α and its forward extension β in Ψ', with m  =  length(β)  ≥  l  =  length(α).

  • a. The sequence \( \left\{\frac{1}{k}{S}_k^{\alpha },1\le k\le n\right\} \) is decreasing.

  • b. \( \mathcal{T}\left(\beta, {\varPsi}^{\prime}\right)\subseteq \mathcal{T}\left(\alpha, {\varPsi}^{\prime}\right) \) and \( \frac{1}{m}{S}_m^{\beta}\le \frac{1}{m}{S}_m^{\alpha } \).

  • c. AM_aub(α,  Ψ') and taub(α,  Ψ') are gradually tighter UBs on au(α,  Ψ') in each Ψ', i.e., AM_aub(α,  Ψ')  ≥  taub(α,  Ψ')  ≥  au(α,  Ψ'),  ∀  α  ⊆  Ψ'.

  • d. \( \mathcal{AMF}\left( taub,{\varPsi}^{\prime}\right) \), that is, taub(α,  Ψ') is anti-monotone w.r.t. forward extensions in each Ψ': taub(β,  Ψ')  ≤  taub(α,  Ψ') for any forward extension β of α in Ψ'.

Proof.

The two assertions a and d are proven similarly to Lemma 1 in.

  • b. For any forward extension β of α in Ψ', Ψ'  ⊇  β  =  α  ⋄  δ  ⊇  α, since \( {\alpha}_{first}^{\prime}\subseteq {\beta}_{first}^{\prime}\subseteq {\varPsi}^{\prime } \), \( lastItem\left({\beta}_{first}^{\prime}\right) \) does not precede \( lastItem\left({\alpha}_{first}^{\prime}\right) \) (in Ψ'). Thus, \( \mathcal{V}\left(\beta, {\varPsi}^{\prime}\right)\subseteq \mathcal{V}\left(\alpha, {\varPsi}^{\prime}\right) \) and \( \mathcal{T}\left(\beta, {\varPsi}^{\prime}\right)\subseteq \mathcal{T}\left(\alpha, {\varPsi}^{\prime}\right) \), so \( \frac{1}{m}{S}_m^{\beta}\le \frac{1}{m}{S}_m^{\alpha } \).

  • c. For any \( {\alpha}^{\prime}\in \mathcal{O}\left(\alpha, {\varPsi}^{\prime}\right) \), we always have l  =  length(α') and \( au\left({\alpha}_{first}^{\prime}\right)\le \frac{1}{l}{S}_l^{\alpha }= taub\left(\alpha, {\varPsi}^{\prime}\right) \), so \( au\left(\alpha, {\varPsi}^{\prime}\right)=\min \left\{ au\left({\alpha}^{\prime}\right)\mid {\alpha}^{\prime}\in \mathcal{O}\left(\alpha, {\varPsi}^{\prime}\right)\right\}\le au\left({\alpha}_{first}^{\prime}\right)\le taub\left(\alpha, {\varPsi}^{\prime}\right) \). Moreover, since \( {S}_l^{\alpha}\le \min \left\{u\left({\varPsi}^{\prime}\right),\ l\times \max \left\{q\mid \left(a,q\right)\in {\varPsi}^{\prime}\right\}\right\} \),  ∀  α  ⊆  Ψ', it follows that taub(α,  Ψ')  ≤  AM_aub(α,  Ψ').
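Assertions a and c of Lemma 1 can be checked numerically on a small example. The sketch below is only illustrative: this appendix does not restate the definition of \( {S}_k^{\alpha } \), so the code assumes it is the sum of the k largest utility values in a list; the function names and the sample utilities are hypothetical.

```python
from itertools import accumulate


def top_k_sums(utilities):
    """Return [S_1, ..., S_n], where S_k is the sum of the k largest values.

    Assumption: this plays the role of S_k^alpha; the appendix does not
    restate the definition, so treat it as an illustration only.
    """
    return list(accumulate(sorted(utilities, reverse=True)))


def top_k_averages(utilities):
    """Return [S_1 / 1, S_2 / 2, ..., S_n / n]."""
    return [s / k for k, s in enumerate(top_k_sums(utilities), start=1)]


# Hypothetical utilities available in one quantitative sequence.
utilities = [5, 1, 8, 3, 8, 2]
averages = top_k_averages(utilities)

# Lemma 1.a: the averages S_k / k never increase as k grows.
is_non_increasing = all(a >= b for a, b in zip(averages, averages[1:]))

# Lemma 1.c: the average utility of any l chosen items is at most S_l / l.
l = 3
occurrence = [5, 1, 3]  # hypothetical utilities of one occurrence of length l
occurrence_average = sum(occurrence) / l
bound_holds = occurrence_average <= averages[l - 1]
```

Under this assumption, the average of the k largest values can only shrink when a smaller value is added, which is the intuition behind assertion a.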

Proof of Theorem 1.

  • a. (i). For any super-sequence β of α, β  ⊇  α, since ρ(β)  ⊆  ρ(α) and m  =  length(β)  ≥  l  =  length(α), then \( \frac{1}{m}u\left({\varPsi}^{\prime}\right)\le \frac{1}{l}u\left({\varPsi}^{\prime}\right) \) and AM_aub(β)  ≤  AM_aub(α), i.e., \( \mathcal{AM}\left( AM\_ aub\right) \).

  • (ii). By Lemma 1.d, for any forward extension β of α in Ψ', since ρ(β)  ⊆  ρ(α) and taub(β,  Ψ')  ≤  taub(α,  Ψ'), then \( t\_ aub\left(\beta \right)={\sum}_{\varPsi^{\prime}\in \rho \left(\beta \right)} taub\left(\beta, {\varPsi}^{\prime}\right)\le {\sum}_{\varPsi^{\prime}\in \rho \left(\alpha \right)} taub\left(\beta, {\varPsi}^{\prime}\right)\le {\sum}_{\varPsi^{\prime}\in \rho \left(\alpha \right)} taub\left(\alpha, {\varPsi}^{\prime}\right)=t\_ aub\left(\alpha \right) \). Thus, \( \mathcal{AMF}\left(t\_ aub\right) \).

  • (iii). Without loss of generality, we only consider any forward extension β  =  α  ⋄  z of α  =  ε  ⋄  y. By Lemma 1.d, \( l\_ aub\left(\beta \right)={\sum}_{\varPsi^{\prime}\in \rho \left(\beta \right)} taub\left(\alpha, {\varPsi}^{\prime}\right)\le {\sum}_{\varPsi^{\prime}\in \rho \left(\alpha \right)} taub\left(\varepsilon, {\varPsi}^{\prime}\right)=l\_ aub\left(\alpha \right) \). Similarly, for any backward extension β  =  ε  ⋄  δ  ⋄  y of α  =  ε  ⋄  y, since ε  ⋄  δ is a forward extension of ε, we also have \( l\_ aub\left(\beta \right)={\sum}_{\varPsi^{\prime}\in \rho \left(\beta \right)} taub\left(\varepsilon \diamond \delta, {\varPsi}^{\prime}\right)\le {\sum}_{\varPsi^{\prime}\in \rho \left(\alpha \right)} taub\left(\varepsilon, {\varPsi}^{\prime}\right)=l\_ aub\left(\alpha \right) \). Hence, \( \mathcal{AMB}i\left(l\_ aub\right) \).

  • b. + First, we prove that t_aub and l_aub are UBs on au, and that t_aub is tighter than l_aub. For any β  =  α  ⋄  δ  ⊃  α and Ψ'  ∈  ρ(β)  ⊆  ρ(α), by Lemma 1.c and d, we have \( au\left(\alpha \right)={\sum}_{\varPsi^{\prime}\in \rho \left(\alpha \right)} au\left(\alpha, {\varPsi}^{\prime}\right)\le {\sum}_{\varPsi^{\prime}\in \rho \left(\alpha \right)} taub\left(\alpha, {\varPsi}^{\prime}\right)=t\_ aub\left(\alpha \right) \), i.e., t_aub is a UB on au. Moreover, \( t\_ aub\left(\alpha \diamond y\right)\overset{\mathrm{def}}{=}{\sum}_{\varPsi^{\prime}\in \rho \left(\alpha \diamond y\right)} taub\left(\alpha \diamond y,{\varPsi}^{\prime}\right)\le {\sum}_{\varPsi^{\prime}\in \rho \left(\alpha \diamond y\right)} taub\left(\alpha, {\varPsi}^{\prime}\right)=l\_ aub\left(\alpha \diamond y\right) \), so t_aub is tighter than l_aub. However, as shown in Example 3, l_aub and AM_aub are incomparable.

+ Next, we prove that t_waub is a WUB on au and that it is tighter than t_aub. By Lemma 1.a, \( twaub\left(\alpha, {\varPsi}^{\prime}\right)\le topwaub\left(\alpha, {\varPsi}^{\prime}\right)=\frac{1}{l+1}{S}_{l+1}^{\alpha}\le \frac{1}{l}{S}_l^{\alpha }= taub\left(\alpha, {\varPsi}^{\prime}\right) \). Besides, since PE(α)  ⊆  ρ(α), then \( t\_ waub\left(\alpha \right)={\sum}_{\varPsi^{\prime}\in PE\left(\alpha \right)} twaub\left(\alpha, {\varPsi}^{\prime}\right)\le {\sum}_{\varPsi^{\prime}\in \rho \left(\alpha \right)} taub\left(\alpha, {\varPsi}^{\prime}\right)=t\_ aub\left(\alpha \right) \). By Theorem 1.a, t_waub is tighter than t_aub.

Moreover, for any proper (forward) extension β  =  α  ⋄  δ of α with δ  ≠  <> and any m such that \( l<m= length\left(\beta \right)\le n=\mid \mathcal{T}\left(\alpha, {\varPsi}^{\prime}\right)\mid \), consider Ψ'  ∈  ρ(β)  ⊆  PE(α)  ⊆  ρ(α).

(i). Since \( \mathcal{T}\left(\beta, {\varPsi}^{\prime}\right)\subseteq \mathcal{T}\left(\alpha, {\varPsi}^{\prime}\right) \) by Lemma 1.b-c, we have \( au\left(\beta, {\varPsi}^{\prime}\right)\le taub\left(\beta, {\varPsi}^{\prime}\right)=\frac{1}{m}{S}_m^{\beta}\le \frac{1}{m}{S}_m^{\alpha}\le \frac{1}{l+1}{S}_{l+1}^{\alpha }= topwaub\left(\alpha, {\varPsi}^{\prime}\right) \).

(ii). On the other hand, for any \( {\beta}^{\prime}\in \mathcal{O}\left(\beta, {\varPsi}^{\prime}\right) \), there exists \( {\alpha}^{\prime}\in \mathcal{O}\left(\alpha, {\varPsi}^{\prime}\right) \) such that \( {\beta}^{\prime }={\alpha}^{\prime}\diamond {\delta}^{\prime } \), length(rem(α',  Ψ'))  >  0, 1  ≤  k  =  length(δ')  ≤  length(rem(α',  Ψ')) and length(β')  =  m  =  l  +  k. Then \( au\left({\beta}^{\prime}\right)=\frac{1}{m}\left(u\left({\alpha}^{\prime}\right)+{\sum}_{\left({a}_i,{q}_i\right)\in {\delta}^{\prime }}{q}_i\right)\le \frac{l\cdot {au}^{\prime }+k\cdot Mu}{l+k}=g(k) \). The derivative of g on the interval [1,  length(rem(α',  Ψ'))] is \( {g}^{\prime }(k)=\frac{l\cdot \left( Mu-{au}^{\prime}\right)}{{\left(l+k\right)}^2} \). If Mu  >  au', the function g(k) increases, so g(k)  ≤  g(length(rem(α',  Ψ')))  =  remwaub'(α',  Ψ'). Otherwise, g(k) does not increase, so g(k)  ≤  g(1)  =  remwaub'(α',  Ψ'). Thus, we always have au(β')  ≤  remwaub'(α',  Ψ'). Hence, \( au\left(\beta, {\varPsi}^{\prime}\right)=\min \left\{ au\left({\beta}^{\prime}\right)\mid {\beta}^{\prime}\in \mathcal{O}\left(\beta, {\varPsi}^{\prime}\right)\right\}\le \max \left\{{remwaub}^{\prime}\left({\alpha}^{\prime },{\varPsi}^{\prime}\right)\mid {\alpha}^{\prime}\in \mathcal{O}\left(\alpha, {\varPsi}^{\prime}\right)\wedge length\left( rem\left({\alpha}^{\prime },{\varPsi}^{\prime}\right)\right)>0\right\}=\max \left\{{remwaub}^{\prime}\left({\alpha}^{\prime },{\varPsi}^{\prime}\right)\mid {\alpha}^{\prime}\in \mathcal{O}\left(\alpha, {\varPsi}^{\prime}\right)\wedge rem\left({\alpha}^{\prime },{\varPsi}^{\prime}\right)\ne <>\right\}= remwaub\left(\alpha, {\varPsi}^{\prime}\right) \).

From (i) and (ii), we have au(β,  Ψ')  ≤  min{topwaub(α,  Ψ'),  remwaub(α,  Ψ')}  =  twaub(α,  Ψ'). Since ρ(β)  ⊆  PE(α), we obtain \( au\left(\beta \right)={\sum}_{\varPsi^{\prime}\in \rho \left(\beta \right)} au\left(\beta, {\varPsi}^{\prime}\right)\le {\sum}_{\varPsi^{\prime}\in PE\left(\alpha \right)} twaub\left(\alpha, {\varPsi}^{\prime}\right)=t\_ waub\left(\alpha \right) \). Thus, t_waub is a WUB on au.
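The monotonicity argument used in step (ii) can be illustrated with exact rational arithmetic. The sketch below compares the endpoint chosen from the sign of Mu − au' with a brute-force maximum of g(k) = (l·au' + k·Mu)/(l + k) over k = 1, …, r, where r stands for length(rem(α', Ψ')). The function names and the numeric values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def g(k, l, au_prime, mu_max):
    """g(k) = (l * au' + k * Mu) / (l + k), as in step (ii)."""
    return Fraction(l * au_prime + k * mu_max, l + k)


def endpoint_bound(l, au_prime, mu_max, rem_length):
    """Largest value of g on 1..rem_length, chosen from the sign of Mu - au'.

    g'(k) has the sign of (Mu - au'), so g is increasing when Mu > au'
    (maximum at k = rem_length) and non-increasing otherwise (maximum at k = 1).
    """
    k_best = rem_length if mu_max > au_prime else 1
    return g(k_best, l, au_prime, mu_max)


def brute_force_max(l, au_prime, mu_max, rem_length):
    """Maximum of g over every admissible extension length k."""
    return max(g(k, l, au_prime, mu_max) for k in range(1, rem_length + 1))


# Hypothetical cases (l, au', Mu, r): Mu > au', Mu < au', and Mu == au'.
cases = [(2, 3, 7, 4), (2, 7, 3, 4), (2, 5, 5, 4)]
results = [(endpoint_bound(*c), brute_force_max(*c)) for c in cases]
```

In all three cases the endpoint value coincides with the brute-force maximum, which is why a single evaluation of g is enough to obtain remwaub'(α', Ψ').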

1.2 Appendix 2 (Proof of Theorem 2)

  • a. If t_aub(α)  <  mu, then by Theorem 1.a.(ii), ∀β  =  α  ⋄  δ  ⊇  α, ρ(β)  ⊆  ρ(α) and au(β)  ≤  t_aub(β)  ≤  t_aub(α)  <  mu, i.e., we can deeply prune branch(α).

  • b. If t_waub(α)  <  mu, then for any proper forward extension β  =  α  ⋄  δ  ⊃  α of α, we have au(β)  ≤  t_waub(α)  <  mu. Hence, propBranch(α) can be pruned.

  • c. If \( WidthP{C}_{l\_ aub}\left(\alpha \right) \) holds, for example l_aub(α)  <  mu, we can also prune branch(α) because \( \mathcal{AMB}i\left(l\_ aub\right) \) yields \( \mathcal{AMF}\left(l\_ aub\right) \). Besides, since \( \mathcal{AMB}i\left(l\_ aub\right) \), the remaining assertions are also true. Indeed, for example, for any \( y\in I\left(\alpha {\diamond}_ix\right) \), we have y  ≻  x and \( l\_ aub\left(\alpha {\diamond}_ix{\diamond}_iy\right)\ge mu \). Thus, \( l\_ aub\left(\alpha {\diamond}_iy\right)\ge l\_ aub\left(\alpha {\diamond}_ix{\diamond}_iy\right)\ge mu \), so y  ∈  I(α), i.e., \( I\left(\alpha {\diamond}_ix\right)\subseteq I\left(\alpha \right) \). The remaining assertions are proven similarly.
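The three pruning conditions above can be summarized as a small decision routine. This is a minimal sketch, not the C-FHAUSPM implementation: it assumes the bound values t_aub(α), t_waub(α) and l_aub(α) have already been computed, and the function name and return labels are hypothetical.

```python
def pruning_decision(t_aub, t_waub, l_aub, mu):
    """Return which part of the search tree rooted at alpha may be skipped.

    "branch"     : alpha and all of its extensions (conditions a and c)
    "propBranch" : only the proper forward extensions of alpha (condition b)
    None         : these bounds do not allow any pruning
    """
    # Conditions a and c: an upper bound below the threshold mu rules out
    # alpha together with every extension of alpha.
    if t_aub < mu or l_aub < mu:
        return "branch"
    # Condition b: the weak upper bound only covers proper forward
    # extensions, so alpha itself must still be evaluated.
    if t_waub < mu:
        return "propBranch"
    return None


# Hypothetical bound values with threshold mu = 5.
decisions = [
    pruning_decision(t_aub=4, t_waub=3, l_aub=9, mu=5),
    pruning_decision(t_aub=6, t_waub=4, l_aub=9, mu=5),
    pruning_decision(t_aub=6, t_waub=6, l_aub=9, mu=5),
]
```

Checking the stronger condition first reflects the statements above: when the whole branch can be discarded there is no need to test the weaker condition on proper extensions.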

About this article

Cite this article

Truong, T., Duong, H., Le, B. et al. Frequent high minimum average utility sequence mining with constraints in dynamic databases using efficient pruning strategies. Appl Intell 52, 6106–6128 (2022). https://doi.org/10.1007/s10489-021-02520-1

  • DOI: https://doi.org/10.1007/s10489-021-02520-1
