Fast generation of sequential patterns with item constraints from concise representations

  • Regular Paper

Knowledge and Information Systems

Abstract

Constraint-based frequent sequence mining is an important task in data mining because it yields results that closely match users' requirements and interests. Most existing algorithms for this task follow a traditional approach that mines patterns directly from a sequence database (SDB). In practice, however, SDBs are often very large, so these algorithms often perform poorly: the number of generated candidates and the search space are enormous, especially for low minimum support thresholds. In addition, these algorithms must rescan the SDB whenever the user changes a constraint, and when constraints change frequently, these repeated scans are very time-consuming. To address this issue, we propose a novel approach for generating frequent sequences with various constraints from two sets, the frequent closed sequences (\( \mathcal{FCS} \)) and the frequent generator sequences (\( \mathcal{FGS} \)), which are concise representations of the set \( \mathcal{FS} \) of all frequent sequences. The proposed approach is based on novel, rigorously proved theoretical results that establish an explicit relationship between \( \mathcal{FS} \) and these two sets. The approach is then used to develop an efficient algorithm, named MFS-IC, for quickly generating frequent sequences with item constraints, a task with many real-life applications. Extensive experiments on real-life and synthetic databases show that MFS-IC outperforms state-of-the-art algorithms that mine constrained frequent sequences directly from an SDB in terms of runtime, memory usage and scalability.
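Everything below relies on two basic notions: the subsequence relation \( \sqsubseteq \) and the support of a sequence in an SDB. A minimal Python sketch follows, assuming sequences are tuples of frozensets of string items with no empty itemsets. The helper names `seq`, `is_subsequence`, `rho` and `supp` and the toy SDB are illustrative only. Here `rho` is assumed to return the set of ids of the data sequences that contain a pattern, so that the support is its size.

```python
from typing import FrozenSet, Sequence, Tuple

Itemset = FrozenSet[str]
Seq = Tuple[Itemset, ...]


def seq(*itemsets: str) -> Seq:
    """Build a sequence from strings: seq("ab", "c") is <{a, b} -> {c}>."""
    return tuple(frozenset(s) for s in itemsets)


def is_subsequence(gamma: Seq, alpha: Seq) -> bool:
    """gamma ⊑ alpha: the itemsets of gamma are contained, in order, in
    distinct itemsets of alpha. Greedy leftmost matching is sufficient."""
    j = 0
    for e in gamma:
        while j < len(alpha) and not e <= alpha[j]:
            j += 1
        if j == len(alpha):
            return False
        j += 1
    return True


def rho(alpha: Seq, sdb: Sequence[Seq]) -> FrozenSet[int]:
    """Ids (list indices) of the data sequences that contain alpha."""
    return frozenset(sid for sid, s in enumerate(sdb) if is_subsequence(alpha, s))


def supp(alpha: Seq, sdb: Sequence[Seq]) -> int:
    """Support = number of data sequences containing alpha."""
    return len(rho(alpha, sdb))


SDB = [seq("ab", "c", "a"), seq("a", "bc"), seq("b", "c")]
print(supp(seq("a", "c"), SDB))  # 2
```

A direct miner would evaluate `supp` against the SDB for each candidate and repeat that work whenever the constraint changes. That repeated work is the cost the proposed approach aims to avoid.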

Notes

  1. iff means “if and only if”.

Acknowledgements

This work is funded by Vietnam’s National Foundation for Science and Technology Development (NAFOSTED) under Grant Number 102.05-2017.300.

Author information

Correspondence to Bac Le.

Appendix

1.1 Appendix 1: Proof of Proposition 1

  (i) “\( \mathcal{FS}(\sigma, \gamma) \subseteq \mathcal{FS}'(\sigma, \gamma) \)”: Consider any \( \alpha \in \mathcal{FS}(\sigma, \gamma) \), so that \( \gamma \sqsubseteq \alpha \sqsubseteq \sigma \). Without loss of generality, we can assume that \( \gamma = E_{1} \to E_{2} \to \cdots \to E_{p} \), \( \alpha = F_{1} \to F_{2} \to \cdots \to F_{q} \) and \( \sigma = E'_{1} \to E'_{2} \to \cdots \to E'_{q} \), with \( F_{i} \subseteq E'_{i},\ \forall i = 1, \ldots, q \). We can also assume that there exists \( lp = \{ j_{1}, j_{2}, \ldots, j_{p} \} \in LP(\alpha, \gamma) \subseteq LP(\sigma, \gamma) \), with \( 1 \le j_{1} < j_{2} < \cdots < j_{p} \le q \), such that \( E_{k} \subseteq F_{j_{k}} \subseteq E'_{j_{k}},\ \forall k = 1, \ldots, p \). For all \( i = 1, \ldots, q \), set
    \( d_{i} \stackrel{\text{def}}{=} \begin{cases} F_{i}, & \text{if } i \notin lp \\ F_{i} \setminus E_{k}, & \text{if } i = j_{k} \in lp \end{cases} \), \( D_{i} \stackrel{\text{def}}{=} \begin{cases} E'_{i}, & \text{if } i \notin lp \\ E'_{i} \setminus E_{k}, & \text{if } i = j_{k} \in lp \end{cases} \), \( E_{i}^{\sim} \stackrel{\text{def}}{=} \begin{cases} \emptyset, & \text{if } i \notin lp \\ E_{k}, & \text{if } i = j_{k} \in lp \end{cases} \),
    and define \( \delta(lp) \stackrel{\text{def}}{=} d_{1} \to d_{2} \to \cdots \to d_{q} \) and \( Ex(\gamma, lp) \stackrel{\text{def}}{=} E_{1}^{\sim} \to E_{2}^{\sim} \to \cdots \to E_{q}^{\sim} \). Then \( d_{i} \subseteq D_{i} \) and \( F_{i} = E_{i}^{\sim} + d_{i},\ \forall i = 1, \ldots, q \), so \( \alpha = Ex(\gamma, lp) \oplus \delta(lp) \in \mathcal{FS}'(\sigma, \gamma) \).

  (ii) “\( \mathcal{FS}(\sigma, \gamma) \supseteq \mathcal{FS}'(\sigma, \gamma) \)”: Let \( \alpha \in \mathcal{FS}'(\sigma, \gamma) \), i.e., \( \alpha = Ex(\gamma, lp) \oplus \delta(lp) = F_{1} \to F_{2} \to \cdots \to F_{q} \) for some \( lp \in LP(\sigma, \gamma) \). Then \( F_{i} = E_{i}^{\sim} + d_{i} \) with \( d_{i} \subseteq D_{i} \), hence \( E_{i}^{\sim} \subseteq F_{i} \subseteq E_{i}^{\sim} + D_{i} \subseteq E'_{i},\ \forall i = 1, \ldots, q \). Thus \( \gamma \sqsubseteq \alpha \sqsubseteq \sigma \), i.e., \( \alpha \in \mathcal{FS}(\sigma, \gamma) \). □
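Proposition 1 can be checked mechanically on small inputs. The sketch below enumerates the position lists \( LP(\sigma, \gamma) \) and builds every \( Ex(\gamma, lp) \oplus \delta(lp) \) with \( d_{i} \subseteq D_{i} \). It then compares the result with a brute-force enumeration of all \( \alpha \) such that \( \gamma \sqsubseteq \alpha \sqsubseteq \sigma \).

The sketch makes the following assumptions, none of them taken from the paper:

  • Sequences are tuples of frozensets.
  • "+" and "\( \oplus \)" are itemset-wise union.
  • Positions are 0-based.
  • Padded sequences are compared after deleting empty itemsets.
  • All helper names are hypothetical.

Frequency is not checked, because every \( \alpha \sqsubseteq \sigma \) has support at least \( supp(\sigma) \).

```python
from itertools import combinations, product
from typing import FrozenSet, Iterator, List, Set, Tuple

Itemset = FrozenSet[str]
Seq = Tuple[Itemset, ...]


def seq(*itemsets: str) -> Seq:
    return tuple(frozenset(s) for s in itemsets)


def subsets(s: Itemset) -> List[Itemset]:
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def normalize(alpha: Seq) -> Seq:
    """Delete empty itemsets so that padded sequences can be compared."""
    return tuple(f for f in alpha if f)


def is_subsequence(gamma: Seq, alpha: Seq) -> bool:
    j = 0
    for e in gamma:
        while j < len(alpha) and not e <= alpha[j]:
            j += 1
        if j == len(alpha):
            return False
        j += 1
    return True


def position_lists(sigma: Seq, gamma: Seq) -> Iterator[Tuple[int, ...]]:
    """LP(sigma, gamma): all j_1 < ... < j_p with E_k ⊆ E'_{j_k} (0-based)."""
    def extend(k: int, start: int, acc: Tuple[int, ...]) -> Iterator[Tuple[int, ...]]:
        if k == len(gamma):
            yield acc
            return
        for j in range(start, len(sigma)):
            if gamma[k] <= sigma[j]:
                yield from extend(k + 1, j + 1, acc + (j,))
    return extend(0, 0, ())


def fs_prime(sigma: Seq, gamma: Seq) -> Set[Seq]:
    """FS'(sigma, gamma): union over lp of Ex(gamma, lp) ⊕ delta(lp)."""
    result: Set[Seq] = set()
    for lp in position_lists(sigma, gamma):
        pos = {j: k for k, j in enumerate(lp)}                  # i = j_k -> k
        ex = [gamma[pos[i]] if i in pos else frozenset()        # E~_i
              for i in range(len(sigma))]
        big_d = [sigma[i] - ex[i] for i in range(len(sigma))]   # D_i
        for delta in product(*(subsets(d) for d in big_d)):     # every d_i ⊆ D_i
            result.add(normalize(tuple(e | d for e, d in zip(ex, delta))))
    return result


def fs_between(sigma: Seq, gamma: Seq) -> Set[Seq]:
    """Brute force: every alpha with gamma ⊑ alpha ⊑ sigma."""
    cands = {normalize(f) for f in product(*(subsets(e) for e in sigma))}
    return {a for a in cands if is_subsequence(gamma, a)}


sigma, gamma = seq("ab", "a"), seq("a")
print(list(position_lists(sigma, gamma)))          # [(0,), (1,)]
print(fs_prime(sigma, gamma) == fs_between(sigma, gamma))  # True
```

Because `fs_prime` collects its results in a set, it hides one fact: the same sequence can arise from several position lists. In this example, \( \langle \{a\} \rangle \) arises from both. That redundancy is the subject of Theorem 2.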

1.2 Appendix 2: Proof of Theorem 1

  (a) We first observe that, for every \( \alpha \in \mathcal{FS} \), there exists \( \sigma \in CloSet(\alpha) \) such that \( ro \stackrel{\text{def}}{=} \rho(\alpha) = \rho(\sigma) \). Hence \( \sigma \in \mathcal{FCS} \), \( ro \in RO \) and \( FS(ro) = [\sigma] \). Since \( \sim \) is an equivalence relation, distinct equivalence classes \( [\sigma] \), i.e., distinct sets \( FS(ro) \), are disjoint. Therefore \( \mathcal{FS} = \sum_{ro \in RO} FS(ro) \), where \( \sum \) denotes a union of pairwise disjoint sets.

  (b) We first show, by contradiction, that the subsets \( \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \), for all \( \sigma_{i} \in CS(ro) \) and all \( \gamma_{j} \in GenSet(\sigma_{i}) \), are pairwise disjoint. Assume, to the contrary, that there exist the following:
    • \( \sigma_{m}, \sigma_{i} \in CS(ro) \), \( \gamma_{k} \in GenSet(\sigma_{m}) \) and \( \gamma_{j} \in GenSet(\sigma_{i}) \), with \( \sigma_{m} \ne \sigma_{i} \) or \( \gamma_{k} \ne \gamma_{j} \);
    • some \( \alpha \in \mathcal{FS}^{*}(\sigma_{m}, \gamma_{k}) \cap \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \).
    There are two cases: \( \sigma_{m} \ne \sigma_{i} \), or \( \sigma_{m} \equiv \sigma_{i} \) and \( \gamma_{k} \ne \gamma_{j} \). The first case contradicts the condition \( not(DCondC(\alpha, \sigma_{i}, CS(\rho(\sigma_{i})))) \) in \( \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \). The second contradicts the condition \( not(DCondG(\alpha, \sigma_{i}, \gamma_{j})) \) in \( \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \). Second, we prove that \( FS(ro) = \sum_{\sigma_{i} \in CS(ro)} \sum_{\gamma_{j} \in GenSet(\sigma_{i})} \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}),\ \forall ro \in RO \).

    (i) “\( \subseteq \)”: Let \( \alpha \in FS(ro) \); then \( \alpha \in \mathcal{FS} \) and \( \rho(\alpha) = ro \). Since \( CloSet(\alpha) \ne \emptyset \) and \( GenSet(\alpha) \ne \emptyset \), we have \( [\exists \sigma_{i} \in CloSet(\alpha), \exists \gamma_{j} \in GenSet(\alpha): \sigma_{i} \in \mathcal{CS},\ \gamma_{j} \in \mathcal{GS},\ \gamma_{j} \sqsubseteq \alpha \sqsubseteq \sigma_{i},\ \rho(\sigma_{i}) = \rho(\gamma_{j}) = \rho(\alpha)]^{(*)} \). It follows that:
      • \( \gamma_{j} \in GenSet(\sigma_{i}) \);
      • \( supp(\sigma_{i}) = supp(\gamma_{j}) = supp(\alpha) \ge ms \);
      • \( \sigma_{i} \in \mathcal{FCS} \) and \( \sigma_{i} \in CS(ro) \).
    Hence \( \alpha \in \mathcal{FS}(\sigma_{i}, \gamma_{j}) \). Without loss of generality, we select \( \sigma_{i} \) as the first closed sequence and \( \gamma_{j} \) as the first generator sequence satisfying condition (*). With this choice, \( \alpha \) is not flagged as a duplicate, so \( \alpha \in \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \). Thus \( FS(ro) \subseteq \sum_{\sigma_{i} \in CS(ro)} \sum_{\gamma_{j} \in GenSet(\sigma_{i})} \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \).

    (ii) “\( \supseteq \)”: Let \( \sigma_{i} \in CS(ro) \), \( \gamma_{j} \in GenSet(\sigma_{i}) \) and \( \alpha \in \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \). Since \( \gamma_{j} \sqsubseteq \alpha \sqsubseteq \sigma_{i} \), we have \( ro = \rho(\sigma_{i}) \subseteq \rho(\alpha) \subseteq \rho(\gamma_{j}) = ro \). Hence \( \rho(\alpha) = ro \) and \( \alpha \in FS(ro) \). Thus \( \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \subseteq FS(ro) \) and \( FS(ro) \supseteq \sum_{\sigma_{i} \in CS(ro)} \sum_{\gamma_{j} \in GenSet(\sigma_{i})} \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \). □
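The partition in Theorem 1 can be illustrated on a toy SDB. The brute-force sketch below works in four steps:

  1. Compute \( \mathcal{FS} \) together with \( \rho \) for each sequence, and derive the closed and generator sequences.
  2. Group \( \mathcal{FS} \) into the classes \( FS(ro) \).
  3. Assign each \( \alpha \) to the first pair \( (\sigma_{i}, \gamma_{j}) \) with \( \sigma_{i} \in CS(ro) \), \( \gamma_{j} \in GenSet(\sigma_{i}) \) and \( \gamma_{j} \sqsubseteq \alpha \sqsubseteq \sigma_{i} \), mirroring step (i).
  4. Raise `StopIteration` if such a pair were missing.

The following choices are hypothetical and stand in for the paper's own conventions:

  • "first" is taken to mean first in an order by length and then lexicographically (function `order_key`).
  • The helper names are illustrative.
  • Exhaustive enumeration is practical only for tiny inputs.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Itemset = FrozenSet[str]
Seq = Tuple[Itemset, ...]
Rho = FrozenSet[int]


def seq(*itemsets: str) -> Seq:
    return tuple(frozenset(s) for s in itemsets)


def subsets(s: Itemset) -> List[Itemset]:
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_subsequence(gamma: Seq, alpha: Seq) -> bool:
    j = 0
    for e in gamma:
        while j < len(alpha) and not e <= alpha[j]:
            j += 1
        if j == len(alpha):
            return False
        j += 1
    return True


def proper(a: Seq, b: Seq) -> bool:
    return a != b and is_subsequence(a, b)


def mine_brute_force(sdb: Sequence[Seq], ms: int) -> Dict[Seq, Rho]:
    """Every frequent sequence mapped to rho (ids of supporting sequences)."""
    cands: Set[Seq] = set()
    for s in sdb:
        for choice in product(*(subsets(e) for e in s)):
            alpha = tuple(f for f in choice if f)
            if alpha:
                cands.add(alpha)
    rho = {a: frozenset(i for i, s in enumerate(sdb) if is_subsequence(a, s)) for a in cands}
    return {a: r for a, r in rho.items() if len(r) >= ms}


def order_key(a: Seq):
    return (len(a), [sorted(f) for f in a])


def partition(sdb: Sequence[Seq], ms: int):
    fs = mine_brute_force(sdb, ms)
    closed = sorted((a for a in fs if not any(proper(a, b) and fs[b] == fs[a] for b in fs)), key=order_key)
    gens = sorted((a for a in fs if not any(proper(b, a) and fs[b] == fs[a] for b in fs)), key=order_key)
    classes: Dict[Rho, List[Seq]] = {}
    for a, r in fs.items():
        classes.setdefault(r, []).append(a)          # FS(ro): one class per rho
    owner: Dict[Seq, Tuple[Seq, Seq]] = {}
    for ro, members in classes.items():
        cs_ro = [s for s in closed if fs[s] == ro]   # CS(ro)
        for a in members:
            # First sigma_i in CS(ro), then first gamma_j in GenSet(sigma_i),
            # with gamma_j ⊑ a ⊑ sigma_i; next() raises StopIteration if none.
            owner[a] = next((s, g) for s in cs_ro for g in gens
                            if fs[g] == ro and is_subsequence(g, s)
                            and is_subsequence(g, a) and is_subsequence(a, s))
    return fs, classes, owner


SDB = [seq("ab", "c"), seq("a", "c"), seq("b", "c")]
fs, classes, owner = partition(SDB, 2)
```

With `ms = 2` the toy SDB has five frequent sequences in three classes. For example, \( \langle \{a\} \rangle \) and \( \langle \{a\} \to \{c\} \rangle \) share \( \rho = \{0, 1\} \), and both are owned by the pair \( (\langle \{a\} \to \{c\} \rangle, \langle \{a\} \rangle) \).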

1.3 Appendix 3: Proof of Theorem 2

  (a) Clearly, \( \mathcal{FS}^{**}(\sigma, \gamma) \subseteq \mathcal{FS}'(\sigma, \gamma) \). Conversely, as shown in the proofs of the three pruning cases above, for every \( \alpha \in \mathcal{FS}'(\sigma, \gamma) \), if \( DCond_{i} \) holds for some \( i \in \{1, 2, 3\} \), then \( \alpha \) duplicates a previously generated sequence. Thus \( \mathcal{FS}'(\sigma, \gamma) \subseteq \mathcal{FS}^{**}(\sigma, \gamma) \), and therefore \( \mathcal{FS}'(\sigma, \gamma) = \mathcal{FS}^{**}(\sigma, \gamma) \).

  (b) We prove the first assertion by contradiction. Assume, to the contrary, that two sequences \( \alpha = Ex(\gamma, lp_{m}) \oplus \delta(lp_{m}) \) and \( \beta = Ex(\gamma, lp_{n}) \oplus \delta(lp_{n}) \) in \( \mathcal{FS}^{**}(\sigma, \gamma) \) are generated from two different position lists \( lp_{m} \ne lp_{n} \) in \( LP(\sigma, \gamma) \), but \( \alpha \equiv \beta \). Concretely, we have the following:
    • position lists \( lp_{m} = \{ j_{1}, j_{2}, \ldots, j_{p} \} \) and \( lp_{n} = \{ i_{1}, i_{2}, \ldots, i_{p} \} \in LP(\sigma, \gamma) \), with \( m < n \), \( 1 \le j_{1} < j_{2} < \cdots < j_{p} \le q \) and \( 1 \le i_{1} < i_{2} < \cdots < i_{p} \le q \);
    • the sets \( d(lp_{m})_{i} \stackrel{\text{def}}{=} \begin{cases} E'_{i}, & \text{if } i \notin lp_{m} \\ E'_{i} \setminus E_{k}, & \text{if } i = j_{k} \in lp_{m} \end{cases} \) and \( d_{i} \stackrel{\text{def}}{=} d(lp_{n})_{i} \stackrel{\text{def}}{=} \begin{cases} E'_{i}, & \text{if } i \notin lp_{n} \\ E'_{i} \setminus E_{k}, & \text{if } i = i_{k} \in lp_{n} \end{cases} \), with \( F_{i} \subseteq d(lp_{m})_{i} \) and \( F'_{i} \subseteq d_{i},\ \forall i = 1, \ldots, q \);
    • two corresponding sequences \( \alpha = F_{1} \to F_{2} \to \cdots \to F_{q} \) and \( \beta = F'_{1} \to F'_{2} \to \cdots \to F'_{q} \) in \( \mathcal{FS}^{**}(\sigma, \gamma) \) such that \( \alpha \equiv \beta \), and \( DCond_{k}(lp_{m}) \) and \( DCond_{k}(lp_{n}) \) are false for all \( k = 1, 2, 3 \).

After deleting all empty itemsets in \( \alpha \) and \( \beta \), we obtain \( \alpha = F_{u_{1}} \to F_{u_{2}} \to \cdots \to F_{u_{N}} \) and \( \beta = F'_{v_{1}} \to F'_{v_{2}} \to \cdots \to F'_{v_{N}} \). Here \( N = size(\alpha) = size(\beta) \), \( 1 \le u_{1} < u_{2} < \cdots < u_{N} \le q \), \( 1 \le v_{1} < v_{2} < \cdots < v_{N} \le q \), and \( F_{u_{i}} = F'_{v_{i}} \ne \emptyset,\ \forall i = 1, \ldots, N \) (because \( \alpha \equiv \beta \)). We set \( i_{0} = j_{0} = u_{0} = v_{0} = k_{0} \equiv 0 \) and \( k_{p+1} = N \). We then choose \( 1 \le k_{1} < k_{2} < \cdots < k_{p} \le N \) such that \( j_{r} = u_{k_{r}} \) (so \( F_{u_{k_{r}}} = F_{j_{r}} \supseteq E_{r} \subseteq F'_{j_{r}} \)), \( \forall r = 1, \ldots, p \).

For all \( r = 1, \ldots, p + 1 \) and all \( k = k_{r-1} + 1, \ldots, k_{r} \), we have \( u_{k_{r-1}} = j_{r-1} < u_{k} \le j_{r} \le i_{r} \). By induction, one can prove the following implication. Suppose that \( v_{h} = u_{h}\ (\forall h = 0, \ldots, k - 1) \), that \( v_{k-1} < i_{r} \), and that \( j_{r'} = i_{r'} = u_{k_{r'}} = v_{k_{r'}}\ (\forall r' = 0, \ldots, r - 1) \). Then \( v_{k} = u_{k} \). Moreover, if \( r \le p \), then \( v_{k} < i_{r} \) whenever \( k < k_{r} \), and \( j_{r} = i_{r} = u_{k_{r}} = v_{k_{r}} \).

Finally, under the hypothesis \( \alpha \equiv \beta \), for all \( r = 1, \ldots, p + 1 \) and all \( k = k_{r-1} + 1, \ldots, k_{r} \), we always have \( v_{k} = u_{k} \), and also \( j_{r} = i_{r} = u_{k_{r}} = v_{k_{r}} \) if \( r \le p \). Therefore \( (\forall k' = 1, \ldots, N,\ v_{k'} = u_{k'})^{(*)} \) and \( (\forall r = 1, \ldots, p,\ j_{r} = i_{r})^{(**)} \).

  • If \( lp_{m} = lp_{n} \), then \( j_{r} = i_{r},\ \forall r = 1, \ldots, p \). Since \( \alpha \) and \( \beta \) are two different padded sequences generated in \( \mathcal{FS}^{**}(\sigma, \gamma) \) that coincide after deleting empty itemsets, there exists \( t \in \{1, \ldots, N\} \) such that \( u_{t} \ne v_{t} \). This contradicts (*).

  • If \( lp_{m} \ne lp_{n} \), then there exists \( m_{0} \in \{1, \ldots, p\} \) such that \( i_{m_{0}} \ne j_{m_{0}} \), which contradicts (**). In other words, the hypothesis \( \alpha \equiv \beta \) always leads to a contradiction.

Thus \( \alpha \not\equiv \beta \); that is, every sequence in \( \mathcal{FS}^{**}(\sigma, \gamma) \) is generated without duplicates.

From Proposition 1 and part (a), we have \( \mathcal{FS}(\sigma, \gamma) = \mathcal{FS}'(\sigma, \gamma) = \mathcal{FS}^{**}(\sigma, \gamma) \); the remaining assertions follow from Theorem 1. □
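The need for pruning in Theorem 2 can be seen by counting. The sketch below enumerates every padded sequence \( Ex(\gamma, lp) \oplus \delta(lp) \) for each position list. It deletes empty itemsets as in the proof, recording the positions \( u_{1} < \cdots < u_{N} \), and counts how often each resulting sequence is produced.

The conditions \( DCond_{1} \)–\( DCond_{3} \) are not reproduced in this section. The sketch therefore uses a simple set-based stand-in: a sequence produced from \( lp_{n} \) is kept only if no earlier \( lp_{m} \) (\( m < n \)) produced it. This rule, the lexicographic order on \( LP(\sigma, \gamma) \), and all helper names are assumptions made for illustration. None of them is the paper's pruning test.

```python
from collections import Counter
from itertools import combinations, product
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

Itemset = FrozenSet[str]
Seq = Tuple[Itemset, ...]


def seq(*itemsets: str) -> Seq:
    return tuple(frozenset(s) for s in itemsets)


def subsets(s: Itemset) -> List[Itemset]:
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def compress(padded: Seq) -> Tuple[Tuple[int, ...], Seq]:
    """Delete empty itemsets; return the 1-based positions u_1 < ... < u_N
    of the nonempty itemsets and the compressed sequence."""
    u = tuple(i for i, f in enumerate(padded, start=1) if f)
    return u, tuple(padded[i - 1] for i in u)


def position_lists(sigma: Seq, gamma: Seq) -> Iterator[Tuple[int, ...]]:
    """LP(sigma, gamma) in lexicographic order (0-based positions)."""
    def extend(k: int, start: int, acc: Tuple[int, ...]) -> Iterator[Tuple[int, ...]]:
        if k == len(gamma):
            yield acc
            return
        for j in range(start, len(sigma)):
            if gamma[k] <= sigma[j]:
                yield from extend(k + 1, j + 1, acc + (j,))
    return extend(0, 0, ())


def generate(sigma: Seq, gamma: Seq):
    """Return (multiset of naively generated sequences, sequences kept per lp)."""
    naive: Counter = Counter()
    kept: Dict[Tuple[int, ...], Set[Seq]] = {}
    seen: Set[Seq] = set()
    for lp in position_lists(sigma, gamma):
        pos = {j: k for k, j in enumerate(lp)}
        ex = [gamma[pos[i]] if i in pos else frozenset() for i in range(len(sigma))]
        big_d = [sigma[i] - ex[i] for i in range(len(sigma))]
        produced: Set[Seq] = set()
        for delta in product(*(subsets(d) for d in big_d)):
            _, alpha = compress(tuple(e | d for e, d in zip(ex, delta)))
            naive[alpha] += 1
            produced.add(alpha)
        kept[lp] = produced - seen   # stand-in for "DCond_1..3 are all false"
        seen |= produced
    return naive, kept


naive, kept = generate(seq("ab", "a"), seq("a"))
print(sum(naive.values()), len(naive))           # 8 5
print({lp: len(s) for lp, s in kept.items()})    # {(0,): 4, (1,): 1}
```

In this example the naive enumeration produces 8 padded sequences but only 5 distinct ones. Under the stand-in rule, each distinct sequence is kept exactly once, which is the property Theorem 2 establishes for \( \mathcal{FS}^{**}(\sigma, \gamma) \).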

1.4 Appendix 4: Proof of Theorem 3

  (a) Based on the equivalence relation \( \sim \), it is easy to see that \( \mathcal{FS}^{E} = \sum_{ro \in RO^{E}} \mathcal{FS}^{E}(ro) \).

  (b)
    • “\( \subseteq \)”: Let \( \alpha \in \mathcal{FS}^{E} \). Then \( \mathcal{C}_{I}(\alpha) \) holds and, by Theorem 2, there exist \( ro \in RO \), \( \sigma_{i} \in CS(ro) \subseteq \mathcal{CS} \) and \( \gamma_{j} \in GenSet(\sigma_{i}) \) such that \( \alpha \in \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \). Hence \( \gamma_{j} \sqsubseteq \alpha \sqsubseteq \sigma_{i} \), so \( \mathcal{C}_{I}(\gamma_{j}) \) holds and \( \gamma_{j} \in GenSet^{E}(\sigma_{i}) \). Moreover, \( ro = \rho(\sigma_{i}) \in RO^{E} \), \( \sigma_{i} \in CS^{E}(ro) \) and \( \alpha \in \mathcal{FS}^{*E}(\sigma_{i}, \gamma_{j}) \). All the subsets \( \mathcal{FS}^{*E}(\sigma_{i}, \gamma_{j}) \) are pairwise disjoint because \( GenSet^{E}(\sigma_{i}) \subseteq GenSet(\sigma_{i}) \) and \( \mathcal{FS}^{*E}(\sigma_{i}, \gamma_{j}) \subseteq \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \).

    • “\( \supseteq \)”: This inclusion is clear because \( GenSet^{E}(\sigma_{i}) \subseteq GenSet(\sigma_{i}) \) and \( \mathcal{FS}^{*E}(\sigma_{i}, \gamma_{j}) \subseteq \mathcal{FS}^{E}(\sigma_{i}, \gamma_{j}) \). □
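The item constraint \( \mathcal{C}_{I} \) is not defined in this section. The sketch below assumes a hypothetical anti-monotone version: \( \mathcal{C}_{I}(\alpha) \) holds if every item of \( \alpha \) belongs to a given allowed set. Under this assumption, the step \( \gamma_{j} \sqsubseteq \alpha \Rightarrow \mathcal{C}_{I}(\gamma_{j}) \) is valid; the paper's own constraints may differ.

The sketch proceeds in three steps:

  1. For each closed \( \sigma \), keep only the generators of \( \sigma \) that satisfy the constraint. This plays the role of \( GenSet^{E}(\sigma) \).
  2. Expand each kept pair into the constrained sequences between \( \gamma \) and \( \sigma \).
  3. Check the result against a direct filter of \( \mathcal{FS} \).

All names are illustrative, and the enumeration is brute force.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Itemset = FrozenSet[str]
Seq = Tuple[Itemset, ...]


def seq(*itemsets: str) -> Seq:
    return tuple(frozenset(s) for s in itemsets)


def subsets(s: Itemset) -> List[Itemset]:
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_subsequence(gamma: Seq, alpha: Seq) -> bool:
    j = 0
    for e in gamma:
        while j < len(alpha) and not e <= alpha[j]:
            j += 1
        if j == len(alpha):
            return False
        j += 1
    return True


def subsequences(s: Seq) -> Set[Seq]:
    """All nonempty subsequences of s, empty itemsets deleted."""
    out = {tuple(f for f in c if f) for c in product(*(subsets(e) for e in s))}
    out.discard(())
    return out


def satisfies(alpha: Seq, allowed: FrozenSet[str]) -> bool:
    """Hypothetical item constraint C_I: every item of alpha is allowed."""
    return all(f <= allowed for f in alpha)


def constrained_fs(sdb: Sequence[Seq], ms: int, allowed: FrozenSet[str]):
    """Return (FS^E built from closed/generator pairs, FS^E filtered directly)."""
    cands = set().union(*(subsequences(s) for s in sdb))
    rho = {a: frozenset(i for i, s in enumerate(sdb) if is_subsequence(a, s)) for a in cands}
    fs: Dict[Seq, FrozenSet[int]] = {a: r for a, r in rho.items() if len(r) >= ms}

    def proper(a: Seq, b: Seq) -> bool:
        return a != b and is_subsequence(a, b)

    closed = [a for a in fs if not any(proper(a, b) and fs[b] == fs[a] for b in fs)]
    gens = [a for a in fs if not any(proper(b, a) and fs[b] == fs[a] for b in fs)]
    via_pairs: Set[Seq] = set()
    for sigma in closed:
        # GenSet^E(sigma): generators of sigma that themselves satisfy C_I
        gen_e = [g for g in gens if fs[g] == fs[sigma]
                 and is_subsequence(g, sigma) and satisfies(g, allowed)]
        for gamma in gen_e:
            via_pairs |= {a for a in subsequences(sigma)
                          if is_subsequence(gamma, a) and satisfies(a, allowed)}
    direct = {a for a in fs if satisfies(a, allowed)}
    return via_pairs, direct


SDB = [seq("ab", "c"), seq("a", "c"), seq("b", "c")]
via_pairs, direct = constrained_fs(SDB, 2, frozenset("ac"))
print(via_pairs == direct)  # True
```

In this example, the closed sequence \( \langle \{b\} \to \{c\} \rangle \) contributes nothing, because its only generator \( \langle \{b\} \rangle \) violates the constraint. This shows how filtering generators can skip whole classes without scanning the SDB again.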

About this article

Cite this article

Duong, H., Truong, T., Tran, A. et al. Fast generation of sequential patterns with item constraints from concise representations. Knowl Inf Syst 62, 2191–2223 (2020). https://doi.org/10.1007/s10115-019-01418-2

  • DOI: https://doi.org/10.1007/s10115-019-01418-2
