FCloSM, FGenSM: two efficient algorithms for mining frequent closed and generator sequences using the local pruning strategy

  • Regular Paper
Knowledge and Information Systems

Abstract

Mining frequent sequences in sequential databases is highly valuable for many real-life applications. However, in several cases, especially when databases are huge and when low minimum support thresholds are used, the cardinality of the result set can be enormous. Consequently, algorithms for discovering frequent sequences perform poorly, with a marked increase in execution time, memory consumption and storage space usage. To address this issue, researchers have studied the tasks of mining frequent closed and generator sequences, as they provide several benefits when compared to the set of frequent sequences. One of the most important benefits is that the cardinalities of frequent closed and generator sequences are generally much smaller than the cardinality of frequent sequences. Hence, humans find it more convenient to analyze the information provided by closed and generator sequences. Moreover, it was shown that frequent closed sequences have the advantage of being lossless, and they thus preserve information about the frequency of all frequent subsequences, while generator sequences can provide higher accuracy for sequence classification tasks since they are the smallest patterns that characterize groups of sequences. Besides, frequent closed sequences can be combined with generators to produce non-redundant sequential rules and to recover the complete set of frequent sequences and their frequencies. This paper proposes two novel algorithms named FCloSM and FGenSM to mine frequent closed and generator sequences efficiently. These algorithms are based on new pruning conditions called extended early elimination (3E) and early pruning techniques named EPCLO and EPGEN, designed to identify non-closed and non-generator patterns early. Based on these techniques, two local pruning strategies called LPCLO and LPGEN are proposed to eliminate non-closed and non-generator patterns more efficiently at two successive levels of the prefix search tree without performing subsequence relation checking. These theoretical results, which are the basis of FCloSM and FGenSM, are mathematically proved and are shown to be more general than those presented in previous work. Extensive experiments show that FCloSM and FGenSM are one to two orders of magnitude faster than the state-of-the-art algorithms for discovering frequent closed sequences (CloSpan, BIDE, ClaSP and CM-ClaSP) and for mining frequent generators (FEAT, FSGP and VGEN), and that FCloSM and FGenSM consume much less memory.
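To make the notions of closed and generator sequences concrete, the following brute-force sketch enumerates them on a toy database. It is not FCloSM or FGenSM and uses none of their pruning techniques. It assumes the usual definitions: a frequent sequence is closed if no proper super-sequence has the same support, and it is a generator if no proper subsequence has the same support. The toy database, the restriction to one item per event (each sequence is a string), the exclusion of the empty sequence, and all function names are illustrative assumptions.

```python
from itertools import combinations


def is_subsequence(a: str, b: str) -> bool:
    """True if a is a (not necessarily contiguous) subsequence of b."""
    it = iter(b)
    return all(item in it for item in a)


def support(pattern: str, db: list) -> int:
    """Number of database sequences that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in db)


def frequent_patterns(db: list, minsup: int) -> dict:
    """All non-empty frequent patterns with their supports (brute force)."""
    candidates = set()
    for seq in db:
        for k in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), k):
                candidates.add("".join(seq[i] for i in idx))
    freq = {}
    for p in candidates:
        s = support(p, db)
        if s >= minsup:
            freq[p] = s
    return freq


def closed_and_generators(db: list, minsup: int):
    """Return (frequent, closed, generators) for a toy database."""
    freq = frequent_patterns(db, minsup)
    # Closed: no proper super-sequence with the same support.
    closed = {p for p in freq
              if not any(p != q and freq[p] == freq[q] and is_subsequence(p, q)
                         for q in freq)}
    # Generator: no proper (non-empty) subsequence with the same support.
    gens = {p for p in freq
            if not any(p != q and freq[p] == freq[q] and is_subsequence(q, p)
                       for q in freq)}
    return freq, closed, gens


DB = ["abc", "ab", "abc"]  # hypothetical toy database
freq, closed, gens = closed_and_generators(DB, minsup=2)
print(len(freq), sorted(closed), sorted(gens))
```

On this toy input there are seven frequent patterns but only two closed ones and three generators, which illustrates the reduction in result size that the abstract refers to.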

Acknowledgements

This work was supported by Vietnam’s National Foundation for Science and Technology Development (NAFOSTED) under Grant No. 102.05-2015.07.

Author information

Corresponding author

Correspondence to Bac Le.

Appendices

1.1 Appendix 1

To prove Proposition 1 and Theorem 1, we first consider the relation between the measures remSize and length of suffixes stated in Lemma 1.

Lemma 1

(Relation between remSize and length of suffixes).

If \(\alpha \sqsubset \beta \sqsubseteq \varPsi \), \(\textit{suf} = \textit{suf}(\varPsi , \alpha )\) and \(\textit{suf}' = \textit{suf}(\varPsi , \beta )\), then

  a. \(\textit{suf} \sqsupseteq \textit{suf}'\).

  b. \(\textit{suf} = \textit{suf}' \Leftrightarrow \textit{length}(\textit{suf}) = \textit{length}(\textit{suf}')\)

    $$\begin{aligned} \Leftrightarrow \textit{remSize}(\textit{suf}) = \textit{remSize}(\textit{suf}')\text { and } \textit{lastItemOf}(\alpha ) = \textit{lastItemOf}(\beta ). \end{aligned}$$

Note that if \(\textit{remSize}(\textit{suf}) = \textit{remSize}(\textit{suf}')\), then, in the case of 1\(-{\mathcal{SDB}}\)s, the last condition in Lemma 1.b, \(\textit{lastItemOf}(\alpha ) = \textit{lastItemOf}(\beta )\), is always satisfied. For \(n-{\mathcal{SDB}}\)s, this condition can be false, so it must be checked explicitly.

Proof

Let \(\delta = \textit{prefix}(\varPsi , \alpha )\) and \(\delta ' =\textit{ prefix}(\varPsi ,\beta )\).

  a. It is clear that \(\delta \sqsubseteq \delta '\). Thus, \(\textit{suf} \sqsupseteq \textit{suf}'\).

  b. The first equivalence is obviously true. Now, we prove the second equivalence.

\(\bullet \) “\(\Rightarrow \)”: If \(\textit{length}(\textit{suf}) = \textit{length}(\textit{suf}')\), then \(\textit{length}(\delta ) = \textit{length}(\delta ')\). Hence, \(\delta = \delta '\), and thus \(\textit{lastItemOf}(\delta ) = \textit{lastItemOf}(\delta ')\) and \(\textit{size}(\delta )= \textit{size}(\delta ')\). Since \(\textit{remSize}(\textit{suf}) = \textit{size}(\varPsi ) - \textit{size}(\delta )\), \(\textit{remSize}(\textit{suf}') = \textit{size}(\varPsi ) - \textit{size}(\delta ')\), \(\textit{lastItemOf}(\alpha ) = \textit{lastItemOf}(\delta )\) and \(\textit{lastItemOf}(\beta ) = \textit{lastItemOf}(\delta ')\), we obtain \(\textit{remSize}(\textit{suf})=\textit{remSize}(\textit{suf}')\) and \(\textit{lastItemOf}(\alpha )= \textit{lastItemOf}(\beta )\).

\(\bullet \) “\(\Leftarrow \)”: If \(\textit{remSize}(\textit{suf}) = \textit{remSize}(\textit{suf}')\) and \(\textit{lastItemOf}(\alpha )= \textit{lastItemOf}(\beta )\), then \(\textit{size}(\delta ) = \textit{size}(\delta ') = k\) and the first \((k-1)\) events of \(\delta \) and \(\delta '\) are identical, so \(\textit{lastEventOf}(\delta ) \subseteq \textit{lastEventOf}(\delta ')\). Moreover, since \(\textit{lastItemOf}(\alpha ) = \textit{lastItemOf}(\delta )\) and \(\textit{lastItemOf}(\beta ) = \textit{lastItemOf}(\delta ')\), we have \(\textit{lastItemOf}(\delta ) = \textit{lastItemOf}(\delta ')\) and therefore \(\textit{lastEventOf}(\delta )=\textit{lastEventOf}(\delta ')\). Hence, \(\delta = \delta '\), \(\textit{suf} = \textit{suf}'\) and \(\textit{length}(\textit{suf}) = \textit{length}(\textit{suf}')\). \(\square \)
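Lemma 1 can be checked on a small example. The sketch below assumes that a sequence is a list of events, each event a tuple of items in alphabetical order, that length counts items and size counts events, and that prefix(Ψ, α) is the smallest prefix of Ψ containing α, with remSize(suf) = size(Ψ) − size(prefix) as used in the proof. The function names and the example sequence are hypothetical.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, ...]   # items of one event, sorted alphabetically
Seq = List[Event]         # a sequence is a list of events


def length(seq: Seq) -> int:
    """Number of items in the sequence (assumed meaning of length)."""
    return sum(len(e) for e in seq)


def smallest_prefix(psi: Seq, alpha: Seq) -> Optional[Seq]:
    """Smallest prefix of psi containing the non-empty pattern alpha."""
    j = -1
    for event in alpha:
        j += 1
        # Leftmost event of psi, after the previous match, containing `event`.
        while j < len(psi) and not set(event) <= set(psi[j]):
            j += 1
        if j == len(psi):
            return None  # alpha is not contained in psi
    cut = psi[j].index(alpha[-1][-1]) + 1  # cut right after lastItemOf(alpha)
    return psi[:j] + [psi[j][:cut]]


def suffix(psi: Seq, alpha: Seq) -> Tuple[Event, Seq]:
    """suf(psi, alpha) as (rest of the cut event, remaining events)."""
    delta = smallest_prefix(psi, alpha)
    j = len(delta) - 1
    return psi[j][len(delta[j]):], psi[j + 1:]


def suf_length(psi: Seq, alpha: Seq) -> int:
    return length(psi) - length(smallest_prefix(psi, alpha))


def rem_size(psi: Seq, alpha: Seq) -> int:
    return len(psi) - len(smallest_prefix(psi, alpha))


psi = [("a",), ("b", "c"), ("d",)]
alpha = [("b",)]
beta_same_item = [("a",), ("b",)]   # last item b, as in alpha
beta_other_item = [("b", "c")]      # last item c, differs from alpha

for beta in (beta_same_item, beta_other_item):
    print(suffix(psi, alpha) == suffix(psi, beta),
          suf_length(psi, alpha) == suf_length(psi, beta),
          rem_size(psi, alpha) == rem_size(psi, beta),
          alpha[-1][-1] == beta[-1][-1])
```

For the second pair, remSize agrees while the last items and the suffix lengths differ, which is the situation described in the note on \(n-{\mathcal{SDB}}\)s: equality of remSize alone does not imply equality of the suffixes.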

1.2 Appendix 2

To prove Theorem 1, we need to demonstrate the anti-monotonicity of the operators: \(\rho \), support, PDB, SI and SE as described in Proposition 1.

Proposition 1

(Properties of SE and SI in PDBs). For two arbitrary sequences \(\alpha \) and \(\beta \) such that \(\alpha \sqsubseteq \beta \), the following assertions hold.

  a. (i) \(\rho (\alpha ) \supseteq \rho (\beta )\), \(\textit{support}(\alpha )\geqslant \textit{support}(\beta )\) (anti-monotonicity of the \(\rho \) and support operators).

     (ii) \(\rho (\alpha )= \rho (\beta ) \Leftrightarrow \textit{support}(\alpha )=\textit{support}(\beta )\).

     (iii) \(\textit{support}(\alpha ) =|{\mathcal {D}}_{\alpha }|\).

  b. (Anti-monotonicity of PDB, SI, SE)

     (i) \({\mathcal {D}}_{\alpha } \sqsupseteq {\mathcal {D}}_{\beta }\).

     (ii) \(\textit{SI}({\mathcal {D}}_{\alpha }) \geqslant \textit{SI}({\mathcal {D}}_{\beta })\) and \(\textit{SE}({\mathcal {D}}_{\alpha }) \geqslant \textit{SE}({\mathcal {D}}_{\beta })\).

Proof

  a. These assertions are obviously true by the definitions of the \(\rho \) and support operators.

  b. (i) \(\varPsi \in \rho (\alpha )\Leftrightarrow (\varPsi \in {\mathcal {D}}\) and \(\alpha \sqsubseteq \varPsi )\)

     \(\Leftrightarrow (\varPsi \in {\mathcal {D}}\) and \(\varPsi = \delta \diamondsuit \textit{suf}\), \(\delta = \textit{prefix}(\varPsi , \alpha )\), \(\textit{suf} =\textit{suf}(\varPsi , \alpha ))\)

     \(\Leftrightarrow (\textit{suf} = \textit{suf}(\varPsi , \alpha ) \in {\mathcal {D}}_{\alpha }\), with \(\varPsi \in {\mathcal {D}}\), \(\varPsi = \delta \diamondsuit \textit{suf}\) and \(\delta = \textit{prefix}(\varPsi ,\alpha ))\). Hence, \(\textit{support}(\alpha ) =|\rho (\alpha )| =|{\mathcal {D}}_{\alpha }|\).

     (ii) \(\forall \varPsi \in \rho (\beta )\), \(\textit{suf}' = \textit{suf}(\varPsi , \beta )\in \mathcal {D}_{\beta }\), we have \((\varPsi = \delta '\diamondsuit \textit{suf}')\wedge (\delta ' = \textit{prefix}(\varPsi , \beta ))\)

     \(\Leftrightarrow (\varPsi = \delta '\diamondsuit \textit{suf}') \wedge (\beta \sqsubseteq \delta ')\wedge (\not \exists \delta '': \varPsi = \delta ''\diamondsuit \gamma ' \wedge \beta \sqsubseteq \delta ''\sqsubset \delta ')\)

     \(\Rightarrow (\varPsi \in \rho (\alpha ))\wedge (\varPsi = \delta ' \diamondsuit \textit{suf}')\wedge (\alpha \sqsubseteq \delta ')\), because \(\alpha \sqsubseteq \beta \sqsubseteq \delta ' \sqsubseteq \varPsi \).

     There are two possible cases:

     • If \(\not \exists \delta ''\): \(\varPsi = \delta ''\diamondsuit \gamma ' \wedge \alpha \sqsubseteq \delta '' \sqsubset \delta '\), then \(\delta ' =\textit{prefix}(\varPsi , \alpha )\), and thus \(\textit{suf}' =\textit{suf}(\varPsi , \alpha )\in {\mathcal {D}}_{\alpha }\).

     • Otherwise, \(\exists \delta '': \varPsi = \delta '' \diamondsuit \gamma ' \wedge \alpha \sqsubseteq \delta ''\sqsubset \delta '\), so \(\delta ''\) is a prefix (of \(\varPsi \)) containing \(\alpha \). Let \(\gamma =\textit{prefix}(\varPsi ,\alpha )\) be the smallest prefix (of \(\varPsi \)) containing \(\alpha \), i.e., \(\exists \textit{suf} =\textit{suf}(\varPsi , \alpha ): (\varPsi = \gamma \diamondsuit \textit{suf}) \wedge (\alpha \sqsubseteq \gamma \sqsubseteq \delta ''\sqsubset \delta ')\), thus \(\textit{suf}'\sqsubset \textit{suf}\). Therefore, \(\textit{suf} \in {\mathcal {D}}_{\alpha }\) and \(\textit{suf}'\sqsubset \textit{suf}\).

     Finally, in all cases, \(\varPsi \in \rho (\alpha )\), \(\exists \textit{suf}(\varPsi ,\alpha ) \in {\mathcal {D}}_{\alpha }\) and \(\textit{suf}'\sqsubseteq \textit{suf}\). Hence, \(\mathcal {D}_{\beta } \sqsubseteq {\mathcal {D}}_{\alpha }\).

     (iii) By a. (i) and b. (ii), \(\forall \varPsi \in \rho (\beta ) \subseteq \rho (\alpha )\), \(\textit{suf}' =\textit{suf}(\varPsi , \beta ) \in \mathcal {D}_{\beta }\), \(\textit{suf} =\textit{suf}(\varPsi ,\alpha ) \in {\mathcal {D}}_{\alpha }\) and \(\textit{suf}' \sqsubseteq \textit{suf}\), it follows that \(\textit{length}(\textit{suf}')\leqslant \textit{length}(\textit{suf})\) and \(\textit{remSize}(\textit{suf}')\leqslant \textit{remSize}(\textit{suf})\). Thus, \(\textit{SI}({\mathcal {D}}_{\beta }) \leqslant \textit{SI}({\mathcal {D}}_{\alpha })\) and \(\textit{SE}(\mathcal {D}_{\beta })\leqslant \textit{SE}({\mathcal {D}}_{\alpha })\).

\(\square \)
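The anti-monotonicity stated in Proposition 1 can be observed on a toy database with the sketch below. The definitions of SI and SE are not restated in this appendix, so the code assumes, based on the "+1 > 0" arguments in the proofs, that SI sums length(suf) + 1 and SE sums remSize(suf) + 1 over the projected database. That reading, the database, and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, ...]          # items of one event, sorted alphabetically
Seq = List[Event]
Suffix = Tuple[Event, Seq]       # (rest of the cut event, remaining events)


def cut(psi: Seq, alpha: Seq) -> Optional[Tuple[int, int]]:
    """Position (event index, item index) where the smallest prefix of psi
    containing the non-empty pattern alpha ends, or None if not contained."""
    j = -1
    for event in alpha:
        j += 1
        while j < len(psi) and not set(event) <= set(psi[j]):
            j += 1
        if j == len(psi):
            return None
    return j, psi[j].index(alpha[-1][-1])


def project(db: List[Seq], alpha: Seq) -> Dict[int, Suffix]:
    """Projected database of alpha: sequence id -> suf(psi, alpha)."""
    pdb = {}
    for sid, psi in enumerate(db):
        pos = cut(psi, alpha)
        if pos is not None:
            j, p = pos
            pdb[sid] = (psi[j][p + 1:], psi[j + 1:])
    return pdb


def si(pdb: Dict[int, Suffix]) -> int:
    """Assumed SI: sum over suffixes of (number of items + 1)."""
    return sum(len(rest) + sum(len(e) for e in tail) + 1
               for rest, tail in pdb.values())


def se(pdb: Dict[int, Suffix]) -> int:
    """Assumed SE: sum over suffixes of (number of whole remaining events + 1)."""
    return sum(len(tail) + 1 for _, tail in pdb.values())


db = [
    [("a",), ("b", "c"), ("d",)],
    [("b", "c"), ("d",), ("e",)],
    [("a",), ("d",)],
]
alpha = [("d",)]
beta = [("a",), ("d",)]          # alpha is a subsequence of beta

d_alpha, d_beta = project(db, alpha), project(db, beta)
print(len(d_alpha), len(d_beta))   # supports of alpha and beta
print(si(d_alpha), si(d_beta))
print(se(d_alpha), se(d_beta))
```

Here the support, SI and SE of β never exceed those of α, in line with a. (i) and with the anti-monotonicity of SI and SE.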

1.3 Appendix 3: Proof of Theorem 1

  a. \({\mathcal {D}}_{\alpha }= \mathcal {D}_{\beta }\)

    \(\Leftrightarrow (\rho (\alpha )= \rho (\beta ))\wedge (\forall \varPsi \in \rho (\alpha ),\textit{ suf}=\textit{ suf}(\varPsi , \alpha ) \in {\mathcal {D}}_{\alpha }, \textit{suf}' =\textit{ suf}(\varPsi , \beta ) \in {\mathcal {D}}_{\beta },\textit{ suf }= \textit{suf}')\)

    \(\Rightarrow \textit{SI}({\mathcal {D}}_{\alpha })=\textit{ SI}(\mathcal {D}_{\beta })\Rightarrow \textit{ SE}({\mathcal {D}}_{\alpha })=\textit{ SE}(\mathcal {D}_{\beta })\).

Conversely, assume that \(\textit{SI}({\mathcal {D}}_{\alpha })= \textit{SI}(\mathcal {D}_{\beta })\). By a. (i) and b. (ii)–(iii) of Proposition 1, and because \(\textit{length}(\textit{suf}(\varPsi , \alpha ))+1 >0\) for all \(\varPsi \in \rho (\alpha )\), we have \(\rho (\alpha )= \rho (\beta )\) and, \(\forall \varPsi \in \rho (\alpha )\) with \(\textit{suf} = \textit{suf}(\varPsi , \alpha ) \in {\mathcal {D}}_{\alpha }\) and \(\textit{suf}' =\textit{suf}(\varPsi , \beta ) \in \mathcal {D}_{\beta }\), \(\textit{suf} \sqsupseteq \textit{suf}'\) and \(\textit{length}(\textit{suf}')= \textit{length}(\textit{suf})\). By Lemma 1.b, \(\textit{suf} = \textit{suf}'\), i.e., \({\mathcal {D}}_{\alpha } = \mathcal {D}_{\beta }\) and \(\textit{lastItemOf}(\alpha )= \textit{lastItemOf}(\beta )\). Thus, \(\textit{SI}({\mathcal {D}}_{\alpha })= \textit{SI}(\mathcal {D}_{\beta })\Leftrightarrow {\mathcal {D}}_{\alpha } = \mathcal {D}_{\beta } \Rightarrow (\textit{SE}({\mathcal {D}}_{\alpha })=\textit{SE}(\mathcal {D}_{\beta })\) and \(\textit{lastItemOf}(\alpha ) =\textit{lastItemOf}(\beta ))\).

Finally, assume that SE \(({\mathcal {D}}_{\alpha })=\textit{ SE}(\mathcal {D}_{\beta })\). By a. (i) and b. (ii)–(iii) of Proposition 1 and \(\forall \varPsi \in \rho (\alpha )\), remSize(suf \((\varPsi , \alpha )) +1 >0\), we must have \(\rho (\alpha )= \rho (\beta )\), thus support \((\alpha )=\textit{ support}(\beta )\) and the first assertion b.(i) is proved. Moreover, \(\forall \varPsi \in \rho (\alpha )\), suf \((\varPsi , \alpha ) \in {\mathcal {D}}_{\alpha }\), suf \((\varPsi , \beta ) \in \mathcal {D}_{\beta } \) and remSize(suf \((\varPsi , \alpha ))=\textit{ remSize}(\textit{suf}(\varPsi , \beta ))\).

Additionally, if lastItemOf \((\alpha ) =\) lastItemOf \((\beta )\), i.e., \(\alpha \) and \(\beta \) share the same last item, then \(\forall \varPsi \in \rho (\alpha )\), length(suf \((\varPsi , \alpha ))=\) length(suf \((\varPsi , \beta ))\), so suf \((\varPsi , \alpha )=\textit{ suf}(\varPsi , \beta )\). Hence, SI \(({\mathcal {D}}_{\alpha })=\textit{ SI}(\mathcal {D}_{\beta })\).

  b. (ii) For any itemset A, consider two s-extensions of \(\alpha \) and \(\beta \) with A: \(\gamma = \alpha {\diamondsuit }_{{s}}A, \delta = \beta {\diamondsuit }_{{s}}A\). By the above arguments, if \(\textit{SE}({\mathcal {D}}_{\alpha })=\textit{SE}(\mathcal {D}_{\beta })\), then \(\rho (\alpha )= \rho (\beta )\) and \(\forall \varPsi \in \rho (\alpha )\), \(\textit{remSize}(\textit{suf}(\varPsi ,\alpha ))=\textit{remSize}(\textit{suf}(\varPsi , \beta ))\). Moreover, since \(\gamma \) and \(\delta \) share the same last event A, we obtain \(\rho (\gamma )= \rho (\delta )\) and \(\forall \varPsi \in \rho (\gamma )\), \(\textit{remSize}(\textit{suf}(\varPsi , \gamma ))= \textit{remSize}(\textit{suf}(\varPsi ,\delta ))\). Because \(\gamma \) and \(\delta \) also share the same last item of A, we have that \(\textit{length}(\textit{suf}(\varPsi , \gamma ))=\textit{length}(\textit{suf}(\varPsi ,\delta ))\). Hence, \(\textit{SI}({\mathcal {D}}_{\gamma }) =\textit{SI}(\mathcal {D}_{\delta })\) and \({\mathcal {D}}_{\gamma } = \mathcal {D}_{\delta }\).

     (iii) In addition, assume that \(\textit{lastEventOf}(\alpha ) = \textit{lastEventOf}(\beta )= A\), and for any itemset B such that \(A \prec _{alp} B\) (which means that all items of A precede all items of B according to the total order relation \(\prec _{alp}\)), we consider two i-extensions of \(\alpha \) and \(\beta \) with B: \(\gamma ' = \alpha {\diamondsuit }_{{i}}B, \delta ' = \beta {\diamondsuit }_{{i}}B\). Then, the equality \(\textit{SE}({\mathcal {D}}_{\alpha })=\textit{SE}(\mathcal {D}_{\beta })\) holds only if \(\rho (\alpha )= \rho (\beta )\) and \(\forall \varPsi \in \rho (\alpha )\), \(\textit{remSize}(\textit{suf}(\varPsi , \alpha ))=\textit{remSize}(\textit{suf}(\varPsi ,\beta ))\). Moreover, since \(A \prec _{alp} B\) and \(\textit{lastEventOf}(\alpha )= \textit{lastEventOf}(\beta )= A\), we have \(\textit{lastEventOf}(\gamma ') = \textit{lastEventOf}(\delta ') = A \cup B\), i.e., \(\gamma '\) and \(\delta '\) share the same last itemset \(A \cup B\) and also the same last item of \(A\cup B\). Therefore, \(\rho (\gamma ') = \rho (\delta ')\) and \(\forall \varPsi \in \rho (\gamma ')\), \(\textit{remSize}(\textit{suf}(\varPsi , \gamma '))= \textit{remSize}(\textit{suf}(\varPsi , \delta '))\) and \(\textit{length}(\textit{suf}(\varPsi , \gamma '))=\textit{length}(\textit{suf}(\varPsi ,\delta '))\). Thus, \(\textit{SI}(\mathcal {D}_{\gamma '})=\textit{SI}(\mathcal {D}_{\delta '})\) and \(\mathcal {D}_{\gamma '}= \mathcal {D}_{\delta '}\).

\(\square \)
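The following sketch illustrates the part of the argument used in b. (ii), as read from the proof above: two patterns α ⊑ β can have equal SE but different projected databases, while their s-extensions with the same itemset have identical projected databases. As before, SE is assumed to sum remSize(suf) + 1 over the projected database. The database, patterns and function names are hypothetical, and the code checks one example rather than proving the statement.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, ...]          # items of one event, sorted alphabetically
Seq = List[Event]
Suffix = Tuple[Event, Seq]       # (rest of the cut event, remaining events)


def cut(psi: Seq, alpha: Seq) -> Optional[Tuple[int, int]]:
    """End position of the smallest prefix of psi containing alpha."""
    j = -1
    for event in alpha:
        j += 1
        while j < len(psi) and not set(event) <= set(psi[j]):
            j += 1
        if j == len(psi):
            return None
    return j, psi[j].index(alpha[-1][-1])


def project(db: List[Seq], alpha: Seq) -> Dict[int, Suffix]:
    """Projected database of alpha: sequence id -> suf(psi, alpha)."""
    pdb = {}
    for sid, psi in enumerate(db):
        pos = cut(psi, alpha)
        if pos is not None:
            j, p = pos
            pdb[sid] = (psi[j][p + 1:], psi[j + 1:])
    return pdb


def se(pdb: Dict[int, Suffix]) -> int:
    """Assumed SE: sum over suffixes of (whole remaining events + 1)."""
    return sum(len(tail) + 1 for _, tail in pdb.values())


def s_extend(pattern: Seq, itemset: Event) -> Seq:
    """s-extension: append the itemset as a new last event."""
    return pattern + [itemset]


db = [
    [("a",), ("b", "c"), ("d",)],
    [("b", "c"), ("d",), ("e",)],
]
alpha = [("b",)]
beta = [("b", "c")]              # alpha is a subsequence of beta

d_alpha, d_beta = project(db, alpha), project(db, beta)
print(se(d_alpha) == se(d_beta))   # SE values agree
print(d_alpha == d_beta)           # but the projected databases differ

gamma = s_extend(alpha, ("d",))
delta = s_extend(beta, ("d",))
print(project(db, gamma) == project(db, delta))  # PDBs of the s-extensions
```

In this example α and β have different last items, so their suffixes differ by the leftover item c, yet after an s-extension with (d) both patterns end at the same position in every sequence. This is the kind of equality that lets a miner reuse the projected database of one branch for another.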

1.4 Appendix 4: Proof of Corollary 1

Note that for the two cases, we always have \(\mathcal {D}_{\gamma } = \mathcal {D}_{\delta }\) and \(\textit{lastEventOf}(\gamma ) = \textit{lastEventOf}(\delta )\). Hence, all i-extensions and s-extensions \(\gamma '\) of \(\gamma \) and \(\delta '\) of \(\delta \) with the same itemset also have the same last event, and thus, they have the same PDB, \(\mathcal {D}_{\gamma '} = \mathcal {D}_{\delta '}\). The same situation then also occurs for their next descendants. \(\square \)

1.5 Appendix 5: Proof of Corollaries 2 and 3

Corollaries 2–3 a. (i)–(iii). These assertions are true because of (1.1), \(\textit{path}(q)\sqsubset \textit{i\_new}\), \(\textit{path}(u)\sqsubset \textit{i\_new}\), \(\textit{path}(r)\sqsubset \textit{i\_new}\), \(\textit{lastEventOf}(\textit{path}(r))= \textit{lastEventOf}(\textit{i\_new}) =\{ q,u\}\) and (2.1).

Corollaries 2–3 b. These assertions are also true since \(\textit{path}(v)\sqsubset \textit{s\_new}\), (1.1) and the last events of \(\textit{s\_new}\) and \(\textit{path}(v)\) are identical to v and hence (2.1). \(\square \)

Cite this article

Le, B., Duong, H., Truong, T. et al. FCloSM, FGenSM: two efficient algorithms for mining frequent closed and generator sequences using the local pruning strategy. Knowl Inf Syst 53, 71–107 (2017). https://doi.org/10.1007/s10115-017-1032-6

  • DOI: https://doi.org/10.1007/s10115-017-1032-6
