Abstract
Mining sequential patterns in a sequence database (SDB) is an important and useful data mining task. Most existing algorithms for this task directly mine the set \( {\mathcal{F}\mathcal{S}} \) of all frequent sequences in an SDB. However, these algorithms often perform poorly on large SDBs because of the enormous search space and the cardinality of \( {\mathcal{F}\mathcal{S}} \). In addition, constraint-based mining algorithms that rely on this approach must read the SDB again whenever the user changes a constraint. To address this issue, this paper proposes a novel approach that generates \( {\mathcal{F}\mathcal{S}} \) from two concise representations of it: the set of frequent closed sequences \( \left( {{\mathcal{F}\mathcal{C}\mathcal{S}}} \right) \) and the set of frequent generator sequences \( ({\mathcal{F}\mathcal{G}\mathcal{S}}) \). The approach is based on a new explicit relationship between \( {\mathcal{F}\mathcal{S}} \) and these two sets. This relationship is the theoretical basis of an efficient algorithm, named GFS-CR, that directly enumerates \( {\mathcal{F}\mathcal{S}} \) from \( {\mathcal{F}\mathcal{C}\mathcal{S}} \) and \( {\mathcal{F}\mathcal{G}\mathcal{S}} \) rather than mining it from an SDB. Experimental results show that GFS-CR outperforms state-of-the-art algorithms in terms of runtime and scalability.
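As a point of reference for the first route the abstract describes (mining \( {\mathcal{F}\mathcal{S}} \) directly from an SDB), here is a minimal brute-force sketch in Python. It assumes that a sequence is a tuple of frozensets (its itemsets), that \( \alpha \sqsubseteq \beta \) means the itemsets of α are contained, in order, in distinct itemsets of β, and that ms is an absolute support count. The toy SDB and every function name are hypothetical, and this is not GFS-CR.

```python
from itertools import chain, combinations, product


def S(*itemsets):
    """Build a sequence, e.g. S("ab", "c") is {a, b} -> {c} (a tuple of frozensets)."""
    return tuple(frozenset(x) for x in itemsets)


def is_subseq(a, b):
    """Return True if a ⊑ b: the itemsets of a are contained, in order, in itemsets of b."""
    pos = 0
    for itemset in a:
        # Greedily match each itemset of a to the earliest remaining superset in b.
        while pos < len(b) and not itemset <= b[pos]:
            pos += 1
        if pos == len(b):
            return False
        pos += 1
    return True


def powerset(items):
    """All subsets of an itemset, as frozensets (including the empty set)."""
    items = sorted(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def subsequences(s):
    """All non-empty subsequences of s: keep a subset of each itemset, drop empty ones."""
    result = set()
    for choice in product(*(powerset(itemset) for itemset in s)):
        seq = tuple(x for x in choice if x)
        if seq:
            result.add(seq)
    return result


def support(alpha, sdb):
    """Number of data sequences in the SDB that contain alpha."""
    return sum(1 for s in sdb if is_subseq(alpha, s))


def frequent_sequences(sdb, ms):
    """Brute-force FS: every subsequence of some data sequence with support >= ms."""
    candidates = set().union(*(subsequences(s) for s in sdb))
    return {a for a in candidates if support(a, sdb) >= ms}


# Hypothetical toy SDB and absolute minimum support.
sdb = [S("ab", "c"), S("a", "c"), S("b")]
fs = frequent_sequences(sdb, ms=2)
```

For this toy SDB with ms = 2, `fs` holds the four sequences {a}, {b}, {c} and {a} → {c}. Enumerating every subsequence of every data sequence grows exponentially with sequence length, which illustrates why direct mining scales poorly on large SDBs.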
Notes
1. iff means “if and only if”.
2. \( \sum\nolimits_{1 \leqslant i \leqslant n} A_{i} \) denotes the union of the pairwise disjoint sets \( A_{1}, A_{2}, \ldots, A_{n} \), i.e. \( A_{i} \cap A_{j} = \emptyset \) for all \( i \ne j \), \( 1 \leqslant i, j \leqslant n \).
Acknowledgment
This work was supported by Vietnam’s National Foundation for Science and Technology Development (NAFOSTED) under Grant Number 102.05-2017.300.
Appendices
1.1 Appendix 1: Proof of Proposition 1
(i) “\( \mathcal{FS}(\sigma, \gamma) \subseteq \mathcal{FS}'(\sigma, \gamma) \)”: Consider any \( \alpha \in \mathcal{FS}(\sigma, \gamma) \), i.e. \( \gamma \sqsubseteq \alpha \sqsubseteq \sigma \). Without loss of generality, we can assume that \( \gamma = E_{1} \to E_{2} \to \ldots \to E_{p} \), \( \alpha = F_{1} \to F_{2} \to \ldots \to F_{q} \) and \( \sigma = E'_{1} \to E'_{2} \to \ldots \to E'_{q} \), with \( F_{i} \subseteq E'_{i} \) for all i = 1, …, q, and that there exists \( lp = \{ j_{1}, j_{2}, \ldots, j_{p} \} \in LP(\alpha, \gamma) \subseteq LP(\sigma, \gamma) \), with \( 1 \leqslant j_{1} < j_{2} < \ldots < j_{p} \leqslant q \), such that \( E_{k} \subseteq F_{j_{k}} \subseteq E'_{j_{k}} \) for all k = 1, …, p. For all i = 1, …, q, set \( d_{i} \stackrel{\text{def}}{=} \begin{cases} F_{i}, & \text{if } i \notin lp \\ F_{i} \setminus E_{k}, & \text{if } i = j_{k} \in lp \end{cases} \), \( D_{i} \stackrel{\text{def}}{=} \begin{cases} E'_{i}, & \text{if } i \notin lp \\ E'_{i} \setminus E_{k}, & \text{if } i = j_{k} \in lp \end{cases} \) and \( E_{i}^{\sim} \stackrel{\text{def}}{=} \begin{cases} \emptyset, & \text{if } i \notin lp \\ E_{k}, & \text{if } i = j_{k} \in lp \end{cases} \), and let \( \delta(lp) \stackrel{\text{def}}{=} d_{1} \to d_{2} \to \ldots \to d_{q} \) and \( Ex(\gamma, lp) \stackrel{\text{def}}{=} E_{1}^{\sim} \to E_{2}^{\sim} \to \ldots \to E_{q}^{\sim} \). Then \( d_{i} \subseteq D_{i} \) and \( F_{i} = E_{i}^{\sim} + d_{i} \) for all i = 1, …, q, so \( \alpha = Ex(\gamma, lp) \oplus \varDelta(lp) \in \mathcal{FS}'(\sigma, \gamma) \).
(ii) “\( \mathcal{FS}(\sigma, \gamma) \supseteq \mathcal{FS}'(\sigma, \gamma) \)”: For any \( \alpha \in \mathcal{FS}'(\sigma, \gamma) \), write \( \alpha = Ex(\gamma, lp) \oplus \varDelta(lp) = F_{1} \to F_{2} \to \ldots \to F_{q} \). Then \( F_{i} = E_{i}^{\sim} + d_{i} \) with \( d_{i} \subseteq D_{i} \), so \( E_{i}^{\sim} \subseteq F_{i} \subseteq E_{i}^{\sim} + D_{i} \subseteq E'_{i} \) for all i = 1, …, q. Thus \( \gamma \sqsubseteq \alpha \sqsubseteq \sigma \), i.e. \( \alpha \in \mathcal{FS}(\sigma, \gamma) \). □
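The construction in this proof can be checked mechanically on a small case. The Python sketch below represents a sequence as a tuple of frozensets and uses 0-based positions (unlike the 1-based indices of the proof). It enumerates LP(σ, γ), builds Ex(γ, lp) and \( (D_{1}, \ldots, D_{q}) \), forms every α with \( F_{i} = E_{i}^{\sim} + d_{i} \), \( d_{i} \subseteq D_{i} \), and compares the result with a brute-force enumeration of all α such that \( \gamma \sqsubseteq \alpha \sqsubseteq \sigma \). Support is ignored, and all helper names are hypothetical; this is an illustration of Proposition 1 under those assumptions, not the paper's implementation.

```python
from itertools import chain, combinations, product


def S(*itemsets):
    """Build a sequence as a tuple of frozensets; S("ab", "") has an empty 2nd itemset."""
    return tuple(frozenset(x) for x in itemsets)


def powerset(items):
    items = sorted(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def is_subseq(a, b):
    """Greedy test of a ⊑ b for itemset sequences."""
    pos = 0
    for itemset in a:
        while pos < len(b) and not itemset <= b[pos]:
            pos += 1
        if pos == len(b):
            return False
        pos += 1
    return True


def position_lists(sigma, gamma):
    """LP(sigma, gamma): increasing 0-based positions (j_1, ..., j_p) with E_k ⊆ E'_{j_k}."""
    return [lp for lp in combinations(range(len(sigma)), len(gamma))
            if all(gamma[k] <= sigma[j] for k, j in enumerate(lp))]


def ex_and_delta(sigma, gamma, lp):
    """Return Ex(gamma, lp) = (E~_1, ..., E~_q) and (D_1, ..., D_q) for one position list."""
    ex = [frozenset()] * len(sigma)
    for k, j in enumerate(lp):
        ex[j] = gamma[k]
    D = tuple(sigma[i] - ex[i] for i in range(len(sigma)))
    return tuple(ex), D


def fs_prime(sigma, gamma):
    """Every alpha with F_i = E~_i + d_i and d_i ⊆ D_i; the set merges duplicates."""
    out = set()
    for lp in position_lists(sigma, gamma):
        ex, D = ex_and_delta(sigma, gamma, lp)
        for ds in product(*(powerset(Di) for Di in D)):
            # Union of disjoint parts, then delete empty itemsets.
            out.add(tuple(e | d for e, d in zip(ex, ds) if e | d))
    return out


def fs_between(sigma, gamma):
    """Brute force: all non-empty alpha with gamma ⊑ alpha ⊑ sigma."""
    subs = {tuple(x for x in c if x) for c in product(*(powerset(it) for it in sigma))}
    return {a for a in subs if a and is_subseq(gamma, a)}


sigma = S("a", "ab", "b")
gamma = S("a", "b")
print(position_lists(sigma, gamma))  # [(0, 1), (0, 2), (1, 2)]
assert fs_prime(sigma, gamma) == fs_between(sigma, gamma)
```

The set in `fs_prime` hides the fact that different position lists may generate the same α; Theorem 2 addresses exactly that duplication.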
1.2 Appendix 2: Proof of Theorem 1
a. First, we show that for every \( \alpha \in \mathcal{FS} \) there exists \( \sigma \in CloSet(\alpha) \) such that \( ro \stackrel{\text{def}}{=} \rho(\alpha) = \rho(\sigma) \); hence \( \sigma \in \mathcal{FCS} \), \( ro \in RO \), and FS(ro) = [σ]. Indeed, \( \beta \in FS(ro) \Leftrightarrow [\beta \in \mathcal{FS} \wedge \rho(\beta) = ro] \Leftrightarrow [\beta \in \mathcal{FS} \wedge \rho(\beta) = \rho(\sigma) = ro] \Leftrightarrow \beta \in [\sigma] \).
(i) “\( \mathcal{FS} \subseteq \bigcup\nolimits_{ro \in RO} FS(ro) \)”: For every \( \alpha \in \mathcal{FS} \), there exists \( \sigma \in CloSet(\alpha) \); let \( ro \stackrel{\text{def}}{=} \rho(\sigma) = \rho(\alpha) \). Then \( \alpha \in [\sigma] \) and \( supp(\sigma) = supp(\alpha) \geqslant ms \). Thus \( \sigma \in \mathcal{FCS} \), \( ro \in RO \) and \( \alpha \in FS(ro) \).
(ii) “\( \bigcup\nolimits_{ro \in RO} FS(ro) \subseteq \mathcal{FS} \)”: For every \( ro \in RO \) and every \( \alpha \in FS(ro) \), there exists \( \sigma \in \mathcal{FCS} \) such that \( \alpha \sqsubseteq \sigma \) and \( ro = \rho(\sigma) = \rho(\alpha) \), so \( supp(\alpha) = supp(\sigma) \geqslant ms \) and \( \alpha \in \mathcal{FS} \). Hence \( \mathcal{FS} = \bigcup\nolimits_{ro \in RO} FS(ro) \).
Since \( \sim \) is an equivalence relation, distinct equivalence classes [σ] (equivalently, distinct sets FS(ro)) are disjoint. Thus \( \mathcal{FS} = \sum\nolimits_{ro \in RO} FS(ro) \).
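A small brute-force check of part a, and of the way closed and generator sequences bracket each class, can be written in Python under explicit assumptions: ρ(α) is read as the set of ids of the data sequences that contain α (consistent with \( \rho(\alpha) = \rho(\sigma) \Rightarrow supp(\alpha) = supp(\sigma) \) above), closed sequences are the maximal elements of their class, and generators are its minimal non-empty elements. The toy SDB and all names are hypothetical.

```python
from collections import defaultdict
from itertools import chain, combinations, product


def S(*itemsets):
    return tuple(frozenset(x) for x in itemsets)


def powerset(items):
    items = sorted(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def is_subseq(a, b):
    pos = 0
    for itemset in a:
        while pos < len(b) and not itemset <= b[pos]:
            pos += 1
        if pos == len(b):
            return False
        pos += 1
    return True


def subsequences(s):
    """Non-empty subsequences of s."""
    subs = {tuple(x for x in c if x) for c in product(*(powerset(it) for it in s))}
    return {a for a in subs if a}


def rho(alpha, sdb):
    """Assumed reading of ρ(alpha): ids of the data sequences that contain alpha."""
    return frozenset(sid for sid, s in enumerate(sdb) if is_subseq(alpha, s))


def proper_subseq(a, b):
    return a != b and is_subseq(a, b)


sdb = [S("ab", "c"), S("a", "c"), S("b")]  # hypothetical toy SDB
ms = 2
candidates = set().union(*(subsequences(s) for s in sdb))
FS = {a for a in candidates if len(rho(a, sdb)) >= ms}

# Partition FS into the classes FS(ro), one per distinct value ro of ρ.
classes = defaultdict(set)
for a in FS:
    classes[rho(a, sdb)].add(a)

# Closed: maximal in its class; generator: minimal (non-empty) in its class.
FCS = {a for a in FS if not any(proper_subseq(a, b) for b in classes[rho(a, sdb)])}
FGS = {a for a in FS if not any(proper_subseq(b, a) for b in classes[rho(a, sdb)])}

# Rebuild FS from (sigma, gamma) pairs of the same class with gamma ⊑ sigma.
rebuilt = set()
for sigma in FCS:
    for gamma in FGS:
        if rho(gamma, sdb) == rho(sigma, sdb) and is_subseq(gamma, sigma):
            rebuilt |= {a for a in subsequences(sigma) if is_subseq(gamma, a)}

assert rebuilt == FS
```

Here the class ρ = {0, 1} contains {a}, {c} and {a} → {c}, with one closed sequence {a} → {c} and two generators {a} and {c}. The sequence {a} → {c} is produced from both generators, which is the kind of duplication that the sets \( \mathcal{FS}^{*} \) of part b remove.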
b. First, we prove that the subsets \( \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \), for \( \sigma_{i} \in CS(ro) \) and \( \gamma_{j} \in GenSet(\sigma_{i}) \), are pairwise disjoint. Assume, to the contrary, that there exist \( \sigma_{m}, \sigma_{i} \in CS(ro) \), \( \gamma_{k} \in GenSet(\sigma_{m}) \) and \( \gamma_{j} \in GenSet(\sigma_{i}) \) with (\( \sigma_{m} \ne \sigma_{i} \) or \( \gamma_{k} \ne \gamma_{j} \)) and some \( \alpha \in \mathcal{FS}^{*}(\sigma_{m}, \gamma_{k}) \cap \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \). Consider the following two cases.
(i) If \( \sigma_{m} \ne \sigma_{i} \), then without loss of generality we can assume that m < i. Since \( GenSet(\alpha) \ne \emptyset \), there exists \( \gamma \in GenSet(\alpha) \). Because \( \alpha \in \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \), we have \( \gamma \in \mathcal{GS} \), \( \gamma \sqsubseteq \alpha \sqsubseteq \sigma_{i} \) and \( \rho(\gamma) = \rho(\alpha) = \rho(\sigma_{i}) \). Then \( \gamma \in GenSet(\sigma_{i}) \), \( \alpha \in \mathcal{FS}(\sigma_{i}, \gamma) \) and \( \alpha \in \mathcal{FS}^{*}(\sigma_{i}, \gamma) \). Moreover, since \( \alpha \in \mathcal{FS}^{*}(\sigma_{m}, \gamma_{k}) \subseteq \mathcal{FS}(\sigma_{m}, \gamma_{k}) \), we have \( \alpha \sqsubseteq \sigma_{m} \) with m < i. This contradicts the condition not(DCondC(α, \( \sigma_{i} \), CS(ρ(\( \sigma_{i} \))))) in the definition of \( \mathcal{FS}^{*}(\sigma_{i}, \gamma) \).
(ii) Otherwise, \( \sigma_{m} \equiv \sigma_{i} \) and \( \gamma_{k} \ne \gamma_{j} \); without loss of generality, we can assume that k < j. Since \( \alpha \in \mathcal{FS}^{*}(\sigma_{i}, \gamma_{k}) \subseteq \mathcal{FS}(\sigma_{i}, \gamma_{k}) \), we have \( \alpha \sqsupseteq \gamma_{k} \) with k < j. This also contradicts the condition not(DCondG(α, \( \sigma_{i} \), \( \gamma_{j} \))) in the definition of \( \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \).
Finally, we prove that \( FS\left( {ro} \right) = \mathop \sum \nolimits_{{\sigma_{i} \in CS\left( {ro} \right)}} \mathop \sum \nolimits_{{\gamma_{j} \in GenSet\left( {\sigma_{i} } \right)}} {\mathcal{F}\mathcal{S}}^{*} \left( {\sigma_{i} , \gamma_{j} } \right) \), \( \forall ro \in RO \).
(iii) “⊆”: Let \( \alpha \in FS(ro) \); then \( \alpha \in \mathcal{FS} \) and ρ(α) = ro. Since \( CloSet(\alpha) \ne \emptyset \) and \( GenSet(\alpha) \ne \emptyset \), [there exist \( \sigma_{i} \in CloSet(\alpha) \) and \( \gamma_{j} \in GenSet(\alpha) \) such that \( \sigma_{i} \in \mathcal{CS} \), \( \gamma_{j} \in \mathcal{GS} \), \( \gamma_{j} \sqsubseteq \alpha \sqsubseteq \sigma_{i} \) and \( \rho(\sigma_{i}) = \rho(\gamma_{j}) = \rho(\alpha) \)]\( ^{(*)} \). Hence \( \gamma_{j} \in GenSet(\sigma_{i}) \), \( supp(\sigma_{i}) = supp(\gamma_{j}) = supp(\alpha) \geqslant ms \), \( \sigma_{i} \in \mathcal{FCS} \), \( \sigma_{i} \in CS(ro) \) and \( \alpha \in \mathcal{FS}(\sigma_{i}, \gamma_{j}) \). Without loss of generality, we can select \( \sigma_{i} \) and \( \gamma_{j} \) to be, respectively, the first closed sequence and the first generator sequence satisfying condition (*); then DCondC(α, \( \sigma_{i} \), CS(ro)) and DCondG(α, \( \sigma_{i} \), \( \gamma_{j} \)) are both false, so \( \alpha \in \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \). Thus \( FS(ro) \subseteq \sum\nolimits_{\sigma_{i} \in CS(ro)} \sum\nolimits_{\gamma_{j} \in GenSet(\sigma_{i})} \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \).
(iv) “⊇”: For every \( \sigma_{i} \in CS(ro) \), every \( \gamma_{j} \in GenSet(\sigma_{i}) \) and every \( \alpha \in \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \), we have \( \sigma_{i} \in \mathcal{FCS} \), \( \alpha \sqsubseteq \sigma_{i} \) and \( \rho(\alpha) = \rho(\sigma_{i}) = ro \), so \( supp(\alpha) = supp(\sigma_{i}) \geqslant ms \), i.e. \( \alpha \in FS(ro) \). Thus \( \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \subseteq FS(ro) \) and \( FS(ro) \supseteq \sum\nolimits_{\sigma_{i} \in CS(ro)} \sum\nolimits_{\gamma_{j} \in GenSet(\sigma_{i})} \mathcal{FS}^{*}(\sigma_{i}, \gamma_{j}) \). □
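Part b can be illustrated with the two duplicate tests as this proof uses them: DCondC is read as "α ⊑ σ_m for some earlier closed sequence σ_m of the class", and DCondG as "γ_k ⊑ α for some earlier generator γ_k ∈ GenSet(σ_i)". The exact definitions of these conditions are given in the main text, not in this appendix, so the Python sketch below is an assumption-laden reading rather than the paper's code. The ordered inputs `closed` and `gens` are hypothetical stand-ins for CS(ro) and the generators of one class (they are not derived from an SDB), and GenSet(σ_i) is taken to be the generators contained in σ_i, kept in list order.

```python
from itertools import chain, combinations, product


def S(*itemsets):
    return tuple(frozenset(x) for x in itemsets)


def powerset(items):
    items = sorted(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def is_subseq(a, b):
    pos = 0
    for itemset in a:
        while pos < len(b) and not itemset <= b[pos]:
            pos += 1
        if pos == len(b):
            return False
        pos += 1
    return True


def between(sigma, gamma):
    """All non-empty alpha with gamma ⊑ alpha ⊑ sigma."""
    subs = {tuple(x for x in c if x) for c in product(*(powerset(it) for it in sigma))}
    return {a for a in subs if a and is_subseq(gamma, a)}


def fs_star_blocks(closed, gens):
    """Map (i, j) to FS*(sigma_i, gamma_j) for one class, following list order."""
    blocks = {}
    for i, sigma in enumerate(closed):
        genset = [g for g in gens if is_subseq(g, sigma)]  # assumed GenSet(sigma_i)
        for j, gamma in enumerate(genset):
            block = set()
            for alpha in between(sigma, gamma):
                # DCondC-style test: alpha lies under an earlier closed sequence.
                dup_closed = any(is_subseq(alpha, closed[m]) for m in range(i))
                # DCondG-style test: alpha contains an earlier generator of sigma_i.
                dup_gen = any(is_subseq(genset[k], alpha) for k in range(j))
                if not dup_closed and not dup_gen:
                    block.add(alpha)
            blocks[(i, j)] = block
    return blocks


closed = [S("a", "b", "c"), S("a", "c", "b")]  # hypothetical CS(ro), in order
gens = [S("a", "b"), S("c")]                    # hypothetical generators, in order
blocks = fs_star_blocks(closed, gens)
union = set().union(*blocks.values())
assert sum(len(b) for b in blocks.values()) == len(union)  # pairwise disjoint
```

With these inputs the four blocks are {{a} → {b}, {a} → {b} → {c}}, {{c}, {a} → {c}, {b} → {c}}, {{a} → {c} → {b}} and {{c} → {b}}. They are pairwise disjoint, and their union equals the union of all \( \mathcal{FS}(\sigma_{i}, \gamma_{j}) \), mirroring statements (i) to (iv).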
1.3 Appendix 3: Proof of Theorem 2
(a) Clearly, \( \mathcal{FS}^{**}(\sigma, \gamma) \subseteq \mathcal{FS}'(\sigma, \gamma) \). By the proofs of the three pruning cases above, for every \( \alpha \in \mathcal{FS}'(\sigma, \gamma) \), if \( DCond_{i} \) is true for some i ∈ {1, 2, 3}, then α is a duplicate of some previously generated sequence. Thus \( \mathcal{FS}'(\sigma, \gamma) \subseteq \mathcal{FS}^{**}(\sigma, \gamma) \), and therefore \( \mathcal{FS}'(\sigma, \gamma) = \mathcal{FS}^{**}(\sigma, \gamma) \).
(b) We prove the first assertion by contradiction. Assume, to the contrary, that there are two sequences \( \alpha = Ex(\gamma, lp_{m}) \oplus \varDelta(lp_{m}) \) and \( \beta = Ex(\gamma, lp_{n}) \oplus \varDelta(lp_{n}) \) in \( \mathcal{FS}^{**}(\sigma, \gamma) \), generated from two different position lists \( lp_{m} \ne lp_{n} \) in LP(σ, γ), but α ≡ β. That is, there exist \( lp_{m} = \{ j_{1}, j_{2}, \ldots, j_{p} \} \) and \( lp_{n} = \{ i_{1}, i_{2}, \ldots, i_{p} \} \in LP(\sigma, \gamma) \) with m < n, \( 1 \leqslant j_{1} < j_{2} < \ldots < j_{p} \leqslant q \) and \( 1 \leqslant i_{1} < i_{2} < \ldots < i_{p} \leqslant q \), where \( d(lp_{m})_{i} \stackrel{\text{def}}{=} \begin{cases} E'_{i}, & \text{if } i \notin lp_{m} \\ E'_{i} \setminus E_{k}, & \text{if } i = j_{k} \in lp_{m} \end{cases} \) and \( d_{i} \stackrel{\text{def}}{=} d(lp_{n})_{i} \stackrel{\text{def}}{=} \begin{cases} E'_{i}, & \text{if } i \notin lp_{n} \\ E'_{i} \setminus E_{k}, & \text{if } i = i_{k} \in lp_{n} \end{cases} \), with \( F_{i} \subseteq d(lp_{m})_{i} \) and \( F'_{i} \subseteq d_{i} \) for all i = 1, …, q, and two corresponding sequences \( \alpha = F_{1} \to F_{2} \to \ldots \to F_{q} \) and \( \beta = F'_{1} \to F'_{2} \to \ldots \to F'_{q} \) in \( \mathcal{FS}^{**}(\sigma, \gamma) \) such that α ≡ β and \( DCond_{k}(lp_{m}) \), \( DCond_{k}(lp_{n}) \) are false for all k = 1, 2, 3.
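Part (b) is needed because the plain construction of Proposition 1 can produce the same sequence from different position lists. The Python sketch below enumerates, without any pruning, Ex(γ, lp) combined with every choice \( d_{i} \subseteq D_{i} \) for every lp ∈ LP(σ, γ), and keeps duplicates. It does not implement \( DCond_{1} \) to \( DCond_{3} \), whose definitions are not part of this appendix; sequences are tuples of frozensets, and the names are hypothetical.

```python
from itertools import chain, combinations, product


def S(*itemsets):
    return tuple(frozenset(x) for x in itemsets)


def powerset(items):
    items = sorted(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def naive_generate(sigma, gamma):
    """Ex(gamma, lp) + (d_1, ..., d_q) for every lp in LP(sigma, gamma), duplicates kept."""
    out = []
    p, q = len(gamma), len(sigma)
    for lp in combinations(range(q), p):
        # Keep only valid position lists: E_k ⊆ E'_{j_k} for every k.
        if not all(gamma[k] <= sigma[j] for k, j in enumerate(lp)):
            continue
        ex = [frozenset()] * q
        for k, j in enumerate(lp):
            ex[j] = gamma[k]
        D = [sigma[i] - ex[i] for i in range(q)]
        for ds in product(*(powerset(Di) for Di in D)):
            # F_i = E~_i + d_i, then delete empty itemsets.
            out.append(tuple(e | d for e, d in zip(ex, ds) if e | d))
    return out


generated = naive_generate(S("a", "a"), S("a"))
print(len(generated), len(set(generated)))  # 4 2
```

For σ = {a} → {a} and γ = {a}, the position lists {1} and {2} each yield {a} and {a} → {a}, so four sequences are generated but only two are distinct. A hash set would remove such duplicates after the fact; the conditions \( DCond_{1} \) to \( DCond_{3} \) instead prune them during generation, which is what part (b) proves.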
After deleting all empty itemsets from α and β, we obtain \( \alpha = F_{u_{1}} \to F_{u_{2}} \to \ldots \to F_{u_{N}} \) and \( \beta = F'_{v_{1}} \to F'_{v_{2}} \to \ldots \to F'_{v_{N}} \), with N = size(α) = size(β), \( 1 \leqslant u_{1} < u_{2} < \ldots < u_{N} \leqslant q \), \( 1 \leqslant v_{1} < v_{2} < \ldots < v_{N} \leqslant q \) and \( F_{u_{i}} = F'_{v_{i}} \ne \emptyset \) for all i = 1, …, N (because α ≡ β). We set \( i_{0} = j_{0} = u_{0} = v_{0} = k_{0} = 0 \) and \( k_{p+1} = N \), and choose \( 1 \leqslant k_{1} < k_{2} < \ldots < k_{p} \leqslant N \) such that \( j_{r} = u_{k_{r}} \) (so \( F_{u_{k_{r}}} = F_{j_{r}} \supseteq E_{r} \subseteq F'_{j_{r}} \)) for all r = 1, …, p.
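The compaction step used here (delete the empty itemsets of a sequence aligned to the q positions of σ and record the 1-based positions \( u_{1} < \ldots < u_{N} \) of the non-empty ones) is easy to state in code. In this Python sketch, α ≡ β is taken to mean that the compacted sequences are equal, which is how the proof uses it; the example values are hypothetical.

```python
def compact(aligned):
    """Delete the empty itemsets of an aligned sequence F_1 -> ... -> F_q."""
    return tuple(itemset for itemset in aligned if itemset)


def nonempty_positions(aligned):
    """1-based positions u_1 < ... < u_N of the non-empty itemsets."""
    return [i for i, itemset in enumerate(aligned, start=1) if itemset]


def equivalent(aligned_a, aligned_b):
    """alpha ≡ beta: equal after empty itemsets are deleted."""
    return compact(aligned_a) == compact(aligned_b)


a, b, empty = frozenset("a"), frozenset("b"), frozenset()
alpha = (a, empty, b)  # hypothetical alpha aligned to q = 3 positions of sigma
beta = (empty, a, b)   # hypothetical beta aligned to the same positions
print(nonempty_positions(alpha), nonempty_positions(beta), equivalent(alpha, beta))
# [1, 3] [2, 3] True
```

Here α and β are equivalent although \( (u_{1}, u_{2}) = (1, 3) \) and \( (v_{1}, v_{2}) = (2, 3) \) differ, which is the situation the induction that follows rules out for two sequences of \( \mathcal{FS}^{**}(\sigma, \gamma) \).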
For all r = 1, …, p + 1 and all \( k = k_{r-1} + 1, \ldots, k_{r} \), we have \( u_{k_{r-1}} = j_{r-1} < u_{k} \leqslant j_{r} \leqslant i_{r} \), and we prove that the hypothesis
H(r, k): \( v_{h} = u_{h} \) for all h = 0, …, k − 1; \( v_{k-1} < i_{r} \); and \( j_{r'} = i_{r'} = u_{k_{r'}} = v_{k_{r'}} \) for all r' = 0, …, r − 1,
implies the conclusion
C(r, k): \( v_{k} = u_{k} \), and, if r ⩽ p, then \( v_{k} < i_{r} \) whenever \( k < k_{r} \), and \( j_{r} = i_{r} = u_{k_{r}} = v_{k_{r}} \).
Since \( i_{0} = j_{0} = u_{0} = v_{0} = k_{0} = 0 \) and \( i_{0} < i_{1} \), the hypothesis H(1, 1) always holds. For r = 1, …, p + 1 and \( k = k_{r-1} + 1, \ldots, k_{r} \), assume that H(r, k) holds.
(i) First, consider any k such that \( k_{r-1} + 1 \leqslant k < k_{r} \) (when r ⩽ p) or \( k_{p} + 1 \leqslant k \leqslant N \) (when r = p + 1). If r ⩽ p, then \( i_{r-1} = j_{r-1} = u_{k_{r-1}} < u_{k} < j_{r} = u_{k_{r}} \), and \( v_{k} \leqslant i_{r} \) because \( F_{u_{k}} = F'_{v_{k}} \ne \emptyset \) and \( v_{k-1} < i_{r} \). Consider the following three cases.
If \( r \leqslant p \) and \( v_{k} = i_{r} \): then \( j_{r-1} < u_{k} < j_{r} \leqslant v_{k} = i_{r} \) and \( E'_{u_{k}} \supseteq F_{u_{k}} = F'_{v_{k}} = F'_{i_{r}} \supseteq E_{r} \), so \( u_{k} \in lp_{pre} \stackrel{\text{def}}{=} \{ j_{1}, \ldots, j_{r-1}, u_{k}, j_{r+1}, \ldots, j_{p} \} \) (with \( lp_{pre} \stackrel{\text{def}}{=} \{ u_{k}, j_{2}, \ldots, j_{p} \} \) if r = 1), \( lp_{pre} \prec lp_{m} \), FI(pre, m) = r and \( lp_{pre}[r] = u_{k} \). Since the \( u_{k} \)-th itemset \( F_{u_{k}} \) of α contains \( E_{r} \), \( DCond_{2}(lp_{pre}, lp_{m}, u_{k}) \) is true, contradicting the assumption that \( DCond_{2} \) is false for \( lp_{m} \). Therefore \( v_{k} < i_{r} \).
If \( u_{k} < v_{k} \) (and \( v_{k} < i_{r} \) when \( r \leqslant p \)): then \( i_{r-1} = j_{r-1} = u_{k_{r-1}} \leqslant u_{k-1} = v_{k-1} < u_{k} < v_{k} \) (and \( v_{k} < i_{r} \) when \( r \leqslant p \)), so \( v_{k} \notin lp_{n} \), \( u_{k} \notin lp_{n} \), \( d(lp_{n})_{v_{k}} = E'_{v_{k}} \) and \( d(lp_{n})_{u_{k}} = E'_{u_{k}} \). Moreover, \( F'_{v_{k-1}} \ne \emptyset \), \( F'_{r'} = \emptyset \) for all \( r' = v_{k-1} + 1, \ldots, v_{k} - 1 \), and \( v_{k-1} + 1 \leqslant u_{k} \leqslant v_{k} - 1 \). For the non-empty itemset \( its_{v_{k}} \) of β, we have \( its_{v_{k}} = F'_{v_{k}} = F_{u_{k}} \subseteq E'_{v_{k}} \cap E'_{u_{k}} = d(lp_{n})_{v_{k}} \cap d(lp_{n})_{u_{k}} \), so \( DCond_{1}(lp_{n}, v_{k}, u_{k}) \) is true, a contradiction.
If \( v_{k} < u_{k} \), then \( j_{r-1} = u_{k_{r-1}} \leqslant u_{k-1} = v_{k-1} < v_{k} < u_{k} \) (and \( u_{k} < j_{r} \) when \( r \leqslant p \)). Similarly, \( u_{k} \notin lp_{m} \), \( v_{k} \notin lp_{m} \), \( d(lp_{m})_{v_{k}} = E'_{v_{k}} \), \( d(lp_{m})_{u_{k}} = E'_{u_{k}} \), \( F_{u_{k-1}} \ne \emptyset \), \( F_{r'} = \emptyset \) for all \( r' = u_{k-1} + 1, \ldots, u_{k} - 1 \), and \( u_{k-1} + 1 \leqslant v_{k} \leqslant u_{k} - 1 \). For the non-empty itemset \( its_{u_{k}} \) of α, we have \( its_{u_{k}} = F_{u_{k}} = F'_{v_{k}} \subseteq E'_{u_{k}} \cap E'_{v_{k}} = d(lp_{m})_{u_{k}} \cap d(lp_{m})_{v_{k}} \), so \( DCond_{1}(lp_{m}, u_{k}, v_{k}) \) is true, a contradiction.
Hence \( u_{k} = v_{k} \) (and \( v_{k} < j_{r} \leqslant i_{r} \) when \( r \leqslant p \)); we denote this result by \( C(r, k)^{(1)} \).
(ii) Second, if \( r \leqslant p \), consider \( k = k_{r} \). Then \( i_{r-1} = j_{r-1} = u_{k_{r-1}} < u_{k} = u_{k_{r}} = j_{r} \leqslant i_{r} \). Since \( F_{u_{k}} = F'_{v_{k}} \ne \emptyset \) and \( v_{k-1} = v_{k_{r}-1} < i_{r} \), we have \( v_{k_{r}} \leqslant i_{r} \). Consider the following cases.
If \( v_{k_{r}} < u_{k_{r}} = j_{r} \leqslant i_{r} \), then \( i_{r-1} = j_{r-1} = u_{k_{r-1}} = v_{k_{r-1}} < v_{k_{r}} < u_{k_{r}} = j_{r} \leqslant i_{r} \) and \( E'_{v_{k_{r}}} \supseteq F'_{v_{k_{r}}} = F_{u_{k_{r}}} = F_{j_{r}} \supseteq E_{r} \). Thus \( v_{k_{r}} \in lp_{pre} \stackrel{\text{def}}{=} \{ i_{1}, \ldots, i_{r-1}, v_{k_{r}}, i_{r+1}, \ldots, i_{p} \} \) (with \( lp_{pre} \stackrel{\text{def}}{=} \{ v_{k_{1}}, i_{2}, \ldots, i_{p} \} \) if r = 1), \( lp_{pre} \prec lp_{n} \), FI(pre, n) = r and \( lp_{pre}[r] = v_{k_{r}} \). Since the itemset \( F'_{v_{k_{r}}} \) of β contains \( E_{r} \), \( DCond_{2}(lp_{pre}, lp_{n}, v_{k_{r}}) \) is true (alternatively, \( DCond_{3}(lp_{pre}, lp_{m}, u_{k_{r}}) \) is true with \( lp_{pre} \stackrel{\text{def}}{=} \{ j_{1}, \ldots, j_{r-1}, v_{k_{r}}, j_{r+1}, \ldots, j_{p} \} \)), a contradiction.
Therefore, \( j_{r} = u_{{k_{r} }} \leqslant v_{{k_{r} }} \leqslant i_{r} \). But the case \( u_{{k_{r} }} < v_{{k_{r} }} \le i_{r} \) cannot occur.
Indeed, if \( j_{r} = u_{k_{r}} < v_{k_{r}} = i_{r} \), then \( i_{r-1} = j_{r-1} = u_{k_{r-1}} = v_{k_{r-1}} \leqslant u_{k_{r}-1} < u_{k_{r}} = j_{r} < v_{k_{r}} = i_{r} \). Thus \( F'_{r'} = \emptyset \) for all \( r' = u_{k_{r}}, \ldots, v_{k_{r}} - 1 \), \( E'_{u_{k_{r}}} \supseteq F_{u_{k_{r}}} = F_{j_{r}} \supseteq E_{r} \), \( u_{k_{r}} \in lp_{pre} \stackrel{\text{def}}{=} \{ i_{1}, \ldots, i_{r-1}, u_{k_{r}}, i_{r+1}, \ldots, i_{p} \} \) (with \( lp_{pre} \stackrel{\text{def}}{=} \{ u_{k_{1}}, i_{2}, \ldots, i_{p} \} \) if r = 1), \( lp_{pre} \prec lp_{n} \), FI(pre, n) = r, \( lp_{pre}[r] = u_{k_{r}} \) and \( lp_{n}[r] = i_{r} = v_{k_{r}} \). Since \( F'_{v_{k_{r}}} \setminus E_{r} \subseteq d(lp_{n})_{v_{k_{r}}} = E'_{v_{k_{r}}} \setminus E_{r} \) and \( F_{u_{k_{r}}} \setminus E_{r} \subseteq d(lp_{pre})_{u_{k_{r}}} = E'_{u_{k_{r}}} \setminus E_{r} \), the itemset \( its_{v_{k_{r}}} \stackrel{\text{def}}{=} F'_{v_{k_{r}}} \setminus E_{r} \) of β satisfies \( its_{v_{k_{r}}} = F_{u_{k_{r}}} \setminus E_{r} \subseteq d(lp_{n})_{v_{k_{r}}} \cap d(lp_{pre})_{u_{k_{r}}} \). Therefore \( DCond_{3}(lp_{pre}, lp_{n}, v_{k_{r}}) \) is true, a contradiction.
If \( j_{r} = u_{k_{r}} < v_{k_{r}} < i_{r} \), then \( i_{r-1} = j_{r-1} = u_{k_{r-1}} = v_{k_{r-1}} \leqslant v_{k_{r}-1} = u_{k_{r}-1} < u_{k_{r}} = j_{r} < v_{k_{r}} < i_{r} \). Hence \( F'_{r'} = \emptyset \) for all \( r' = v_{k_{r}-1} + 1, \ldots, v_{k_{r}} - 1 \), \( v_{k_{r}-1} + 1 \leqslant u_{k_{r}} \leqslant v_{k_{r}} - 1 \), \( u_{k_{r}}, v_{k_{r}} \notin lp_{n} \), \( d(lp_{n})_{v_{k_{r}}} = E'_{v_{k_{r}}} \) and \( F_{u_{k_{r}}} \subseteq d(lp_{n})_{u_{k_{r}}} = E'_{u_{k_{r}}} \). Thus, for the non-empty itemset \( its_{v_{k_{r}}} \) of β, \( its_{v_{k_{r}}} = F'_{v_{k_{r}}} = F_{u_{k_{r}}} \subseteq d(lp_{n})_{v_{k_{r}}} \cap d(lp_{n})_{u_{k_{r}}} \), so \( DCond_{1}(lp_{n}, v_{k_{r}}, u_{k_{r}}) \) is true, a contradiction.
Thus \( v_{k_{r}} = u_{k_{r}} = j_{r} \leqslant i_{r} \). However, if \( v_{k_{r}} = u_{k_{r}} = j_{r} < i_{r} \), then \( i_{r-1} = j_{r-1} = u_{k_{r-1}} = v_{k_{r-1}} < v_{k_{r}} = u_{k_{r}} = j_{r} < i_{r} \) and \( E'_{j_{r}} \supseteq F'_{j_{r}} = F'_{v_{k_{r}}} = F_{u_{k_{r}}} \supseteq E_{r} \). Thus \( j_{r} \in lp_{pre} \stackrel{\text{def}}{=} \{ i_{1}, \ldots, i_{r-1}, j_{r}, i_{r+1}, \ldots, i_{p} \} \) (with \( lp_{pre} \stackrel{\text{def}}{=} \{ j_{1}, i_{2}, \ldots, i_{p} \} \) if r = 1), \( lp_{pre} \prec lp_{n} \), FI(pre, n) = r and \( lp_{pre}[r] = j_{r} \). Since the itemset \( F'_{j_{r}} \) of β contains \( E_{r} \), \( DCond_{2}(lp_{pre}, lp_{n}, j_{r}) \) is true, a contradiction.
Hence \( v_{k_{r}} = u_{k_{r}} = j_{r} = i_{r} \); we denote this result by \( C(r, k)^{(2)} \).
From \( C(r, k)^{(1)} \) and \( C(r, k)^{(2)} \), we obtain C(r, k).
Finally, under the hypothesis α ≡ β, for all r = 1, …, p + 1 and all \( k = k_{r-1} + 1, \ldots, k_{r} \), we always have \( v_{k} = u_{k} \) (*) and, if \( r \leqslant p \), \( j_{r} = i_{r} = u_{k_{r}} = v_{k_{r}} \) (**). Therefore:
If \( lp_{m} = lp_{n} \), then \( j_{r} = i_{r} \) for all r = 1, …, p. Since α and β are generated as two different sequences of \( \mathcal{FS}^{**}(\sigma, \gamma) \) whose non-empty itemsets coincide, there exists h ∈ {1, …, N} such that \( u_{h} \ne v_{h} \). This contradicts (*).
If \( lp_{m} \ne lp_{n} \), then there exists \( r_{0} \in \{ 1, \ldots, p \} \) such that \( i_{r_{0}} \ne j_{r_{0}} \). This contradicts (**). In other words, the hypothesis α ≡ β always leads to a contradiction.
Thus \( \alpha \not\equiv \beta \), i.e., the sequences in \( \mathcal{FS}^{**}(\sigma, \gamma) \) are generated without duplicates.
From Proposition 1 and part (a), we have \( \mathcal{FS}(\sigma, \gamma) = \mathcal{FS}'(\sigma, \gamma) = \mathcal{FS}^{**}(\sigma, \gamma) \), and the remaining assertions follow from Theorem 1. □
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Duong, H., Truong, T., Le, B., Fournier-Viger, P. (2019). An Explicit Relationship Between Sequential Patterns and Their Concise Representations. In: Madria, S., Fournier-Viger, P., Chaudhary, S., Reddy, P. (eds) Big Data Analytics. BDA 2019. Lecture Notes in Computer Science(), vol 11932. Springer, Cham. https://doi.org/10.1007/978-3-030-37188-3_20
DOI: https://doi.org/10.1007/978-3-030-37188-3_20
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37187-6
Online ISBN: 978-3-030-37188-3
eBook Packages: Computer Science, Computer Science (R0)