An Explicit Relationship Between Sequential Patterns and Their Concise Representations

Duong, Hai; Truong, Tin; Le, Bac; Fournier-Viger, Philippe

doi:10.1007/978-3-030-37188-3_20


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11932)

Included in the following conference series:

International Conference on Big Data Analytics


Abstract

Mining sequential patterns in a sequence database (SDB) is an important and useful data mining task. Most existing algorithms for performing this task directly mine the set $ {\mathcal{F}\mathcal{S}} $ of all frequent sequences in an SDB. However, these algorithms often exhibit poor performance on large SDBs due to the enormous search space and cardinality of $ {\mathcal{F}\mathcal{S}} $. In addition, constraint-based mining algorithms relying on this approach must read an SDB again when a constraint is changed by the user. To address this issue, this paper proposes a novel approach for generating $ {\mathcal{F}\mathcal{S}} $ from the two sets of frequent closed sequences $ \left( {{\mathcal{F}\mathcal{C}\mathcal{S}}} \right) $ and frequent generator sequences $ ({\mathcal{F}\mathcal{G}\mathcal{S}}) $, which are concise representations of $ {\mathcal{F}\mathcal{S}} $. The proposed approach is based on a novel explicit relationship between $ {\mathcal{F}\mathcal{S}} $ and these two sets. This relationship is the theoretical basis for a novel efficient algorithm named GFS-CR that directly enumerates $ {\mathcal{F}\mathcal{S}} $ from $ {\mathcal{F}\mathcal{C}\mathcal{S}} $ and $ {\mathcal{F}\mathcal{G}\mathcal{S}} $ rather than mining them from an SDB. Experimental results show that GFS-CR outperforms state-of-the-art algorithms in terms of runtime and scalability.


Notes

1.
iff means “if and only if”.
2.
$ \sum\nolimits_{1 \leqslant i \leqslant n} A_{i} $ denotes the union of pairwise disjoint sets $ A_{1}, A_{2}, \ldots, A_{n} $, i.e. $ A_{i} \cap A_{j} = \emptyset $, $ \forall i \ne j $, $ 1 \leqslant i, j \leqslant n $.


Acknowledgment

This work was supported by Vietnam’s National Foundation for Science and Technology Development (NAFOSTED) under Grant Number 102.05-2017.300.

Author information

Authors and Affiliations

VNU-HCMC, University of Science, Ho Chi Minh, Vietnam
Hai Duong & Bac Le
Department of Mathematics and Computer Science, University of Dalat, Dalat, Vietnam
Hai Duong & Tin Truong
School of Humanities and Social Sciences, Harbin Institute of Technology (Shenzhen), Shenzhen, China
Philippe Fournier-Viger


Corresponding author

Correspondence to Tin Truong.

Editor information

Editors and Affiliations

Missouri University of Science and Technology, Rolla, MO, USA
Sanjay Madria
Harbin Institute of Technology, Shenzhen, China
Philippe Fournier-Viger
Ahmedabad University, Ahmedabad, India
Sanjay Chaudhary
International Institute of Information Technology, Hyderabad, India
P. Krishna Reddy

Appendices

1.1 Appendix 1: Proof of Proposition 1

(i)
“$ {\mathcal{F}\mathcal{S}}\left( {\sigma , \gamma } \right) \subseteq {\mathcal{F}\mathcal{S}}'\left( {\sigma ,\gamma } \right) $”: Consider any $ \alpha \in {\mathcal{F}\mathcal{S}}\left( {\sigma , \gamma } \right) $, i.e. $ \gamma \sqsubseteq \alpha \sqsubseteq \sigma $. Without loss of generality, we can assume that $ \gamma = E_{1} \to E_{2} \to \ldots \to E_{p} $, $ \alpha = F_{1} \to F_{2} \to \ldots \to F_{q} $, $ \sigma = E'_{1} \to E'_{2} \to \ldots \to E'_{q} $, $ F_{i} \subseteq E'_{i} $, $ \forall i = 1, \ldots, q $, and $ \exists lp = \{ j_{1}, j_{2}, \ldots, j_{p} \} \in LP(\alpha ,\gamma ) \subseteq LP(\sigma ,\gamma ) $, with $ 1 \leqslant j_{1} < j_{2} < \ldots < j_{p} \leqslant q $: $ E_{k} \subseteq F_{j_{k}} \subseteq E'_{j_{k}} $, $ \forall k = 1, \ldots, p $. For all $ i = 1, \ldots, q $, set
$$ d_{i} \overset{\text{def}}{=} \begin{cases} F_{i}, & \text{if } i \notin lp \\ F_{i} \backslash E_{k}, & \text{if } i = j_{k} \in lp \end{cases} \qquad D_{i} \overset{\text{def}}{=} \begin{cases} E'_{i}, & \text{if } i \notin lp \\ E'_{i} \backslash E_{k}, & \text{if } i = j_{k} \in lp \end{cases} \qquad E_{i}^{\sim} \overset{\text{def}}{=} \begin{cases} \emptyset, & \text{if } i \notin lp \\ E_{k}, & \text{if } i = j_{k} \in lp \end{cases} $$
together with $ \delta(lp) \overset{\text{def}}{=} d_{1} \to d_{2} \to \ldots \to d_{q} $ and $ Ex(\gamma, lp) \overset{\text{def}}{=} E_{1}^{\sim} \to E_{2}^{\sim} \to \ldots \to E_{q}^{\sim} $. Then $ d_{i} \subseteq D_{i} $ and $ F_{i} = E_{i}^{\sim} + d_{i} $, $ \forall i = 1, \ldots, q $, so $ \alpha = Ex(\gamma, lp) \oplus \varDelta \left( {lp} \right) \in {\mathcal{F}\mathcal{S}}'\left( {\sigma ,\gamma } \right) $.
(ii)
“$ {\mathcal{F}\mathcal{S}}\left( {\sigma , \gamma } \right) \supseteq {\mathcal{F}\mathcal{S}}'\left( {\sigma ,\gamma } \right) $”: $ \forall \alpha \in {\mathcal{F}\mathcal{S}}'\left( {\sigma ,\gamma } \right) $, $ \alpha = Ex(\gamma, lp) \oplus \varDelta \left( {lp} \right) = F_{1} \to F_{2} \to \ldots \to F_{q} $, then $ F_{i} = E_{i}^{\sim} + d_{i} $, $ d_{i} \subseteq D_{i} $, $ E_{i}^{\sim} \subseteq F_{i} \subseteq E_{i}^{\sim} + D_{i} \subseteq E'_{i} $, $ \forall i = 1, \ldots, q $. Thus, $ \gamma \sqsubseteq \alpha \sqsubseteq \sigma $, i.e. $ \alpha \in {\mathcal{F}\mathcal{S}}\left( {\sigma , \gamma } \right) $. □
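
The construction used in this proof can be checked on a toy instance. The following is a minimal Python sketch, not the paper's GFS-CR algorithm: sequences are modelled as tuples of `frozenset` itemsets, positions are 0-based (the proof uses 1-based indices), and the helper names `position_lists`, `fs_brute` and `fs_prime` are hypothetical. It builds every $ F_{i} = E_{i}^{\sim} + d_{i} $ with $ d_{i} \subseteq D_{i} $ for each position list and compares the result with a brute-force enumeration of all $ \alpha $ such that $ \gamma \sqsubseteq \alpha \sqsubseteq \sigma $.

```python
from itertools import combinations, product


def subsets(itemset):
    """All subsets of an itemset, as frozensets."""
    items = sorted(itemset)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def compact(itemsets):
    """Drop empty itemsets, as done when a padded sequence is written out."""
    return tuple(s for s in itemsets if s)


def position_lists(sigma, gamma):
    """LP(sigma, gamma): increasing positions j_1 < ... < j_p with E_k <= E'_{j_k}."""
    q, p = len(sigma), len(gamma)
    return [lp for lp in combinations(range(q), p)
            if all(gamma[k] <= sigma[lp[k]] for k in range(p))]


def fs_brute(sigma, gamma):
    """All alpha with gamma contained in alpha and alpha contained in sigma."""
    result = set()
    for choice in product(*(subsets(e) for e in sigma)):
        alpha = compact(choice)
        if position_lists(alpha, gamma):  # non-empty iff gamma is contained in alpha
            result.add(alpha)
    return result


def fs_prime(sigma, gamma):
    """Every Ex(gamma, lp) extended by some d_i <= D_i, duplicates included."""
    q = len(sigma)
    generated = []
    for lp in position_lists(sigma, gamma):
        ext = [frozenset()] * q              # Ex(gamma, lp)
        for k, j in enumerate(lp):
            ext[j] = gamma[k]
        big_d = [sigma[i] - ext[i] for i in range(q)]   # the sets D_i
        for delta in product(*(subsets(d) for d in big_d)):
            generated.append(compact(ext[i] | delta[i] for i in range(q)))
    return generated


sigma = (frozenset("ab"), frozenset("ac"))
gamma = (frozenset("a"),)
generated = fs_prime(sigma, gamma)
print(set(generated) == fs_brute(sigma, gamma))
print(len(generated), len(set(generated)))
```

On this instance the two sets coincide, while the raw enumeration produces some sequences more than once (for example the single itemset {a} arises from both position lists). Removing such duplicates is the purpose of the pruning conditions treated in Appendix 3.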

1.2 Appendix 2: Proof of Theorem 1

a.
First, we prove that $ \forall \alpha \, \in \,{\mathcal{F}\mathcal{S}} $, $ \exists \sigma \in CloSet(\alpha ) $ such that $ ro\mathop = \limits^{\text{def}} \rho (\alpha ) = \rho (\sigma ) $, so $ \sigma \, \in \,{\mathcal{F}\mathcal{C}\mathcal{S}} $ and $ ro \in RO $, then FS(ro) =[σ]. Indeed, $ \beta \, \in \,FS\left( {ro} \right) \Leftrightarrow [\beta \, \in \,{\mathcal{F}\mathcal{S}}\, \wedge \,\rho (\beta )\, = \,ro] \Leftrightarrow [\beta \, \in \,{\mathcal{F}\mathcal{S}}\, \wedge \,\rho (\beta )\, = \,\rho (\sigma )\, = \,ro] \Leftrightarrow \beta \in [\sigma ] $.
(i)
“$ {\mathcal{F}\mathcal{S}}\, \subseteq \,\mathop {\bigcup }\nolimits_{ro\, \in \,RO} FS\left( {ro} \right) $”: $ \forall \alpha \, \in \,{\mathcal{F}\mathcal{S}} $, $ \exists \sigma \in CloSet( \alpha )\;{\text{ and}}\,ro\mathop = \limits^{\text{def}} \rho (\sigma )\, = \,\rho ( \alpha ) $. Then $ \alpha \in [\sigma ],supp(\sigma ) = supp( \alpha ) \geqslant ms $. Thus, $ \sigma \, \in \,{\mathcal{F}\mathcal{C}\mathcal{S}} $, ro ∊ RO and $ \alpha $ ∊ FS(ro).
(ii)
“$ \mathop {\bigcup }\nolimits_{ro\, \in \,RO} FS\left( {ro} \right)\, \subseteq \,{\mathcal{F}\mathcal{S}} $”: $ \forall ro\, \in \,RO $ and $ \alpha $ ∊ FS(ro), then $ \exists \sigma \, \in \,{\mathcal{F}\mathcal{C}\mathcal{S}}:\,\alpha { \sqsubseteq }\sigma $ and $ ro = \rho (\sigma ) = \rho (\alpha ) $, so $ supp(\alpha ) = supp(\sigma ) \geqslant ms $ and $ \alpha \, \in \,{\mathcal{F}\mathcal{S}} $. Hence, $ {\mathcal{F}\mathcal{S}}\, = \,\mathop {\bigcup }\nolimits_{ro\, \in \,RO} FS\left( {ro} \right) $.

Since $ \sim $ is an equivalence relation, distinct equivalence classes $ [\sigma] $, i.e. distinct sets $ FS(ro) $, are pairwise disjoint. Thus, the union above is a disjoint one: $ {\mathcal{F}\mathcal{S}} = \sum\nolimits_{ro \in RO} FS\left( {ro} \right) $.
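
The partition in part (a) can be illustrated on a small database. The sketch below is only an illustration under stated assumptions: $ \rho(\alpha) $ is taken to be the set of identifiers of the input sequences that contain $ \alpha $ (its definition lies outside this excerpt), `ms` is an absolute minimum support, and the toy database and all function names are hypothetical. Candidates are enumerated by brute force, which is feasible only for tiny inputs.

```python
from collections import defaultdict
from itertools import combinations, product


def subsets(itemset):
    """All subsets of an itemset, as frozensets."""
    items = sorted(itemset)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def contains(big, small):
    """True if `small` is a subsequence of `big` (greedy leftmost matching)."""
    pos = 0
    for itemset in small:
        while pos < len(big) and not itemset <= big[pos]:
            pos += 1
        if pos == len(big):
            return False
        pos += 1
    return True


def all_subsequences(seq):
    """Every non-empty subsequence of one input sequence."""
    result = set()
    for choice in product(*(subsets(e) for e in seq)):
        alpha = tuple(s for s in choice if s)
        if alpha:
            result.add(alpha)
    return result


def rho(alpha, sdb):
    """Assumed meaning of rho: identifiers of the input sequences containing alpha."""
    return frozenset(i for i, s in enumerate(sdb) if contains(s, alpha))


def frequent_classes(sdb, ms):
    """Group the frequent sequences by rho; each group plays the role of FS(ro)."""
    candidates = set().union(*(all_subsequences(s) for s in sdb))
    classes = defaultdict(set)
    for alpha in candidates:
        ro = rho(alpha, sdb)
        if len(ro) >= ms:
            classes[ro].add(alpha)
    return dict(classes)


# Hypothetical toy database: <{a,b} -> {c}>, <{a} -> {c}>, <{b}>
sdb = [
    (frozenset("ab"), frozenset("c")),
    (frozenset("a"), frozenset("c")),
    (frozenset("b"),),
]
classes = frequent_classes(sdb, ms=2)
for ro, members in classes.items():
    print(sorted(ro), len(members))
```

Because every frequent sequence has exactly one value of $ \rho $, the groups are disjoint by construction and their union is the whole set of frequent sequences, which is the content of the disjoint-union statement.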

b.
First, we prove that all different subsets $ {\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\,\gamma_{j} ) $, $ \forall \sigma_{i} \in CS\left( {ro} \right) $, $ \forall \gamma_{j} \in GenSet(\sigma_{i} ) $, are disjoint. Indeed, assume conversely that $ \exists \sigma_{m} $, $ \sigma_{i} \in CS\left( {ro} \right) $, $ \exists \gamma_{k} \in GenSet(\sigma_{m} ) $, $ \exists \gamma_{j} \in GenSet(\sigma_{i} ) $: ($ \sigma_{m} \ne \sigma_{i} $ or $ \gamma_{k} \ne \gamma_{j} $) and $ \exists \alpha \, \in \,{\mathcal{F}\mathcal{S}}^{*} (\sigma_{m} ,\,\gamma_{k} )\, \cap \,{\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\,\gamma_{j} )( \ne \emptyset ) $. Consider the following two cases.

(i)
If $ \sigma_{m} \ne \sigma_{i} $, then without loss of generality, we can assume that m < i and $ \exists \gamma \in GenSet(\alpha ) $ (because $ GenSet(\alpha ) \, \ne \emptyset ) $. We have $ \gamma \, \in \,\mathcal{G}\mathcal{S} $, $ \gamma \;{ \sqsubseteq }\;\alpha \;{ \sqsubseteq }\;\sigma_{i} $, $ \rho (\gamma ) = \rho (\alpha ) = \rho (\sigma_{i} ) $, because $ \alpha \, \in \,{\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\,\gamma_{j} ) $. Then $ \gamma \in GenSet(\sigma_{i} ) $, $ \alpha \in {\mathcal{F}\mathcal{S}}(\sigma_{i} ,\gamma ) $ and $ \alpha \in {\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\gamma ) $. Moreover, since $ \alpha \, \in \,{\mathcal{F}\mathcal{S}}^{*} (\sigma_{m} ,\,\gamma_{k} )\, \subseteq \,{\mathcal{F}\mathcal{S}}(\sigma_{m} ,\,\gamma_{k} ) $, then $ \alpha { \sqsubseteq }\sigma_{m} $ with m < i. This contradicts the condition not(DCondC($ \alpha $, $ \sigma_{i} $, CS(ρ($ \sigma_{i} $)))) in $ {\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\,\gamma ) $.
(ii)
Otherwise, if $ \sigma_{m} \equiv \sigma_{i} $ and $ \gamma_{k} \ne \gamma_{j} $, then without loss of generality, we can assume that k < j. Since $ \alpha \, \in \,{\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\,\gamma_{k} ) \subseteq {\mathcal{F}\mathcal{S}}(\sigma_{i} ,\,\gamma_{k} ) $, then $ \alpha \;{ \sqsupseteq }\;\gamma_{k} $, with k < j. This also contradicts the condition not $ (DCondG(\alpha ,\sigma_{i} ,\gamma_{j} )) $ in $ {\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\,\gamma_{j} ) $.

Finally, we prove that $ FS\left( {ro} \right) = \mathop \sum \nolimits_{{\sigma_{i} \in CS\left( {ro} \right)}} \mathop \sum \nolimits_{{\gamma_{j} \in GenSet\left( {\sigma_{i} } \right)}} {\mathcal{F}\mathcal{S}}^{*} \left( {\sigma_{i} , \gamma_{j} } \right) $, $ \forall ro \in RO $.
(iii)
“⊆”: $ \forall \alpha \in FS\left( {ro} \right) $, then $ \alpha \in {\mathcal{F}\mathcal{S}} $ and $ \rho(\alpha) = ro $. Since $ CloSet\left( \alpha \right) \ne \emptyset $ and $ GenSet\left( \alpha \right) \ne \emptyset $, [$ \exists \sigma_{i} \in CloSet(\alpha ) $, $ \exists \gamma_{j} \in GenSet(\alpha ) $: $ \sigma_{i} \in {\mathcal{C}\mathcal{S}} $, $ \gamma_{j} \in {\mathcal{G}\mathcal{S}} $, $ \gamma_{j} \sqsubseteq \alpha \sqsubseteq \sigma_{i} $, $ \rho (\sigma_{i} ) = \rho (\gamma_{j} ) = \rho \left( \alpha \right) $] $ ^{(*)} $, so $ \gamma_{j} \in GenSet(\sigma_{i} ) $, $ supp(\sigma_{i} ) = supp(\gamma_{j} ) = supp\left( \alpha \right) \geqslant ms $, $ \sigma_{i} \in {\mathcal{F}\mathcal{C}\mathcal{S}} $ and $ \sigma_{i} \in CS\left( {ro} \right) $. Hence, $ \alpha \in {\mathcal{F}\mathcal{S}}(\sigma_{i} ,\gamma_{j} ) $. Without loss of generality, we can select $ \sigma_{i} $ and $ \gamma_{j} $ that are, respectively, the first closed and generator sequences satisfying the condition $ (*) $. Thus, $ FS\left( {ro} \right) \subseteq \sum\nolimits_{{\sigma_{i} \in CS\left( {ro} \right)}} \sum\nolimits_{{\gamma_{j} \in GenSet\left( {\sigma_{i} } \right)}} {\mathcal{F}\mathcal{S}}^{*} \left( {\sigma_{i} ,\gamma_{j} } \right) $.
(iv)
“⊇”: $ \forall \sigma_{i} \in CS\left( {ro} \right) $, $ \forall \gamma_{j} \in GenSet(\sigma_{i} ) $, $ \forall \alpha \in {\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\gamma_{j} ) $, we have $ \sigma_{i} \in {\mathcal{F}\mathcal{C}\mathcal{S}} $, $ \alpha { \sqsubseteq }\sigma_{i} $, $ \rho \left( \alpha \right) \, = \rho (\sigma_{i} ) = ro $, so $ supp\left( \alpha \right) = supp(\sigma_{i} ) \geqslant ms $, i.e. $ \alpha \in FS\left( {ro} \right) $ . Thus, $ {\mathcal{F}\mathcal{S}}^{*} \left( {\sigma_{i} ,\gamma_{j} } \right) \subseteq FS\left( {ro} \right)\,{\text{and}}\,FS\left( {ro} \right) \supseteq \sum\nolimits_{{\sigma_{i} \in CS\left( {ro} \right)}} {\sum\nolimits_{{\gamma_{j} \in GenSet\left( {\sigma_{i} } \right)}} {{\mathcal{F}\mathcal{S}}^{*} \left( {\sigma_{i} ,\gamma_{j} } \right)} } $. □
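
The selection of "the first closed and generator sequences" used in step (iii) can be sketched as follows. This is a hedged illustration, not the paper's procedure: within one equivalence class, closed sequences are taken as the maximal members and generators as the minimal members with respect to $ \sqsubseteq $, the ordering of $ \sigma_{i} $ and $ \gamma_{j} $ is an arbitrary but fixed lexicographic one, and the names `split_class`, `seq` and `order_key` are hypothetical.

```python
def contains(big, small):
    """True if `small` is a subsequence of `big` (greedy leftmost matching)."""
    pos = 0
    for itemset in small:
        while pos < len(big) and not itemset <= big[pos]:
            pos += 1
        if pos == len(big):
            return False
        pos += 1
    return True


def seq(*itemsets):
    """Build a sequence (tuple of frozensets) from strings of single-letter items."""
    return tuple(frozenset(s) for s in itemsets)


def order_key(s):
    """Fixed, arbitrary order used to decide which sequence comes 'first'."""
    return tuple(tuple(sorted(itemset)) for itemset in s)


def split_class(members):
    """Assign each alpha of one class FS(ro) to exactly one cell (sigma_i, gamma_j)."""
    members = sorted(members, key=order_key)
    # Closed: no other member of the class strictly contains it.
    closed = [s for s in members
              if not any(t != s and contains(t, s) for t in members)]
    # Generator: it strictly contains no other member of the class.
    gens = [s for s in members
            if not any(t != s and contains(s, t) for t in members)]
    cells = {}
    for alpha in members:
        sigma = next(c for c in closed if contains(c, alpha))   # first closed above alpha
        gamma = next(g for g in gens
                     if contains(sigma, g) and contains(alpha, g))  # first generator below
        cells.setdefault((sigma, gamma), set()).add(alpha)
    return closed, gens, cells


# One class of a hypothetical toy database: <a>, <c>, <a -> c> share the same rho.
members = [seq("a"), seq("c"), seq("a", "c")]
closed, gens, cells = split_class(members)
for (sigma, gamma), cell in cells.items():
    print(order_key(sigma), order_key(gamma), sorted(order_key(a) for a in cell))
```

Each member lands in exactly one cell, so the cells are disjoint and cover the class. Picking the first admissible pair mirrors the role that the conditions DCondC and DCondG play in the definition of $ {\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\gamma_{j} ) $, as far as that role can be read from the proof above.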

1.3 Appendix 3: Proof of Theorem 2

(a)
It is clear that $ {\mathcal{F}\mathcal{S}}^{**} \left( {\sigma ,\gamma } \right) \subseteq {\mathcal{F}\mathcal{S}}'\left( {\sigma , \gamma } \right) $. By the proofs of the above three pruning cases, $ \forall \alpha \in {\mathcal{F}\mathcal{S}}'\left( {\sigma , \gamma } \right) $, if $ DCond_{i} $ is true for some $ i \in \{1, 2, 3\} $, then α is a duplicate of some previously generated sequence. Thus, $ {\mathcal{F}\mathcal{S}}'\left( {\sigma , \gamma } \right) \subseteq {\mathcal{F}\mathcal{S}}^{**} \left( {\sigma ,\gamma } \right) $ and $ {\mathcal{F}\mathcal{S}}'\left( {\sigma , \gamma } \right) = {\mathcal{F}\mathcal{S}}^{**} \left( {\sigma ,\gamma } \right) $.
(b)
We will prove the first assertion by contradiction. Assume conversely that there are two sequences $ \alpha = Ex(\gamma ,lp_{m} ) \oplus \varDelta (lp_{m} ) $ and $ \beta = Ex(\gamma ,lp_{n} ) \oplus \varDelta (lp_{n} ) $ (in $ {\mathcal{F}\mathcal{S}}^{**} \left( {\sigma ,\gamma } \right) $) according to two different position lists in LP(σ, γ): $ lp_{m} \ne lp_{n} $, but α ≡ β, i.e., there exist two $ lp_{m} = \{ j_{1} ,j_{2} , \ldots ,j_{p} \} $, $ lp_{n} = \{ i_{1} ,i_{2} , \ldots ,i_{p} \} \in LP\left( {\sigma ,\gamma } \right) $ with m < n, $ 1 \leqslant j_{1} < j_{2} < \ldots < j_{p} \leqslant q $, $ 1 \leqslant i_{1} < i_{2} < \ldots < i_{p} \leqslant q $,
$$ d(lp_{m} )_{i} \overset{\text{def}}{=} \begin{cases} E'_{i}, & \text{if } i \notin lp_{m} \\ E'_{i} \backslash E_{k}, & \text{if } i = j_{k} \in lp_{m} \end{cases} \qquad d_{i} \overset{\text{def}}{=} d(lp_{n} )_{i} \overset{\text{def}}{=} \begin{cases} E'_{i}, & \text{if } i \notin lp_{n} \\ E'_{i} \backslash E_{k}, & \text{if } i = i_{k} \in lp_{n} \end{cases} $$
$ F_{i} \subseteq d\left( {lp_{m} } \right)_{i} $, $ F'_{i} \subseteq d_{i} $, $ \forall i = 1, \ldots, q $, and two corresponding sequences $ \alpha = F_{1} \to F_{2} \to \ldots \to F_{q} $ and $ \beta = F'_{1} \to F'_{2} \to \ldots \to F'_{q} $ that belong to $ {\mathcal{F}\mathcal{S}}^{**} \left( {\sigma ,\gamma } \right) $ such that α ≡ β and $ DCond_{k} (lp_{m} ) $, $ DCond_{k} (lp_{n} ) $ are false, $ \forall k = 1, 2, 3 $.

After deleting all empty itemsets in α and β, we obtain $ \alpha = F_{{u_{1} }} \to F_{{u_{2} }} \to \ldots \to F_{{u_{N} }} $, $ \beta = F^{{\prime }}_{{v_{1} }} \to F^{{\prime }}_{{v_{2} }} \to \ldots \to F^{{\prime }}_{{v_{N} }} $ with N = size(α) = size(β), $ 1 \leqslant u_{1} < u_{2} < \ldots < u_{N} \leqslant q $, $ 1 \leqslant v_{1} < v_{2} < \ldots < v_{N} \leqslant q $ and $ F_{{u_{i} }} = F^{{\prime }}_{{v_{i} }} \ne \emptyset ,\forall i = 1, \ldots ,N $ (because α ≡ β). We set $ i_{0} = j_{0} = u_{0} = v_{0} = k_{0} \equiv \, 0 $, $ k_{p + 1} = N $ and $ 1 \leqslant k_{1} < k_{2} < \ldots < k_{p} \leqslant N $ : $ j_{r} = u_{{k_{r} }} $ (so $ F_{{u_{{k_{r} }} }} = F_{{j_{r} }} \supseteq E_{r} \subseteq F^{{\prime }}_{{j_{r} }} $), ∀r = 1, …, p.

For all $ r = 1, \ldots, p + 1 $ and all $ k = (k_{r - 1} + 1), \ldots, k_{r} $, we have $ u_{k_{r - 1}} = j_{r - 1} < u_{k} \leqslant j_{r} \leqslant i_{r} $, and we prove that if {($ v_{h} = u_{h} $, $ \forall h = 0, \ldots, k - 1 $), $ \left( {v_{k - 1} < i_{r} } \right) $ and ($ \forall r' = 0, \ldots, r - 1 $, $ j_{r'} = i_{r'} = u_{k_{r'}} = v_{k_{r'}} $)}, (H(r, k)), then {$ v_{k} = u_{k} $, and if ($ r \leqslant p $) then [($ v_{k} < i_{r} $, if $ k < k_{r} $) and $ \left( {j_{r} = i_{r} = u_{k_{r}} = v_{k_{r}} } \right) $]}, (C(r, k)).

Since $ i_{0} = j_{0} = u_{0} = v_{0} = k_{0} = 0 $ and $ i_{0} < i_{1} $, the hypothesis H(1, 1) is always true. For all $ r = 1, \ldots, p + 1 $ and all $ k = (k_{r - 1} + 1), \ldots, k_{r} $, assume that the hypothesis H(r, k) is true.

(i)
First, we consider any k such that [$ k_{r - 1} + 1 \leqslant k < k_{r} $, for r ⩽ p] or [$ k_{p} + 1 \leqslant k \leqslant N $, with r = p + 1]. If r ⩽ p, then $ i_{r - 1} = j_{r - 1} = u_{{k_{r - 1} }} < u_{k} < j_{r} = u_{{k_{r} }} $ and $ v_{k} \leqslant i_{r} $, because $ F_{{u_{k} }} = F^{{\prime }}_{{v_{k} }} \ne \emptyset $ and $ v_{k - 1} < i_{r} $. Consider the following three cases.

If ($ r \leqslant p $ and $ v_{k} = i_{r} $): then $ j_{r - 1} < u_{k} < j_{r} \leqslant v_{k} = i_{r} $, $ E'_{u_{k}} \supseteq F_{u_{k}} = F'_{v_{k}} = F'_{i_{r}} \supseteq E_{r} $, so $ u_{k} \in lp_{pre} \overset{\text{def}}{=} \{ j_{1} , \ldots ,j_{r - 1} , u_{k} ,j_{r + 1} , \ldots ,j_{p} \} $ (and $ lp_{pre} \overset{\text{def}}{=} \{ u_{k} ,j_{2} , \ldots ,j_{p} \} $ if r = 1), $ lp_{pre} \prec lp_{m} $, FI(pre, m) = r, $ lp_{pre} \left[ r \right] = u_{k} $ and the $ u_{k} $-th itemset $ \left( {F_{u_{k}} } \right) $ contains $ E_{r} $, so $ DCond_{2} (lp_{pre} ,lp_{m} ,u_{k} ) $ is true. Therefore, $ v_{k} < i_{r} $.

If ($ u_{k} < v_{k} $ and ($ v_{k} < i_{r} $, if $ r \leqslant p $)): then $ i_{r - 1} = j_{r - 1} = u_{k_{r - 1}} \leqslant u_{k - 1} = v_{k - 1} < u_{k} < v_{k} $ and ($ v_{k} < i_{r} $, with $ r \leqslant p $), $ v_{k} \notin lp_{n} $, $ u_{k} \notin lp_{n} $, $ d\left( {lp_{n} } \right)_{v_{k}} = E'_{v_{k}} $, $ d\left( {lp_{n} } \right)_{u_{k}} = E'_{u_{k}} $, $ F'_{v_{k - 1}} \ne \emptyset $, $ F'_{r'} = \emptyset $, $ \forall r' = (v_{k - 1} + 1), \ldots ,(v_{k} - 1) $, $ v_{k - 1} + 1 \leqslant u_{k} \leqslant v_{k} - 1 $. Moreover, the non-empty itemset $ its_{v_{k}} $ of $ \beta $ satisfies $ its_{v_{k}} = F'_{v_{k}} = F_{u_{k}} \subseteq E'_{v_{k}} \cap E'_{u_{k}} = d\left( {lp_{n} } \right)_{v_{k}} \cap d\left( {lp_{n} } \right)_{u_{k}} $, so $ DCond_{1} (lp_{n} ,v_{k} ,u_{k} ) $ is true.

If $ v_{k} < u_{k} $, then $ j_{r - 1} = u_{k_{r - 1}} \leqslant u_{k - 1} = v_{k - 1} < v_{k} < u_{k} $ and ($ u_{k} < j_{r} $, with $ r \leqslant p $). Similarly, $ u_{k} \notin lp_{m} $, $ v_{k} \notin lp_{m} $, $ d\left( {lp_{m} } \right)_{v_{k}} = E'_{v_{k}} $, $ d\left( {lp_{m} } \right)_{u_{k}} = E'_{u_{k}} $, $ F_{u_{k - 1}} \ne \emptyset $, $ F_{r'} = \emptyset $, $ \forall r' = (u_{k - 1} + 1), \ldots ,(u_{k} - 1) $, $ u_{k - 1} + 1 \leqslant v_{k} \leqslant u_{k} - 1 $. Moreover, the non-empty itemset $ its_{u_{k}} $ of $ \alpha $ satisfies $ its_{u_{k}} = F_{u_{k}} = F'_{v_{k}} \subseteq E'_{u_{k}} \cap E'_{v_{k}} = d\left( {lp_{m} } \right)_{u_{k}} \cap d\left( {lp_{m} } \right)_{v_{k}} $, so $ DCond_{1} (lp_{m} ,u_{k} ,v_{k} ) $ is true.

Hence, $ u_{k} = v_{k} $ and ($ v_{k} < j_{r} \leqslant i_{r} $, with $ r \leqslant p $). $ \left( {C\left( {r, \, k} \right)^{(1)} } \right) $.
(ii)
Second, if $ r \leqslant p $, we consider $ k = k_{r} $. Then $ i_{r - 1} = j_{r - 1} = u_{k_{r - 1}} < u_{k} = u_{k_{r}} = j_{r} \leqslant i_{r} $. Since $ F_{u_{k}} = F'_{v_{k}} \ne \emptyset $ and $ v_{k - 1} = v_{k_{r} - 1} < i_{r} $, we have $ v_{k_{r}} \leqslant i_{r} $. Consider the following cases.

If $ v_{{k_{r} }} < u_{{k_{r} }} = j_{r} \le i_{r} $, then $ i_{r - 1} = j_{r - 1} = u_{{k_{r - 1} }} = v_{{k_{r - 1} }} < v_{{k_{r} }} < u_{{k_{r} }} = j_{r} \le i_{r} $, $ E^{{\prime }}_{{v_{{k_{r} }} }} \supseteq F^{{\prime }}_{{v_{{k_{r} }} }} = F_{{u_{{k_{r} }} }} = F_{{j_{r} }} \supseteq E_{r} $. Thus, $ v_{{k_{r} }} \in lp_{pre} \mathop = \limits^{\text{def}} \{ i_{1} , \ldots ,i_{r - 1} , v_{{k_{r} }} ,i_{r + 1} , \ldots ,i_{p} \} $ (and $ lp_{pre} \mathop = \limits^{\text{def}} \{ v_{{k_{1} }} ,i_{2} , \ldots ,i_{p} \} $, if r = 1), $ lp_{pre} \prec lp_{n} $, $ FI\left( {pre,n} \right) \, = r,lp_{pre} \left[ r \right] \, = v_{{k_{r} }} $ and with the itemset $ F^{{\prime }}_{{v_{{k_{r} }} }} $ of $ \beta $ such that $ F^{{\prime }}_{{v_{{k_{r} }} }} \supseteq E_{r} $, so $ DCond_{2} (lp_{pre} ,lp_{n} ,v_{{k_{r} }} ) $ is true (or $ DCond_{3} (lp_{pre} \mathop = \limits^{\text{def}} \{ j_{1} , \ldots ,j_{r - 1} , v_{{k_{r} }} ,j_{r + 1} , \ldots ,j_{p} \} ,lp_{m} ,u_{{k_{r} }} ) $ is true).

Therefore, $ j_{r} = u_{{k_{r} }} \leqslant v_{{k_{r} }} \leqslant i_{r} $. But the case $ u_{{k_{r} }} < v_{{k_{r} }} \le i_{r} $ cannot occur.

Indeed, if $ j_{r} = u_{{k_{r} }} < v_{{k_{r} }} = i_{r} $, then $ i_{r - 1} = j_{r - 1} = u_{{k_{r - 1} }} = v_{{k_{r - 1} }} \leqslant u_{{k_{r} - 1}} < u_{{k_{r} }} = j_{r} < v_{{k_{r} }} = i_{r} $. Thus, $ F^{{\prime }}_{{r^{{\prime }} }} = \emptyset $, $ \forall r^{{\prime }} = u_{{k_{r} }} , \ldots ,(v_{{k_{r} }} - 1) $, $ E^{{\prime }}_{{u_{{k_{r} }} }} \supseteq F_{{u_{{k_{r} }} }} = F_{{j_{r} }} \supseteq E_{r} $, $ u_{{k_{r} }} \in lp_{pre} \mathop = \limits^{\text{def}} \{ i_{1} , \ldots ,i_{r - 1} , u_{{k_{r} }} ,i_{r + 1} , \ldots ,i_{p} \} $ (and $ lp_{pre} \mathop = \limits^{\text{def}} \{ u_{{k_{1} }} ,i_{2} , \ldots ,i_{p} \} $, if r = 1), $ lp_{pre} \prec lp_{n} $, FI(pre, n) = r, $ lp_{pre} \left[ r \right] \, = u_{{k_{r} }} $, $ lp_{n} \left[ r \right] \, = i_{r} = v_{{k_{r} }} $. Since $ its_{{v_{{k_{r} }} }} = F^{{\prime }}_{{v_{{k_{r} }} }} \backslash E_{r} \subseteq d(lp_{n} )_{{v_{{k_{r} }} }} = E^{{\prime }}_{{v_{{k_{r} }} }} \backslash E_{r} $ and $ F_{{u_{{k_{r} }} }} \backslash E_{r} \subseteq d(lp_{pre} )_{{u_{{k_{r} }} }} = E^{{\prime }}_{{u_{{k_{r} }} }} \backslash E_{r} $, then for itemset $ its_{{v_{{k_{r} }} }} $ of $ \beta $ such that $ its_{{v_{{k_{r} }} }} = F^{{\prime }}_{{v_{{k_{r} }} }} \backslash E_{r} = F_{{u_{{k_{r} }} }} \backslash E_{r} \subseteq d(lp_{n} )_{{v_{{k_{r} }} }} \cap d(lp_{pre} )_{{u_{{k_{r} }} }} $. Therefore, $ DCond_{3} (lp_{pre} ,lp_{n} ,v_{{k_{r} }} ) $ is true.

If $ j_{r} = u_{k_{r}} < v_{k_{r}} < i_{r} $, then $ i_{r - 1} = j_{r - 1} = u_{k_{r - 1}} = v_{k_{r - 1}} \leqslant v_{k_{r} - 1} = u_{k_{r} - 1} < u_{k_{r}} = j_{r} < v_{k_{r}} < i_{r} $. Hence, $ F'_{r'} = \emptyset $, $ \forall r' = (v_{k_{r} - 1} + 1), \ldots, (v_{k_{r}} - 1) $, $ v_{k_{r} - 1} + 1 \leqslant u_{k_{r}} \leqslant v_{k_{r}} - 1 $, $ u_{k_{r}} \notin lp_{n} $ and $ v_{k_{r}} \notin lp_{n} $, $ d\left( {lp_{n} } \right)_{v_{k_{r}}} = E'_{v_{k_{r}}} $, $ F_{u_{k_{r}}} \subseteq d\left( {lp_{n} } \right)_{u_{k_{r}}} = E'_{u_{k_{r}}} $. Thus, the non-empty itemset $ its_{v_{k_{r}}} $ of $ \beta $ satisfies $ its_{v_{k_{r}}} = F'_{v_{k_{r}}} = F_{u_{k_{r}}} \subseteq d\left( {lp_{n} } \right)_{v_{k_{r}}} \cap d\left( {lp_{n} } \right)_{u_{k_{r}}} $, so the condition $ DCond_{1} (lp_{n} ,v_{k_{r}} ,u_{k_{r}} ) $ is true.

Thus, $ v_{{k_{r} }} = u_{{k_{r} }} = j_{r} \leqslant i_{r} $. However, if $ v_{{k_{r} }} = u_{{k_{r} }} = j_{r} < i_{r} $, then $ i_{r - 1} = j_{r - 1} = u_{{k_{r - 1} }} = v_{{k_{r - 1} }} < v_{{k_{r} }} = u_{{k_{r} }} = j_{r} < i_{r} $, $ E^{{\prime }}_{{j_{r} }} \supseteq F^{{\prime }}_{{j_{r} }} = F^{'}_{{v_{{k_{r} }} }} = F_{{u_{{k_{r} }} }} \supseteq E_{r} $. Thus, $ j_{r} \in lp_{pre} \mathop = \limits^{\text{def}} \{ i_{1} , \ldots ,i_{r - 1} , j_{r} ,i_{r + 1} , \ldots ,i_{p} \} $ (and $ lp_{pre} \mathop = \limits^{\text{def}} \{ j_{1} ,i_{2} , \ldots ,i_{p} \} $, if r = 1), $ lp_{pre} \prec lp_{n} $, FI(pre, n) = r, $ lp_{pre} $[r] = $ j_{r} $ and the itemset $ F^{{\prime }}_{{j_{r} }} $ of $ \beta $ contains $ E_{r} $, so $ DCond_{2} $($ lp_{pre} $, $ lp_{n} $, $ j_{r} $) is true.

Hence, $ v_{{k_{r} }} = u_{{k_{r} }} = j_{r} = i_{r} $. (C(r, k)⁽²⁾).

From (C(r, k) ⁽¹⁾) − (C(r, k) ⁽²⁾), we have (C(r, k)).

Finally, under the hypothesis α ≡ β, then ∀r = 1, …, p + 1, ∀k = $ (k_{r - 1} + 1), \ldots , k_{r} $, we always have ($ v_{k} = u_{k} $), ($ j_{r} = i_{r} = u_{{k_{r} }} = v_{{k_{r} }} $, if $ r \leqslant p $). Therefore,

$$ (\forall k^{{\prime }} = 1, \ldots ,N,v_{{k^{{\prime }} }} = u_{{k^{{\prime }} }} )^{(*)} \,{\text{and}}\, (\forall r = 1, \ldots ,p,j_{r} = i_{r} )^{(**)} . $$

If $ lp_{m} = lp_{n} $, then $ j_{r} = i_{r} $, $ \forall r = 1, \ldots, p $. Since α and β are generated from two different position lists in $ {\mathcal{F}\mathcal{S}}^{**} \left( {\sigma ,\gamma } \right) $, $ \exists i_{0} = 1, \ldots ,N $ such that $ u_{i_{0}} \ne v_{i_{0}} $. This contradicts $ (*) $.

If $ lp_{m} \ne lp_{n} $, then $ \exists m_{0} = 1, \ldots ,p $ such that $ i_{m_{0}} \ne j_{m_{0}} $. This contradicts $ (**) $. In other words, the hypothesis α ≡ β always leads to a contradiction.

Thus, α ≠ β, i.e., all sequences in $ {\mathcal{F}\mathcal{S}}^{**} \left( {\sigma ,\gamma } \right) $ are distinctly generated.

From Proposition 1 and Theorem 2.a, we have $ {\mathcal{F}\mathcal{S}}\left( {\sigma ,\gamma } \right) = {\mathcal{F}\mathcal{S}}'\left( {\sigma ,\gamma } \right) = {\mathcal{F}\mathcal{S}}^{**} \left( {\sigma ,\gamma } \right) $, and the remaining assertions are deduced from Theorem 1. □

Cite this paper

Duong, H., Truong, T., Le, B., Fournier-Viger, P. (2019). An Explicit Relationship Between Sequential Patterns and Their Concise Representations. In: Madria, S., Fournier-Viger, P., Chaudhary, S., Reddy, P. (eds) Big Data Analytics. BDA 2019. Lecture Notes in Computer Science, vol 11932. Springer, Cham. https://doi.org/10.1007/978-3-030-37188-3_20


DOI: https://doi.org/10.1007/978-3-030-37188-3_20
Published: 12 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37187-6
Online ISBN: 978-3-030-37188-3
eBook Packages: Computer Science, Computer Science (R0)
