Effective database transformation and efficient support computation for mining sequential patterns

Cho, Chung-Wen; Wu, Yi-Hung; Chen, Arbee L. P.

doi:10.1007/s10844-007-0047-y

Published: 07 November 2007

Volume 32, pages 23–51 (2009)

Journal of Intelligent Information Systems

Abstract

In this paper, we propose a novel algorithm for mining frequent sequences from transaction databases. The transactions of the same customer form a customer sequence, so the database is a set of customer sequences. A sequence (an ordered list of itemsets) is frequent if the number of customer sequences containing it satisfies the user-specified threshold. A 1-sequence is a special type of sequence because it consists of a single itemset rather than an ordered list, while a k-sequence is a sequence composed of k itemsets. Compared with the cost of mining frequent k-sequences (k ≥ 2), the cost of mining frequent 1-sequences is negligible. We adopt a two-phase architecture that finds the two types of frequent sequences separately, so that the discovery of frequent k-sequences can be designed and optimized on its own. For efficient frequent k-sequence mining, every frequent 1-sequence is encoded as a unique symbol and the database is transformed into one composed of these symbols. We find that it is unnecessary to encode all the frequent 1-sequences, and we make full use of the discovered frequent 1-sequences to transform the database into a smaller one. For every k ≥ 2, the customer sequences in the transformed database are scanned to find all the frequent k-sequences. We devise a compact representation for a customer sequence and describe a method for enumerating all distinct subsequences of a customer sequence without redundant scans. The soundness of the proposed approach is verified and a number of experiments are performed. The results show that our approach outperforms previous work in both scalability and execution time.
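
The two-phase idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it uses a naive itemset enumeration for phase one, encodes every frequent 1-sequence (the paper reports that encoding all of them is unnecessary), and counts support with a plain scan instead of the compact representation the paper describes. The names `frequent_itemsets`, `transform`, `support`, the `max_size` cap, and the toy database are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(db, min_support, max_size=2):
    """Phase 1 (naive): itemsets found in a transaction of >= min_support customers."""
    counts = Counter()
    for customer in db:
        seen = set()
        for transaction in customer:
            items = sorted(transaction)
            for size in range(1, min(max_size, len(items)) + 1):
                seen.update(combinations(items, size))
        counts.update(seen)  # each customer contributes at most 1 per itemset
    return sorted(s for s, c in counts.items() if c >= min_support)


def transform(db, itemsets):
    """Encode each frequent itemset as an integer symbol and rewrite the database."""
    symbol = {s: i for i, s in enumerate(itemsets)}
    transformed = []
    for customer in db:
        seq = []
        for transaction in customer:
            syms = frozenset(symbol[s] for s in itemsets if set(s) <= set(transaction))
            if syms:  # transactions without any frequent itemset are dropped
                seq.append(syms)
        transformed.append(seq)
    return symbol, transformed


def support(transformed, pattern):
    """Count customers whose sequence contains the symbols of `pattern` in order."""
    count = 0
    for seq in transformed:
        pos = 0
        for syms in seq:  # greedy left-to-right matching, one symbol per transaction
            if pos < len(pattern) and pattern[pos] in syms:
                pos += 1
        if pos == len(pattern):
            count += 1
    return count


# Toy database: one list of transactions (item sets) per customer.
db = [
    [{"a"}, {"b", "c"}],
    [{"a", "b"}, {"c"}],
    [{"a"}, {"c"}, {"d"}],
]
itemsets = frequent_itemsets(db, min_support=2)
symbol, transformed = transform(db, itemsets)
a, b, c = symbol[("a",)], symbol[("b",)], symbol[("c",)]
print(support(transformed, [a, c]))  # customers with "a" followed later by "c"
print(support(transformed, [b, c]))  # customers with "b" followed later by "c"
```

In this toy run the infrequent item `d` receives no symbol, so the third customer's last transaction disappears from the transformed database, which is the kind of size reduction the abstract refers to.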

Author information

Authors and Affiliations

Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan, Republic of China
Chung-Wen Cho
Department of Information and Computer Engineering, Chung Yuan Christian University, Jhongli, Taiwan, Republic of China
Yi-Hung Wu
Department of Computer Science, National Chengchi University, Taipei, Taiwan, Republic of China
Arbee L. P. Chen

Corresponding author

Correspondence to Arbee L. P. Chen.

About this article

Cite this article

Cho, CW., Wu, YH. & Chen, A.L.P. Effective database transformation and efficient support computation for mining sequential patterns. J Intell Inf Syst 32, 23–51 (2009). https://doi.org/10.1007/s10844-007-0047-y

Received: 16 August 2005
Revised: 22 August 2007
Accepted: 24 August 2007
Published: 07 November 2007
Issue Date: February 2009
DOI: https://doi.org/10.1007/s10844-007-0047-y
