
Effective database transformation and efficient support computation for mining sequential patterns

Journal of Intelligent Information Systems

Abstract

In this paper, we propose a novel algorithm for mining frequent sequences from transaction databases. The transactions of each customer, ordered by transaction time, form a customer sequence. A sequence (an ordered list of itemsets) is frequent if the number of customer sequences containing it meets the user-specified minimum support threshold. A 1-sequence is a special type of sequence because it consists of a single itemset rather than an ordered list of itemsets, whereas a k-sequence is a sequence composed of k itemsets. Compared with the cost of mining frequent k-sequences (k ≥ 2), the cost of mining frequent 1-sequences is negligible. We therefore adopt a two-phase architecture that finds the two types of frequent sequences separately, so that the discovery of frequent k-sequences can be designed and optimized on its own. For efficient frequent k-sequence mining, every frequent 1-sequence is encoded as a unique symbol and the database is transformed into one made up of these symbols. We find that it is unnecessary to encode all the frequent 1-sequences, and we make full use of the discovered frequent 1-sequences to transform the database into a smaller one. For every k ≥ 2, the customer sequences in the transformed database are scanned to find all the frequent k-sequences. We devise a compact representation for a customer sequence and describe a method that enumerates all distinct subsequences of a customer sequence without redundant scans. The soundness of the proposed approach is verified, and a number of experiments are performed. The results show that our approach outperforms previous work in both scalability and execution time.
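The two-phase pipeline described in the abstract can be illustrated with a minimal, brute-force Python sketch. It is not the authors' algorithm: the function names (frequent_1_sequences, transform, distinct_k_subsequences, mine) are hypothetical, every frequent 1-sequence is encoded (the paper's reduced encoding is not reproduced), and k-subsequences are enumerated naively instead of with the paper's compact representation. As in the definition above, a pattern's support is the number of customer sequences containing it, so each customer contributes at most once per pattern.

```python
from collections import Counter
from itertools import combinations, product


def frequent_1_sequences(db, min_sup):
    """Phase 1: itemsets contained in a transaction of at least min_sup customers."""
    counts = Counter()
    for seq in db:
        seen = set()  # each itemset is counted at most once per customer
        for trans in seq:
            items = sorted(trans)
            # Brute force: every non-empty subset of the transaction.
            for r in range(1, len(items) + 1):
                seen.update(frozenset(c) for c in combinations(items, r))
        counts.update(seen)
    return {s: c for s, c in counts.items() if c >= min_sup}


def transform(db, freq1):
    """Encode each frequent 1-sequence as an integer symbol and rewrite the database."""
    ordered = sorted(freq1, key=lambda s: (len(s), sorted(s)))
    code = {s: i for i, s in enumerate(ordered)}
    decode = {i: s for s, i in code.items()}
    new_db = []
    for seq in db:
        new_seq = []
        for trans in seq:
            # Replace a transaction by the symbols of the frequent itemsets it contains.
            symbols = frozenset(code[s] for s in freq1 if s <= trans)
            if symbols:  # transactions with no frequent itemset cannot contribute
                new_seq.append(symbols)
        if new_seq:
            new_db.append(new_seq)
    return new_db, decode


def distinct_k_subsequences(seq, k):
    """All distinct k-sequences of symbols contained in one transformed sequence."""
    found = set()
    for positions in combinations(range(len(seq)), k):  # k increasing positions
        for symbols in product(*(seq[p] for p in positions)):  # one symbol each
            found.add(symbols)
    return found


def mine(db, min_sup):
    """Return {tuple of itemsets: support} for all frequent sequences."""
    freq1 = frequent_1_sequences(db, min_sup)
    patterns = {(s,): c for s, c in freq1.items()}
    new_db, decode = transform(db, freq1)
    k = 2
    while True:
        counts = Counter()
        for seq in new_db:
            # A set per customer, so each pattern is counted once per customer.
            counts.update(distinct_k_subsequences(seq, k))
        frequent = {p: c for p, c in counts.items() if c >= min_sup}
        if not frequent:  # no frequent k-sequence implies none of length k + 1
            break
        for p, c in frequent.items():
            patterns[tuple(decode[x] for x in p)] = c
        k += 1
    return patterns
```

For example, with the customer sequences (a)(b c), (a b)(c) and (a)(c) and min_sup = 2, the frequent 1-sequences are (a), (b) and (c), and the only frequent 2-sequence is (a)(c), which all three customer sequences contain. Phase 1 enumerates every subset of each transaction, so this sketch is only practical for short transactions; it is meant to show what the transformed database and the per-customer subsequence enumeration compute, not how to compute them efficiently.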



Author information

Correspondence to Arbee L. P. Chen.


About this article

Cite this article

Cho, CW., Wu, YH. & Chen, A.L.P. Effective database transformation and efficient support computation for mining sequential patterns. J Intell Inf Syst 32, 23–51 (2009). https://doi.org/10.1007/s10844-007-0047-y

