Abstract
In this paper, we propose a novel algorithm for mining frequent sequences from transaction databases. The transactions of each customer, ordered by transaction time, form a customer sequence. A sequence (an ordered list of itemsets) is frequent if the number of customer sequences containing it is no less than the user-specified minimum support threshold. A 1-sequence is a special type of sequence because it consists of a single itemset rather than an ordered list, whereas a k-sequence is a sequence composed of k itemsets. Compared with the cost of mining frequent k-sequences (k ≥ 2), the cost of mining frequent 1-sequences is negligible. We therefore adopt a two-phase architecture that finds the two types of frequent sequences separately, so that the discovery of frequent k-sequences can be designed and optimized on its own. For efficient frequent k-sequence mining, every frequent 1-sequence is encoded as a unique symbol, and the database is transformed into one consisting of these symbols. We find that it is unnecessary to encode all the frequent 1-sequences, and we make full use of the discovered frequent 1-sequences to transform the database into a smaller one. For every k ≥ 2, the customer sequences in the transformed database are scanned to find all the frequent k-sequences. We devise a compact representation for a customer sequence and elaborate a method that enumerates all distinct subsequences of a customer sequence without redundant scans. The soundness of the proposed approach is verified, and a number of experiments are performed. The results show that our approach outperforms previous work in both scalability and execution time.
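The two-phase idea described above can be illustrated with a minimal Python sketch. It assumes the standard containment definition: a pattern is contained in a customer sequence if each of its itemsets is a subset of a distinct transaction, in time order. The sketch encodes every frequent 1-sequence as an integer symbol, so it does not implement the paper's reduced encoding, its compact customer-sequence representation, or its redundancy-free subsequence enumeration. The function names (`frequent_itemsets`, `transform`, `frequent_k_sequences`) and the toy database are hypothetical, and the brute-force enumeration is only suitable for tiny inputs.

```python
from collections import Counter
from itertools import combinations, product


def frequent_itemsets(db, min_sup):
    """Phase 1: frequent 1-sequences, i.e. itemsets contained in at least
    min_sup customer sequences. Brute force over every subset of every
    transaction, so only suitable for toy data."""
    counts = Counter()
    for cust in db:
        seen = set()  # itemsets this customer supports, counted once
        for trans in cust:
            items = sorted(trans)
            for r in range(1, len(items) + 1):
                for sub in combinations(items, r):
                    seen.add(frozenset(sub))
        counts.update(seen)
    return {s for s, c in counts.items() if c >= min_sup}


def transform(db, frequent):
    """Give every frequent itemset an integer symbol (hypothetical scheme:
    all of them are encoded) and rewrite each transaction as the set of
    symbols it contains. Empty transactions and customers are dropped."""
    ordered = sorted(frequent, key=lambda s: (len(s), sorted(s)))
    symbol = {s: i for i, s in enumerate(ordered)}
    tdb = []
    for cust in db:
        new_cust = []
        for trans in cust:
            syms = frozenset(symbol[s] for s in frequent if s <= trans)
            if syms:
                new_cust.append(syms)
        if new_cust:
            tdb.append(new_cust)
    return tdb, symbol


def frequent_k_sequences(tdb, k, min_sup):
    """Phase 2: a k-sequence takes one symbol from each of k transactions
    at strictly increasing positions. Each distinct k-sequence is counted
    at most once per customer sequence."""
    counts = Counter()
    for cust in tdb:
        distinct = set()
        for positions in combinations(range(len(cust)), k):
            distinct.update(product(*(cust[p] for p in positions)))
        counts.update(distinct)
    return {s: c for s, c in counts.items() if c >= min_sup}


# Toy database: three customer sequences of time-ordered transactions.
db = [
    [frozenset("A"), frozenset("BC")],
    [frozenset("AB"), frozenset("C")],
    [frozenset("B"), frozenset("C")],
]
freq1 = frequent_itemsets(db, min_sup=2)   # {A}, {B}, {C}
tdb, symbol = transform(db, freq1)          # A -> 0, B -> 1, C -> 2
freq2 = frequent_k_sequences(tdb, k=2, min_sup=2)
print(freq2)  # {(0, 2): 2, (1, 2): 2}, i.e. <(A)(C)> and <(B)(C)>
```

The per-customer set ensures that support counts customer sequences rather than occurrences. Note that the `combinations`/`product` loop re-enumerates overlapping subsequences, which is the kind of redundant work the abstract's compact representation is designed to avoid.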
Cite this article
Cho, CW., Wu, YH. & Chen, A.L.P. Effective database transformation and efficient support computation for mining sequential patterns. J Intell Inf Syst 32, 23–51 (2009). https://doi.org/10.1007/s10844-007-0047-y