Abstract
Constraints are essential for many sequential pattern mining applications. However, there has been no systematic study of constraint-based sequential pattern mining. In this paper, we investigate this issue and point out that the framework developed for constrained frequent-pattern mining does not fit sequential pattern mining well. We develop an extended framework based on a sequential pattern-growth methodology. Our study shows that, under this new framework, constraints can be pushed deep into the sequential pattern mining process both effectively and efficiently. Moreover, the framework can be extended to constraint-based structured pattern mining as well.
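To make the idea of pushing a constraint into pattern growth concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a simplified setting in which every sequence is a list of single items (the paper's setting allows itemset elements), and it uses one illustrative anti-monotone constraint, a maximum pattern length. The names `mine`, `min_support`, and `max_len` are hypothetical and chosen only for this example. The mining loop follows the general prefix-projection style: grow a prefix by one frequent item, project the database onto the suffixes that follow it, and recurse.

```python
from collections import defaultdict


def mine(db, min_support, max_len):
    """Return {pattern (tuple of items): support} for frequent patterns.

    Simplifying assumptions (not from the paper): each sequence is a list
    of single items, and the only constraint is len(pattern) <= max_len.
    """
    results = {}

    def grow(prefix, projected):
        # projected holds (sequence index, start offset) pairs, i.e. the
        # suffixes of the sequences that contain the current prefix.
        if len(prefix) >= max_len:
            # The length constraint is anti-monotone: once a prefix reaches
            # the limit, every extension violates it, so stop growing here
            # instead of filtering patterns after mining.
            return

        # Count each item at most once per projected sequence.
        support = defaultdict(int)
        for sid, start in projected:
            for item in set(db[sid][start:]):
                support[item] += 1

        for item in sorted(support):
            count = support[item]
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count

            # Project onto the suffix after the first occurrence of item.
            new_projected = []
            for sid, start in projected:
                try:
                    pos = db[sid].index(item, start)
                except ValueError:
                    continue  # this sequence does not extend the pattern
                new_projected.append((sid, pos + 1))
            grow(pattern, new_projected)

    grow((), [(sid, 0) for sid in range(len(db))])
    return results


if __name__ == "__main__":
    toy_db = [list("abcb"), list("acb"), list("bca")]
    for pattern, count in sorted(mine(toy_db, min_support=2, max_len=2).items()):
        print("".join(pattern), count)
```

In this sketch the constrained run returns the same patterns as an unconstrained run followed by a length filter, but it never builds the projected databases for prefixes that have already reached the limit. That is the sense in which a constraint is pushed into the mining process rather than applied as a post-processing step.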
Additional information
This research is supported in part by NSERC Grant 312194-05, NSF Grants IIS-0308001, IIS-0513678, BDI-0515813 and National Science Foundation of China (NSFC) grants No. 60303008 and 69933010. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
Cite this article
Pei, J., Han, J. & Wang, W. Constraint-based sequential pattern mining: the pattern-growth methods. J Intell Inf Syst 28, 133–160 (2007). https://doi.org/10.1007/s10844-006-0006-z
DOI: https://doi.org/10.1007/s10844-006-0006-z