Efficient Algorithms for Mining and Incremental Update of Maximal Frequent Sequences

Data Mining and Knowledge Discovery

Abstract

We study two problems: (1) mining frequent sequences from a transactional database, and (2) incremental update of frequent sequences when the underlying database changes over time. We review existing sequence mining algorithms including GSP, PrefixSpan, SPADE, and ISM. We point out the large memory requirement of PrefixSpan, SPADE, and ISM, and evaluate the performance of GSP. We discuss the high I/O cost of GSP, particularly when the database contains long frequent sequences. To reduce the I/O requirement, we propose an algorithm MFS, which could be considered as a generalization of GSP. The general strategy of MFS is to first find an approximate solution to the set of frequent sequences and then perform successive refinement until the exact set of frequent sequences is obtained. We show that this successive refinement approach results in a significant improvement in I/O cost. We discuss how MFS can be applied to the incremental update problem. In particular, the result of a previous mining exercise can be used (by MFS) as a good initial approximate solution for the mining of an updated database. This results in an I/O efficient algorithm. To improve processing efficiency, we devise pruning techniques that, when coupled with GSP or MFS, result in algorithms that are both CPU and I/O efficient.
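
The idea of reusing an earlier result as an initial estimate can be illustrated with a minimal Python sketch. It is not the MFS algorithm itself, whose internals the abstract does not give. It only shows how the supports of an estimated set of sequences of mixed lengths might be checked in a single scan of an updated database. The names `contains`, `count_supports`, and `verify_estimate`, the tuple-of-frozensets encoding, and the toy data are all assumptions made for this illustration.

```python
from collections import Counter

def contains(data_seq, pattern):
    """True if `pattern` is a subsequence of `data_seq`.

    Both are tuples of frozensets; each pattern itemset must be a subset
    of a distinct data itemset, with order preserved (greedy matching).
    """
    pos = 0
    for itemset in data_seq:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)

def count_supports(database, candidates):
    """Count, in one pass over `database`, how many data sequences
    contain each candidate (candidates may differ in length)."""
    counts = Counter()
    for data_seq in database:          # single scan of the database
        for cand in candidates:
            if contains(data_seq, cand):
                counts[cand] += 1
    return counts

def verify_estimate(database, estimate, min_support):
    """Keep only the estimated sequences that are frequent in `database`.

    `estimate` stands in for the result of an earlier mining run
    (a hypothetical starting point, not the authors' data structure).
    """
    counts = count_supports(database, estimate)
    return {cand for cand in estimate if counts[cand] >= min_support}

fs = frozenset
updated_db = [
    (fs("a"), fs("bc"), fs("d")),
    (fs("a"), fs("c")),
    (fs("b"), fs("d")),
]
previous_result = {
    (fs("a"), fs("c")),
    (fs("a"), fs("d")),
    (fs("b"), fs("d")),
}
still_frequent = verify_estimate(updated_db, previous_result, min_support=2)
```

In this toy run, the sequences ⟨{a},{c}⟩ and ⟨{b},{d}⟩ each occur in two data sequences and are kept, while ⟨{a},{d}⟩ occurs in only one and is dropped. A full miner would still have to generate and check further candidates to reach the exact answer. The sketch covers only the verification of an initial estimate.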


References

  • Agrawal, R., Imielinski, T., and Swami, A.N. 1993. Mining association rules between sets of items in large databases. In Proc. ACM SIGMOD International Conference on Management of Data, Washington, D.C., pp. 207–216.

  • Agrawal, R. and Srikant, R. 1995. Mining sequential patterns. In Proc. of the 11th Int’l Conference on Data Engineering. Taipei, Taiwan, pp. 3–14.

  • Ayan, N.F., Tansel, A.U., and Arkun, E. 1999. An efficient algorithm to update large itemsets with early pruning. In Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Diego, CA, USA, pp. 287–291.

  • Cheung, D.W., Han, J., Ng, V., and Wong, C.Y. 1996a. Maintenance of discovered association rules in large databases: An incremental updating technique. In Proc. 12th IEEE International Conference on Data Engineering (ICDE). New Orleans, Louisiana, USA, pp. 106–114.

  • Cheung, D.W., Lee, S.D., and Kao, B. 1997. A general incremental technique for maintaining discovered association rules. In Proc. International Conference on Database Systems for Advanced Applications (DASFAA). Melbourne, Australia, pp. 185–194.

  • Cheung, D.W., Ng, V., and Tam, B.W. 1996b. Maintenance of discovered knowledge: A case in multi-level association rules. In Proc. Second International Conference on Knowledge Discovery and Data Mining (KDD). Portland, Oregon, pp. 307–310.

  • Garofalakis, M.N., Rastogi, R., and Shim, K. 1999. SPIRIT: Sequential pattern mining with regular expression constraints. In Proceedings of the 25th International Conference on Very Large Data Bases. Edinburgh, Scotland, UK, pp. 223–234.

  • IBM. http://www.almaden.ibm.com/cs/quest/.

  • Lee, S., Cheung, D.W., and Kao, B. 1998. Is sampling useful in data mining? A case in the maintenance of discovered association rules. Data Mining and Knowledge Discovery, 2:233–262.

  • Lee, S.D. and Cheung, D.W. 1997. Maintenance of discovered association rules: When to update? In Proc. 1997 ACM-SIGMOD Workshop on Data Mining and Knowledge Discovery (DMKD). Tucson, Arizona.

  • Omiecinski, E. and Savasere, A. 1998. Efficient mining of association rules in large dynamic databases. In Proc. BNCOD’98, pp. 49–63.

  • Parthasarathy, S., Zaki, M.J., Ogihara, M., and Dwarkadas, S. 1999. Incremental and interactive sequence mining. In Proceedings of the 1999 ACM 8th International Conference on Information and Knowledge Management (CIKM’99). Kansas City, MO USA, pp. 251–258.

  • Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., and Hsu, M.-C. 2001. PrefixSpan: Mining sequential patterns by prefix-projected growth. In Proc. 17th IEEE International Conference on Data Engineering (ICDE). Heidelberg, Germany, pp. 215–224.

  • Provost, F., Jensen, D., and Gates, T. 1999. Efficient progressive sampling. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining. San Diego CA, USA, pp. 23–32.

  • Sarda, N.L. and Srinivas, N.V. 1998. An adaptive algorithm for incremental mining of association rules. In Proc. DEXA Workshop’98, pp. 240–245.

  • Srikant, R. and Agrawal, R. 1996. Mining sequential patterns: Generalizations and performance improvements. In Proc. of the 5th Conference on Extending Database Technology (EDBT). Avignon, France, pp. 3–17.

  • Thomas, S., Bodagala, S., Alsabti, K., and Ranka, S. 1997. An efficient algorithm for the incremental updation of association rules in large databases. In Proc. KDD’97, pp. 263–266.

  • Wang, K. 1997. Discovering patterns from large and dynamic sequential data. Journal of Intelligent Information Systems, 9:33–56.

  • Zaki, M.J. 2000. SPADE: An efficient algorithm for mining frequent sequences. Machine Learning, pp. 31–60.

  • Zhang, M., Kao, B., Cheung, D., and Yip, C.-L. 2002. Efficient algorithms for incremental update of frequent sequences. In Proc. of the sixth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Taiwan, pp. 186–197.

  • Zhang, M., Kao, B., Yip, C., and Cheung, D. 2001. A GSP-Based efficient algorithm for mining frequent sequences. In Proc. of IC-AI’2001. Las Vegas, Nevada, USA, pp. 497–503.

Author information

Corresponding author

Correspondence to Ben Kao.

Additional information

This research is supported by Hong Kong Research Grants Council grant HKU 7040/02E.

About this article

Cite this article

Kao, B., Zhang, M., Yip, CL. et al. Efficient Algorithms for Mining and Incremental Update of Maximal Frequent Sequences. Data Min Knowl Disc 10, 87–116 (2005). https://doi.org/10.1007/s10618-005-0268-z

  • DOI: https://doi.org/10.1007/s10618-005-0268-z
