Efficient Algorithms for Mining and Incremental Update of Maximal Frequent Sequences

Kao, Ben; Zhang, Minghua; Yip, Chi-Lap; Cheung, David W.; Fayyad, Usama

doi:10.1007/s10618-005-0268-z

Published: March 2005

Volume 10, pages 87–116, (2005)
Data Mining and Knowledge Discovery

Abstract

We study two problems: (1) mining frequent sequences from a transactional database, and (2) incrementally updating the set of frequent sequences when the underlying database changes over time. We review existing sequence mining algorithms, including GSP, PrefixSpan, SPADE, and ISM. We point out the large memory requirements of PrefixSpan, SPADE, and ISM, and evaluate the performance of GSP. We discuss the high I/O cost of GSP, particularly when the database contains long frequent sequences. To reduce the I/O requirement, we propose an algorithm, MFS, which can be viewed as a generalization of GSP. The general strategy of MFS is to first find an approximate solution to the set of frequent sequences and then refine it successively until the exact set of frequent sequences is obtained. We show that this successive refinement approach results in a significant improvement in I/O cost. We also discuss how MFS can be applied to the incremental update problem. In particular, the result of a previous mining exercise can be used by MFS as a good initial approximate solution for mining an updated database, which yields an I/O-efficient algorithm. To improve processing efficiency, we devise pruning techniques that, when coupled with GSP or MFS, result in algorithms that are both CPU-efficient and I/O-efficient.
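
To make the I/O argument concrete, the following is a minimal sketch of level-wise mining in the spirit of GSP, not the authors' implementation. It assumes a simplified model in which every sequence is a tuple of single items (no itemsets, no time constraints), and the names `levelwise_mine`, `count_support`, and `maximal` are hypothetical. Each level needs one full pass over the database, so the number of scans grows with the length of the longest frequent sequence; that per-level scan is the cost MFS aims to reduce by starting from an approximate solution.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def count_support(db, candidates):
    """One pass over the database: count the sequences containing each candidate."""
    counts = {c: 0 for c in candidates}
    for seq in db:
        for c in candidates:
            if is_subsequence(c, seq):
                counts[c] += 1
    return counts


def levelwise_mine(db, min_count):
    """Level-wise mining (simplified): returns (frequent -> support, scans)."""
    items = sorted({x for seq in db for x in seq})
    candidates = [(x,) for x in items]
    frequent, scans = {}, 0
    while candidates:
        counts = count_support(db, candidates)  # one database scan per level
        scans += 1
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: extend a with the last item of b when a minus its first
        # item equals b minus its last item.
        joined = {a + b[-1:] for a in level for b in level if a[1:] == b[:-1]}
        # Prune step: drop a candidate if removing any one item gives a
        # sequence that is not frequent at the current level.
        candidates = sorted(
            c for c in joined
            if all(c[:i] + c[i + 1:] in level for i in range(len(c)))
        )
    return frequent, scans


def maximal(frequent):
    """Keep only frequent sequences not contained in another frequent sequence."""
    return [p for p in frequent
            if not any(p != q and is_subsequence(p, q) for q in frequent)]


# Toy database (illustrative data only, not from the paper).
db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
frequent, scans = levelwise_mine(db, min_count=2)
print(scans, maximal(frequent))
```

In this toy run the longest frequent sequence has three items, and the loop needs one scan per level to reach it. Under the same simplified model, an approach that begins with a good guess at the long sequences (for example, the result of a previous mining run) could verify several lengths in a single pass, which is the intuition behind the successive refinement strategy described above.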

Author information

Authors and Affiliations

Department of Computer Science, The University of Hong Kong, Hong Kong
Ben Kao, Minghua Zhang, Chi-Lap Yip, David W. Cheung & Usama Fayyad (Editor)

Corresponding author

Correspondence to Ben Kao.

Additional information

This research is supported by Hong Kong Research Grants Council grant HKU 7040/02E.

About this article

Cite this article

Kao, B., Zhang, M., Yip, CL. et al. Efficient Algorithms for Mining and Incremental Update of Maximal Frequent Sequences. Data Min Knowl Disc 10, 87–116 (2005). https://doi.org/10.1007/s10618-005-0268-z

Received: 15 April 2003
Revised: 14 April 2004
Issue Date: March 2005
DOI: https://doi.org/10.1007/s10618-005-0268-z
