An efficient algorithm for mining condensed sequential pattern bases
Abstract
Purpose
Mining sequential patterns in large databases has become an important data mining task with broad applications, such as business analysis, web mining, security, and biological sequence analysis. The purpose of this paper is to propose the notion of a condensed frequent sequential pattern base (SP base) with a guaranteed maximal error bound.
Design/methodology/approach
In many applications it is sufficient to generate frequent sequential patterns whose support is a close-enough approximation rather than an exact value. A subset of the frequent sequential patterns is therefore computed and then used to approximate the supports of arbitrary frequent sequential patterns within a guaranteed maximal error bound.
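As a rough illustration of how a stored subset of patterns can bound the support of a pattern that is not stored, the sketch below relies only on the anti-monotonicity of support: a pattern is at most as frequent as any of its sub-patterns and at least as frequent as any of its super-patterns. It is not the algorithm of the paper. The names `is_subsequence`, `support`, and `support_bounds` are hypothetical, and sequences are simplified to ordered tuples of single items, which is an assumption made for brevity.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]  # assumption: a sequence of single items, not itemsets


def is_subsequence(small: Sequence[str], large: Sequence[str]) -> bool:
    """Return True if `small` occurs in `large` in order (gaps allowed)."""
    it = iter(large)
    # `item in it` advances the iterator, so the order of items is respected.
    return all(item in it for item in small)


def support(pattern: Pattern, database: List[Pattern]) -> int:
    """Exact support: number of database sequences containing `pattern`."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def support_bounds(query: Pattern, base: Dict[Pattern, int],
                   db_size: int) -> Tuple[int, int]:
    """Bound the support of `query` using only the patterns kept in `base`.

    Lower bound: the largest support among stored super-patterns of `query`.
    Upper bound: the smallest support among stored sub-patterns of `query`.
    """
    lower = max((s for p, s in base.items() if is_subsequence(query, p)),
                default=0)
    upper = min((s for p, s in base.items() if is_subsequence(p, query)),
                default=db_size)
    return lower, upper
```

Given a base that stores only some patterns together with their exact supports, `support_bounds` returns an interval that contains the true support of the query. How the base is chosen so that this interval stays within a fixed maximal error is the subject of the paper and is not reproduced here.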
Findings
The concept of a condensed frequent SP base is introduced, and an efficient algorithm for mining condensed SP bases is developed.
Research limitations/implications
A condensed frequent SP base can significantly reduce the set of sequential patterns that need to be mined, stored, and analyzed, while providing a guaranteed error bound for the frequencies of sequential patterns not in the base.
Practical implications
A much smaller base of patterns can help users to comprehend the mining results. Computing a much smaller pattern base also leads to better efficiency.
Originality/value
The paper shows that, by adopting a novel pruning technique, the algorithm outperforms previous work by one order of magnitude.
Citation
Wang, T. (2012), "An efficient algorithm for mining condensed sequential pattern bases", Kybernetes, Vol. 41 No. 9, pp. 1289-1296. https://doi.org/10.1108/03684921211275315
Publisher
Emerald Group Publishing Limited
Copyright © 2012, Emerald Group Publishing Limited