Abstract
Efficient mining of frequent itemsets from a database plays an essential role in many data mining tasks, such as association rule mining. Many algorithms represent a database as a prefix-tree and mine frequent itemsets by recursively constructing conditional prefix-trees from it. A (conditional) prefix-tree can be stored in various structures, and the cost of constructing and traversing these storage structures accounts for a large proportion of the total cost of such algorithms. The PatriciaMine algorithm stores a prefix-tree in a Patricia trie and shows good performance. In this study, we introduce an efficient Patricia* structure for storing a prefix-tree. A Patricia* structure is more compact and contiguous than the corresponding Patricia trie, so its construction and traversal costs are lower. Previous prefix-tree-based algorithms adopt a similar mining procedure, in which most nodes of a prefix-tree are accessed repeatedly while the prefix-tree is processed. This paper presents a novel mining procedure that greatly reduces the number of node accesses for a prefix-tree. We propose the PatriciaMine* algorithm, which combines the Patricia* structure with the proposed procedure. Experimental data show that, on various databases, PatriciaMine* outperforms not only PatriciaMine but also several fast algorithms such as FPgrowth* and dEclat.
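For readers unfamiliar with the general approach the abstract builds on, the following is a minimal sketch of mining frequent itemsets by recursively constructing conditional prefix-trees. It uses a plain pointer-based prefix-tree, not the Patricia trie or the Patricia* structure, and it follows the conventional procedure rather than the reduced-access procedure proposed in the paper. All names (`Node`, `build_tree`, `mine`, `frequent_itemsets`) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


class Node:
    """One prefix-tree node: an item, a count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_support):
    """Build a prefix-tree from (items, weight) pairs.

    Returns (header, frequent): header maps each frequent item to the nodes
    carrying it, frequent maps each frequent item to its support.
    """
    support = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            support[item] += weight
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = Node(None, None)
    header = defaultdict(list)
    for items, weight in weighted_transactions:
        # Keep frequent items only, ordered by descending support so that
        # transactions share prefixes as much as possible.
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def mine(weighted_transactions, min_support, suffix=()):
    """Return {itemset: support} for all frequent itemsets extending suffix."""
    header, frequent = build_tree(weighted_transactions, min_support)
    results = {}
    for item, nodes in header.items():
        results[tuple(sorted(suffix + (item,)))] = frequent[item]
        # Conditional database for this item: the prefix path above every
        # node that carries the item, weighted by that node's count.
        conditional = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Recurse on the conditional prefix-tree built from those paths.
        results.update(mine(conditional, min_support, suffix + (item,)))
    return results


def frequent_itemsets(transactions, min_support):
    """Convenience wrapper: every input transaction has weight 1."""
    return mine([(t, 1) for t in transactions], min_support)
```

Note that the inner loop walks from each node up to the root for every item, so nodes near the root are visited many times. This is the kind of repeated node access that the abstract says the proposed procedure reduces; the sketch makes no attempt to avoid it.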
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qu, JF., Liu, M. (2012). A Fast Algorithm for Frequent Itemset Mining Using Patricia* Structures. In: Cuzzocrea, A., Dayal, U. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2012. Lecture Notes in Computer Science, vol 7448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32584-7_17
DOI: https://doi.org/10.1007/978-3-642-32584-7_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32583-0
Online ISBN: 978-3-642-32584-7
eBook Packages: Computer Science, Computer Science (R0)