Mining Frequent Itemsets in Large Data Warehouses: A Novel Approach Proposed for Sparse Data Sets

Fakhrahmad, S. M.; Jahromi, M. Zolghadri; Sadreddini, M. H.

doi:10.1007/978-3-540-77226-2_53

S. M. Fakhrahmad¹, M. Zolghadri Jahromi², and M. H. Sadreddini²

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4881)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

Developing efficient techniques for discovering useful information and valuable knowledge in very large databases and data warehouses has attracted the attention of many researchers in the field of data mining. The well-known association rule mining (ARM) algorithm Apriori searches for frequent itemsets (i.e., sets of items whose support meets a user-specified minimum) by repeatedly scanning the whole database to count the frequency of each candidate itemset. Most methods proposed to improve the efficiency of Apriori try to count the frequency of each itemset without rescanning the database. However, these methods rarely offer any way to reduce the complexity of the enumerations that are inherent in the problem. In this paper, we propose a new algorithm for mining frequent itemsets and association rules that computes itemset frequencies efficiently and requires only a single scan of the database. The data is encoded into a compressed form and stored in main memory in a suitable data structure. The algorithm works iteratively, and in each iteration the time required to measure the frequency of an itemset is reduced further (i.e., checking the frequency of n-dimensional candidate itemsets is much faster than checking that of (n-1)-dimensional ones). The efficiency of the algorithm is evaluated on artificial and real-life datasets, and the experimental results indicate that it is more efficient than existing algorithms.
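The abstract contrasts Apriori, which rescans the database at every level to count candidate frequencies, with a single-scan approach that keeps an encoded copy of the data in main memory. The chapter's own encoding and data structure are not described here, so the following Python sketch swaps in a different, well-known technique: a vertical layout in which each item maps to the set of transaction ids (its tidset) that contain it, as in Eclat-style miners, combined with Apriori's join-and-prune candidate generation. The function name `mine_frequent_itemsets` and its parameters are hypothetical, and `min_support` is assumed to be an absolute transaction count.

```python
from itertools import combinations


def mine_frequent_itemsets(transactions, min_support):
    """Hypothetical helper (not the chapter's algorithm): level-wise mining
    over a vertical tidset layout.

    transactions: iterable of iterables of mutually comparable items.
    min_support: absolute count a frequent itemset must reach (assumption).
    Returns {itemset as a sorted tuple: support count}.
    """
    # The only pass over the data: map each item to the ids of the
    # transactions that contain it (its tidset).
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            tidsets.setdefault(item, set()).add(tid)

    # Level 1: frequent single items, keyed by 1-tuples.
    current = {(item,): frozenset(tids) for item, tids in tidsets.items()
               if len(tids) >= min_support}
    result = {itemset: len(tids) for itemset, tids in current.items()}

    while current:
        next_level = {}
        for a, b in combinations(sorted(current), 2):
            # Apriori join: combine two k-itemsets sharing their first k-1 items.
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            # Apriori prune: every k-subset of the candidate must be frequent.
            if any(sub not in current
                   for sub in combinations(candidate, len(a))):
                continue
            # Support from tidset intersection; the transactions are not rescanned.
            tids = current[a] & current[b]
            if len(tids) >= min_support:
                next_level[candidate] = tids
        result.update((itemset, len(tids)) for itemset, tids in next_level.items())
        current = next_level
    return result


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    for itemset, support in sorted(mine_frequent_itemsets(baskets, 3).items()):
        print(itemset, support)
```

Because the tidset of a k-itemset is the intersection of two (k-1)-itemset tidsets, it is never larger than either of them, so support checks tend to get cheaper at deeper levels. This mirrors the kind of per-iteration speedup the abstract claims, but it is not necessarily the mechanism the authors use.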

Author information

Authors and Affiliations

Faculty member in the Department of Computer Engineering, Islamic Azad University of Shiraz, and PhD student at Shiraz University, Shiraz, Iran
S. M. Fakhrahmad
Department of Computer Science & Engineering, Shiraz University, Shiraz, Iran
M. Zolghadri Jahromi & M. H. Sadreddini

Editor information

Hujun Yin, Peter Tino, Emilio Corchado, Will Byrne, Xin Yao

About this paper

Cite this paper

Fakhrahmad, S.M., Jahromi, M.Z., Sadreddini, M.H. (2007). Mining Frequent Itemsets in Large Data Warehouses: A Novel Approach Proposed for Sparse Data Sets. In: Yin, H., Tino, P., Corchado, E., Byrne, W., Yao, X. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2007. IDEAL 2007. Lecture Notes in Computer Science, vol 4881. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77226-2_53

DOI: https://doi.org/10.1007/978-3-540-77226-2_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77225-5
Online ISBN: 978-3-540-77226-2
eBook Packages: Computer Science, Computer Science (R0)
