ABSTRACT
Mining frequent patterns in transaction databases has been a popular theme in data mining research. A common task is finding patterns among the large set of data items in database transactions. The Apriori algorithm is a widely accepted method of generating frequent patterns, but it requires many scans of the database and thus seriously taxes resources. Methods currently used to improve the efficiency of the Apriori algorithm include hash-based itemset counting, transaction reduction, partitioning, sampling, and dynamic itemset counting. The two main approaches to association rule mining are candidate set generation and test, and restricted test only. Both approaches scan a massive database multiple times. In our study, we propose a transaction patternbase, constructed in the first scan of the database. Transactions with the same pattern are stored once in the patternbase, and the frequency of that pattern is increased for each repetition. Subsequent scans then read only this compact dataset rather than the full database, which increases the efficiency of the respective methods. We have implemented this technique with the FP-growth method. The technique outperforms the database approach in many situations and performs exceptionally well when the repetition of transaction patterns is high. It can be used with any association rule mining method.
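The patternbase idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes that a "pattern" is simply the unordered set of items in a transaction, and the names `build_patternbase` and `item_support` are hypothetical, chosen only for this example. The sketch collapses identical transactions into one entry with a frequency count during a single pass, then computes item supports from the compact patternbase instead of rescanning the original transactions.

```python
from collections import Counter
from typing import Hashable, Iterable


def build_patternbase(transactions: Iterable[Iterable[Hashable]]) -> Counter:
    """Single scan: map each distinct item set to the number of times it occurs.

    Assumption: two transactions share a pattern when they contain exactly
    the same items, regardless of order.
    """
    patternbase: Counter = Counter()
    for items in transactions:
        patternbase[frozenset(items)] += 1
    return patternbase


def item_support(patternbase: Counter) -> Counter:
    """Count item supports from the patternbase, weighting by pattern frequency."""
    support: Counter = Counter()
    for pattern, frequency in patternbase.items():
        for item in pattern:
            support[item] += frequency
    return support


# Illustrative data only; not taken from the paper.
transactions = [
    ["bread", "milk"],
    ["milk", "bread"],
    ["bread", "butter"],
    ["bread", "milk"],
]

patternbase = build_patternbase(transactions)
support = item_support(patternbase)
```

In this toy input, four transactions reduce to two patternbase entries, so any later pass (for example, the second scan that FP-growth uses to build its tree) would visit two weighted patterns rather than four rows. The saving grows with the amount of repetition among transactions, which matches the abstract's remark about when the technique performs best.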
Index Terms
- Improving the efficiency of FP tree construction using transactional patternbase