Abstract
Association rule mining among itemsets is a fundamental task of great importance in many data mining applications, including the detection of attacks in network data, stock market and financial applications, and bioinformatics (for example, finding genetic disorders). However, extracting association rules from even a reasonably sized database produces a large number of rules, many of which are redundant with respect to other rules and therefore practically useless. Methods for mining non-redundant rules are thus required. To address this problem, we first propose a definition of redundancy in the sense of minimal knowledge and then a compact representation of non-redundant association rules, which we call the compact informative generic basis. We also provide an improved version of the existing DCI_CLOSED algorithm, called DCI_PLUS, which finds the frequent closed itemsets (FCI) together with their minimal representative generators. It works in combination with BitTable, a compact form of the database that is built in a single scan of the original database. We further introduce an efficient algorithm for constructing the compact informative generic basis from the FCI and their generators. Finally, we present an inference mechanism by which all association rules can be generated without accessing the database. Experimental results show that the proposed method outperforms the existing related methods.
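To make the terms in the abstract concrete, the following is a minimal brute-force sketch of frequent closed itemsets and their minimal generators over a bit-vector view of the database. It is not DCI_PLUS or the authors' BitTable implementation: the toy transactions, the function names, and the use of one Python integer per item as a bitmask are all assumptions made for illustration, and the enumeration is exponential in the number of items.

```python
from itertools import combinations

# Hypothetical toy database; not taken from the paper.
TRANSACTIONS = [
    {"a", "c", "d"},
    {"b", "c", "e"},
    {"a", "b", "c", "e"},
    {"b", "e"},
    {"a", "b", "c", "e"},
]


def build_bit_table(transactions):
    """One pass over the data: bit i of table[item] is set when transaction i has item."""
    table = {}
    for tid, items in enumerate(transactions):
        for item in items:
            table[item] = table.get(item, 0) | (1 << tid)
    return table


def tidset(itemset, table, n):
    """Bitmask of the transactions that contain every item of `itemset`."""
    mask = (1 << n) - 1  # the empty itemset occurs in all n transactions
    for item in itemset:
        mask &= table.get(item, 0)
    return mask


def support(itemset, table, n):
    """Number of transactions containing `itemset`."""
    return bin(tidset(itemset, table, n)).count("1")


def closure(itemset, table, n):
    """All items present in every transaction that contains `itemset`."""
    mask = tidset(itemset, table, n)
    return frozenset(i for i, bits in table.items() if (mask & bits) == mask)


def closed_itemsets_with_generators(transactions, min_support):
    """Return {closed itemset: (support, [minimal generators])} by brute force.

    The empty itemset is skipped for brevity.
    """
    n = len(transactions)
    table = build_bit_table(transactions)
    items = sorted(table)
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            supp = support(combo, table, n)
            if supp < min_support:
                continue
            # A minimal generator has strictly lower support than each of its
            # immediate subsets (sufficient because support is anti-monotone).
            if all(support(combo[:k] + combo[k + 1:], table, n) > supp
                   for k in range(size)):
                closed = closure(combo, table, n)
                result.setdefault(closed, (supp, []))[1].append(frozenset(combo))
    return result


if __name__ == "__main__":
    found = closed_itemsets_with_generators(TRANSACTIONS, min_support=2)
    for closed, (supp, gens) in sorted(found.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(closed), supp, [sorted(g) for g in gens])
```

On this toy data, each generator g of a closed itemset c with g ≠ c yields an exact rule g → c \ g (confidence 1), for example {a} → {c}. Rules of this shape are the usual starting point for generic bases; how the compact informative generic basis selects among them is defined in the paper itself and is not reproduced here.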




Acknowledgments
The authors would like to thank the anonymous reviewers, the Editor and the Editor-in-Chief for their constructive and generous feedback, which has significantly improved the content, quality and presentation of this paper.
Cite this article
Sahoo, J., Das, A.K. & Goswami, A. An effective association rule mining scheme using a new generic basis. Knowl Inf Syst 43, 127–156 (2015). https://doi.org/10.1007/s10115-014-0732-4
DOI: https://doi.org/10.1007/s10115-014-0732-4