Abstract
To date, association rule mining has mainly focused on the discovery of frequent patterns. Nevertheless, patterns that occur infrequently are often of interest as well. Existing algorithms for mining this kind of infrequent pattern are mainly based on exhaustive search methods and can be applied only to categorical domains. A previous work introduced the use of grammar-guided genetic programming for the discovery of frequent association rules and showed that the proposal was competitive in terms of scalability, expressiveness, flexibility and the ability to restrict the search space. The goal of this work is to demonstrate that the same proposal is also appropriate for the discovery of rare association rules. The approach obtains solutions within specified time limits and, unlike current algorithms, does not require large amounts of memory. It also provides mechanisms to discard noise from the rare association rule set by applying four specific fitness functions, which are compared and studied in depth. Finally, the approach is compared with existing algorithms for mining rare association rules, and the mined rules are analysed. The results show that the approach mines rare rules with low and consistent execution times. The experimental study also shows that the proposal obtains a small and accurate set of rules whose size is close to the one specified by the data miner.
Notes
This dataset comprises 392 instances and 8 attributes, and is publicly available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/Auto+MPG).
The dataset used in this example is a real dataset (http://archive.ics.uci.edu/ml/datasets/Zoo) that will be used in the experimental study.
Ankara weather, mushroom, soybean and vote datasets are available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets).
The Zoo dataset comprises 102 instances and 17 categorical attributes, and it is publicly available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/Zoo).
The Automobile Performance dataset comprises 392 instances and 8 numerical attributes, and it is publicly available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/Automobile).
All these datasets are publicly available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets).
JCLEC is available for download (http://jclec.sourceforge.net).
Acknowledgments
The authors would like to acknowledge the very helpful comments and suggestions of Dr. Mykola Pechenizkiy (Technical University of Eindhoven) on previous versions of this paper. This work was supported by the Regional Government of Andalusia and the Spanish Ministry of Science and Technology projects, P08-TIC-3720, TIN2008-06681-C06-03 and TIN-2011-22408, respectively, and FEDER funds. This research was also supported by the Spanish Ministry of Education under FPU grant AP2010-0041.
Cite this article
Luna, J.M., Romero, J.R. & Ventura, S. On the adaptability of G3PARM to the extraction of rare association rules. Knowl Inf Syst 38, 391–418 (2014). https://doi.org/10.1007/s10115-012-0591-9
DOI: https://doi.org/10.1007/s10115-012-0591-9