Abstract
Given a database of records, it might be possible to identify small subsets of data whose distribution is exceptionally different from the distribution in the complete set of data records. Finding such interesting relationships, which we call exceptional relationships, in an automated way would allow the discovery of unusual or exceptional hidden behaviour. In this paper, we formulate the problem of mining exceptional relationships as a special case of exceptional model mining and propose a grammar-guided genetic programming algorithm (MERG3P) that enables the discovery of exceptional relationships. In particular, MERG3P can work directly not only with categorical data but also with numerical data. In the experimental evaluation, we conduct a case study on mining exceptional relations between well-known and widely used quality measures of association rules, whose exceptional behaviour would be of interest to pattern mining experts. For this purpose, we constructed a data set comprising a wide range of values for each considered association rule quality measure, so that possible exceptional relations between measures could be discovered. Thus, besides validating MERG3P, we found that the Support and Leverage measures are in fact negatively correlated under certain conditions, whereas experts in the field generally expect these measures to be positively correlated.
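To make the notion of an exceptional relationship concrete, the following minimal sketch compares the Pearson correlation of two numerical attributes inside a subset with the correlation over the complete data set. It is not the MERG3P algorithm: the function names (`pearson`, `correlation_gap`), the toy records, and the use of an absolute correlation difference as the measure of exceptionality are all assumptions made for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def correlation_gap(records, condition, x_key, y_key):
    """Return (r_subset, r_all, |r_subset - r_all|) for the subset selected by `condition`.

    Hypothetical helper: the absolute difference is an illustrative choice,
    not the quality measure defined in the paper.
    """
    subset = [r for r in records if condition(r)]
    r_subset = pearson([r[x_key] for r in subset], [r[y_key] for r in subset])
    r_all = pearson([r[x_key] for r in records], [r[y_key] for r in records])
    return r_subset, r_all, abs(r_subset - r_all)


# Toy data (invented): in group "a" y grows with x, in group "b" y falls with x.
records = (
    [{"group": "a", "x": v, "y": v} for v in (1, 2, 3, 4)]
    + [{"group": "b", "x": v, "y": 5 - v} for v in (1, 2, 3, 4)]
)

r_sub, r_all, gap = correlation_gap(records, lambda r: r["group"] == "b", "x", "y")
print(f"subset r = {r_sub:.2f}, overall r = {r_all:.2f}, gap = {gap:.2f}")
```

In this toy data the two attributes are uncorrelated over all records, yet perfectly negatively correlated within the subset described by `group == "b"`. That is the kind of behaviour the abstract calls exceptional; a search procedure such as the one proposed would have to find the describing condition rather than receive it as input.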
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10115-015-0859-y/MediaObjects/10115_2015_859_Fig1_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10115-015-0859-y/MediaObjects/10115_2015_859_Fig2_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10115-015-0859-y/MediaObjects/10115_2015_859_Fig3_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10115-015-0859-y/MediaObjects/10115_2015_859_Fig4_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10115-015-0859-y/MediaObjects/10115_2015_859_Fig5_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10115-015-0859-y/MediaObjects/10115_2015_859_Fig6_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10115-015-0859-y/MediaObjects/10115_2015_859_Fig7_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10115-015-0859-y/MediaObjects/10115_2015_859_Fig8_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10115-015-0859-y/MediaObjects/10115_2015_859_Fig9_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10115-015-0859-y/MediaObjects/10115_2015_859_Fig10_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10115-015-0859-y/MediaObjects/10115_2015_859_Fig11_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10115-015-0859-y/MediaObjects/10115_2015_859_Fig12_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10115-015-0859-y/MediaObjects/10115_2015_859_Fig13_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10115-015-0859-y/MediaObjects/10115_2015_859_Fig14_HTML.gif)
Notes
The data set and the data generator can be reached at http://www.uco.es/grupos/kdis/kdiswiki/index.php/Exceptional_ARM.
A sensitivity analysis was carried out. The results and statistical analysis can be found at http://www.uco.es/grupos/kdis/kdiswiki/index.php/Exceptional_ARM.
JCLEC is available for download (http://jclec.sourceforge.net).
All the data sets are publicly available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/).
Acknowledgments
This research was supported by the Spanish Ministry of Economy and Competitiveness, project TIN-2014-55252-P, and by FEDER funds. It was partly supported by the STW CAPA project and by the Spanish Ministry of Education under FPU Grant AP2010-0041.
Cite this article
Luna, J.M., Pechenizkiy, M. & Ventura, S. Mining exceptional relationships with grammar-guided genetic programming. Knowl Inf Syst 47, 571–594 (2016). https://doi.org/10.1007/s10115-015-0859-y
DOI: https://doi.org/10.1007/s10115-015-0859-y