Abstract
Detecting association rules with low support but high confidence is a difficult data mining problem. To find such rules using approaches like the Apriori algorithm, minimum support must be set very low, which results in a large number of redundant rules. We are interested in sporadic rules, i.e. those that fall below a maximum support level but above the level of support expected from random coincidence. There are two types of sporadic rules: perfectly sporadic and imperfectly sporadic. Here we are concerned mainly with finding imperfectly sporadic rules, where the support of the antecedent as a whole falls below maximum support, but where its items may individually have quite high support. An example is fever, headache, stiff neck → meningitis. In this paper, we introduce an algorithm called Mining Interesting Imperfectly Sporadic Rules (MIISR) to find imperfectly sporadic rules efficiently. Our proposed method uses item constraints and coincidence pruning to discover these rules in reasonable time. This paper is an expanded version of Koh et al. [Advances in knowledge discovery and data mining: 10th Pacific-Asia Conference (PAKDD 2006), Singapore. Lecture Notes in Computer Science 3918, Springer, Berlin, pp 473–482].
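As a rough illustration of the definitions in the abstract, the following Python sketch checks whether a single candidate rule looks sporadic on a toy data set. Everything in it is an assumption made for illustration: the function names, the thresholds, the toy transactions, and in particular the coincidence check, which simply compares the rule's observed support with the support expected if antecedent and consequent were independent. It is not the MIISR algorithm and does not reproduce its item constraints or pruning strategy.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def is_sporadic_candidate(
    antecedent: Iterable[str],
    consequent: Iterable[str],
    transactions: List[Transaction],
    max_support: float,
    min_confidence: float,
) -> bool:
    """Hypothetical check: low antecedent support, high confidence,
    and rule support above an independence baseline (an assumption)."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    supp_a = support(antecedent, transactions)
    supp_c = support(consequent, transactions)
    supp_rule = support(antecedent | consequent, transactions)

    # The antecedent as a whole must be rare (but must occur at all).
    if supp_a == 0 or supp_a >= max_support:
        return False

    confidence = supp_rule / supp_a
    expected_by_chance = supp_a * supp_c  # independence baseline (assumption)
    return confidence >= min_confidence and supp_rule > expected_by_chance


# Toy data (invented): 20 transactions in total.
transactions: List[Transaction] = (
    [frozenset({"fever", "headache", "stiff neck", "meningitis"})] * 2
    + [frozenset({"fever", "headache", "flu"})] * 8
    + [frozenset({"fever"})] * 4
    + [frozenset({"cough"})] * 6
)

rule_is_sporadic = is_sporadic_candidate(
    {"fever", "headache", "stiff neck"}, {"meningitis"},
    transactions, max_support=0.2, min_confidence=0.8,
)
print(rule_is_sporadic)                  # True
print(support({"fever"}, transactions))  # fever alone is common (14 of 20)
```

In this toy data, fever and headache are individually frequent, yet the full antecedent occurs in only 2 of 20 transactions, which is the situation the abstract describes as imperfectly sporadic.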
References
Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Buneman P, Jajodia S (eds) Proceedings of the ACM SIGMOD international conference on management of data, Washington, DC, pp 207–216
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Bocca JB, Jarke M, Zaniolo C (eds) Proceedings of the 20th international conference on very large databases (VLDB 1994), Santiago de Chile, Chile, pp 487–499
Bayardo RJ, Agrawal R, Gunopulos D (2000) Constraint-based rule mining in large, dense databases. Data Mining Knowl Discov 4(2/3):217–240
Bonchi F, Giannotti F, Mazzanti A, Pedreschi D (2005) Efficient breadth-first mining of frequent pattern with monotone constraints. Knowl Inf Syst 8(2):131–153
Bonchi F, Lucchese C (2006) On condensed representations of constrained frequent patterns. Knowl Inf Syst 9(2):180–201
Bower KM (2003) When to use Fisher’s exact test. Am Soc Qual Six Sigma Forum Mag 2(4):35–37
Buckland M, Gey F (1994) The relationship between recall and precision. J Am Soc Inf Sci 45(1):12–19
Everitt B (1992) The analysis of contingency tables. Monographs on statistics and applied probability. Chapman and Hall, London, pp 11–36
Fisher RA (1970) Statistical methods for research workers. Oliver and Boyd, Edinburgh, UK
Flouvat F, Marchi FD, Petit J-M (2004) ABS: Adaptive borders search of frequent itemsets. In: Bayardo RJ Jr, Goethals B, Zaki MJ (eds) Proceedings of the IEEE ICDM workshop on frequent itemset mining implementations (FIMI 2004), Brighton, UK
Koh YS, Rountree N (2005) Finding sporadic rules using apriori-inverse. In: Ho TB, Cheung D, Liu H (eds) Advances in knowledge discovery and data mining: 9th Pacific-Asia conference (PAKDD 2005), Hanoi, Vietnam. Lecture notes in computer science, vol 3518. Springer, Berlin Heidelberg New York, pp 97–106
Koh YS, Rountree N, O’Keefe R (2006) Finding non-coincidental sporadic rules using apriori-inverse. Int J Data Warehousing Mining 2(2):38–54
Koh YS, Rountree N, O’Keefe R (2006) Mining interesting imperfectly sporadic rules. In: Ng WK, Kitsuregawa M, Li J, Chang K (eds) Advances in knowledge discovery and data mining: 10th Pacific-Asia conference (PAKDD 2006), Singapore. Lecture notes in computer science, vol 3918. Springer, Berlin Heidelberg New York, pp 473–482
Li J, Zhang X, Dong G, Ramamohanarao K, Sun Q (1999) Efficient mining of high confidence association rules without support threshold. In: Zytkow JM, Rauch J (eds) Principles of data mining and knowledge discovery: Third European conference (PKDD 1999), Prague, Czech Republic. Lecture notes in computer science, vol 1704. Springer, Berlin Heidelberg New York, pp 406–411
Liu B, Hsu W, Ma Y (1999) Pruning and summarizing the discovered associations. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 1999), San Diego, CA, pp 125–134
Newman D, Hettich S, Blake C, Merz C (1998) UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html
Ng RT, Lakshmanan LVS, Han J, Pang A (1998) Exploratory mining and pruning optimizations of constrained associations rules. In: Tiwary A, Franklin M (eds) Proceedings of the 1998 ACM SIGMOD international conference on management of data (SIGMOD 1998), Seattle, WA, pp 13–24
Rahal I, Ren D, Wu W, Perrizo W (2004) Mining confident minimal rules with fixed consequents. In: Proceedings of the 16th IEEE international conference on tools with artificial intelligence (ICTAI 2004), Boca Raton, FL, pp 6–13
Srikant R, Agrawal R (1995) Mining generalized association rules. In: Dayal U, Gray PMD, Nishio S (eds) Proceedings of the 21st international conference on very large data bases (VLDB 1995), Zurich, Switzerland, pp 407–419
Srikant R, Vu Q, Agrawal R (1997) Mining association rules with item constraints. In: Heckerman D, Mannila H, Pregibon D, Uthurusamy R (eds) Proceedings of the third international conference on knowledge discovery and data mining (KDD 1997). AAAI Press, Menlo Park, CA, pp 67–73
Uno T, Kiyomi M, Arimura H (2004) LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: Bayardo RJ Jr, Goethals B, Zaki MJ (eds) Proceedings of the IEEE ICDM workshop on frequent itemset mining implementations (FIMI 2004), Brighton, UK
Wang H, Perng C-S, Ma S, Yu PS (2005) Demand-driven frequent itemset mining using pattern structures. Knowl Inf Syst 8(1):82–102
Weisstein E (2005) Fisher’s exact test. MathWorld – a Wolfram Web resource. http://mathworld.wolfram.com/FishersExactTest.html
Zou Q, Chu W, Johnson D, Chiu H (2002) A pattern decomposition algorithm for data mining of frequent patterns. Knowl Inf Syst 4(4):466–482
Author information
Yun Sing Koh is currently a Ph.D. student at the Department of Computer Science, University of Otago, New Zealand. Her main research interest is in association rule mining with particular interest in generating hard-to-find association rules and interestingness measures. She holds a B.Sc. (Honours) degree in computer science and a Master’s degree in software engineering, both from the University of Malaya, Malaysia.
Nathan Rountree has been a faculty member of the Department of Computer Science at the University of Otago, Dunedin, since 1999. His research interests are in the fields of data mining, artificial neural networks, and computer science education. He is also a consulting software engineer for Profiler Corporation, a Dunedin-based company specialising in data mining and knowledge discovery.
Richard A. O’Keefe holds a B.Sc. (Honours) degree in mathematics and physics, majoring in statistics, and an M.Sc. degree in physics (underwater acoustics), both obtained from the University of Auckland, New Zealand. He received his Ph.D. degree in artificial intelligence from the University of Edinburgh. He is the author of “The Craft of Prolog” (MIT Press). Dr. O’Keefe is now a lecturer at the University of Otago, New Zealand. His computing interests include declarative programming languages, especially Prolog and Erlang; statistical applications, including data mining and information retrieval; and applications of logic. He is also a member of the editorial board of Theory and Practice of Logic Programming.
About this article
Cite this article
Koh, Y.S., Rountree, N. & O’Keefe, R.A. Mining interesting imperfectly sporadic rules. Knowl Inf Syst 14, 179–196 (2008). https://doi.org/10.1007/s10115-007-0074-6
DOI: https://doi.org/10.1007/s10115-007-0074-6