Abstract
The certainty factor and lift are well-known evaluation measures of association rules. Nevertheless, they do not guarantee an accurate evaluation of the strength of dependence between a rule's constituents. In particular, even if there is the strongest possible positive or negative dependence between a rule's constituents X and Y, these measures may take values quite close to those indicating independence of X and Y. Recently, we proposed a new measure, called the dependence factor, to overcome this drawback. Unlike in the case of the certainty factor, when defining the dependence factor we took into account the fact that for a given rule \(X \rightarrow Y\), the minimal conditional probability of the occurrence of Y given X may be greater than 0, while its maximal possible value may be less than 1. In this paper, we first recall the definitions and properties of all three measures. Then, we examine the dependence factor from the point of view of an interestingness measure, and we examine the relationship between the dependence factor for X and Y and those for \(\bar{X}\) and Y, X and \(\bar{Y}\), and \(\bar{X}\) and \(\bar{Y}\), respectively. As a result, we obtain a number of new properties of the dependence factor.
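As a quick illustration of the drawback discussed above, the sketch below computes lift and the certainty factor for a rule whose constituents are as strongly positively dependent as their marginals allow. The `dependence_factor` function is a hypothetical reading of the measure, not the paper's verbatim definition: it normalizes the deviation \(P(XY) - P(X)P(Y)\) by the maximal deviation achievable given \(P(X)\) and \(P(Y)\), using the bounds \(\max P(XY) = \min(P(X), P(Y))\) and \(\min P(XY) = \max(0, P(X)+P(Y)-1)\):

```python
def lift(pxy, px, py):
    # lift(X -> Y) = P(XY) / (P(X) * P(Y)); 1 indicates independence.
    return pxy / (px * py)

def certainty_factor(pxy, px, py):
    # CF(X -> Y) scales P(Y|X) - P(Y) by its distance to 1 (positive
    # dependence) or to P(Y)'s distance to 0 (negative dependence);
    # 0 indicates independence.
    py_given_x = pxy / px
    if py_given_x > py:
        return (py_given_x - py) / (1 - py)
    if py_given_x < py:
        return (py_given_x - py) / py
    return 0.0

def dependence_factor(pxy, px, py):
    # Hypothetical sketch of df(X -> Y): the deviation P(XY) - P(X)P(Y)
    # is divided by the largest deviation actually achievable for the
    # given marginals, so the strongest possible positive (negative)
    # dependence yields exactly 1 (-1).
    dev = pxy - px * py
    if dev > 0:
        return dev / (min(px, py) - px * py)
    if dev < 0:
        return dev / (px * py - max(0.0, px + py - 1))
    return 0.0

# X and Y co-occur as strongly as their marginals permit:
# P(XY) = min(P(X), P(Y)) = 0.9 -- the strongest positive dependence.
px, py = 0.9, 0.9
pxy = min(px, py)
print(lift(pxy, px, py))               # ~1.11, close to 1 despite maximal dependence
print(certainty_factor(pxy, px, py))   # 1.0
print(dependence_factor(pxy, px, py))  # 1.0
```

Note how lift stays near its independence value 1 even though no stronger dependence is possible for these marginals, which is precisely the drawback the dependence factor is designed to avoid.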
Acknowledgments
We wish to thank an anonymous reviewer for constructive comments, which influenced the final version of this paper positively.
Appendix
Proof of Lemma 2
In the proof, we will use the following equations:
- \(P(\bar{X}) = 1 - P(X)\), \(P(\bar{Y}) = 1 - P(Y)\),
- \(P(\bar{X}Y) = P(Y) - P(XY)\), \(P(X\bar{Y}) = P(X) - P(XY)\),
- \(P(\bar{X}\bar{Y}) = P(\bar{X}) - P(\bar{X}Y) = 1 - P(X) - P(Y) + P(XY)\).
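These identities also make the equivalences of Lemma 1 transparent: the deviation of the joint probability from independence is preserved when both arguments are negated, since

```latex
\begin{align*}
P(\bar{X}\bar{Y}) - P(\bar{X})P(\bar{Y})
  &= \bigl(1 - P(X) - P(Y) + P(XY)\bigr) - \bigl(1 - P(X)\bigr)\bigl(1 - P(Y)\bigr) \\
  &= P(XY) - P(X)P(Y),
\end{align*}
```

so \(P(\bar{X}\bar{Y})\) is greater than, equal to, or less than \(P(\bar{X})P(\bar{Y})\) exactly when \(P(XY)\) is greater than, equal to, or less than \(P(X)P(Y)\).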
Ad (a)
Case \(P(\bar{X}\bar{Y}) >P(\bar{X}) \times P(\bar{Y})\):
This case is equivalent to the case when \(P(XY) > P(X)\times P(Y)\) (by Lemma 1a). Then:
\(df(\bar{X} \rightarrow \bar{Y}) = \text { /* by Proposition 3a */}\)
Case \(P(\bar{X}\bar{Y}) = P(\bar{X})\times P(\bar{Y})\):
This case is equivalent to the case when \(P(XY) = P(X) \times P(Y)\) (by Lemma 1b). Then:
\(df(\bar{X} \rightarrow \bar{Y}) = \text { /* by Proposition 3a */}\)
\(\begin{aligned}&= 0 = \text { /* by Proposition 3a */}\\&= df(X \rightarrow Y). \end{aligned}\)
Case \(P(\bar{X}\bar{Y}) < P(\bar{X}) \times P(\bar{Y})\) and \(P(\bar{X}) + P(\bar{Y}) \le 1\):
This case is equivalent to the case when \(P(XY) < P(X) \times P(Y)\) (by Lemma 1c) and \(P(X)+P(Y)\ge 1\) (by Proposition 5c). Then:
\(df(\bar{X} \rightarrow \bar{Y}) = \text { /* by Proposition 3a */}\)
Case \(P(\bar{X}\bar{Y}) < P(\bar{X}) \times P(\bar{Y})\) and \(P(\bar{X}) + P(\bar{Y}) > 1\):
This case is equivalent to the case when \(P(XY) < P(X) \times P(Y)\) (by Lemma 1c) and \(P(X)+P(Y) < 1\) (by Proposition 5c). Then:
\(df(\bar{X}\rightarrow \bar{Y}) = \text { /* by Proposition 3a */} \)
Ad (b)
The proof is analogous to the proof of Lemma 1a.
Ad (c)
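The case analysis below repeatedly uses the fact that negating one argument flips the sign of the deviation from independence:

```latex
\begin{align*}
P(X\bar{Y}) - P(X)P(\bar{Y})
  &= \bigl(P(X) - P(XY)\bigr) - P(X)\bigl(1 - P(Y)\bigr) \\
  &= -\bigl(P(XY) - P(X)P(Y)\bigr),
\end{align*}
```

so \(P(X\bar{Y}) > P(X)P(\bar{Y})\) holds exactly when \(P(XY) < P(X)P(Y)\), and vice versa.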
Case \(P(X\bar{Y})>P(X) \times P(\bar{Y})\) and \(P(X)\le P(\bar{Y})\):
This case is equivalent to the case when \(P(XY)< P(X)\times P(Y)\) (by Lemma 1c) and \(P(X) \le 1 - P(Y)\). Then:
\(df(X \rightarrow \bar{Y}) = \text { /* by Proposition 3a */}\)
Case \(P(X\bar{Y}) > P(X) \times P(\bar{Y})\) and \(P(X) > P(\bar{Y})\):
This case is equivalent to the case when \(P(XY)< P(X) \times P(Y)\) (by Lemma 1c) and \(P(X) > 1 - P(Y)\). Then:
\(df(X \rightarrow \bar{Y}) = \text { /* by Proposition 3a */}\)
Case \(P(X\bar{Y})=P(X) \times P(\bar{Y})\):
This case is equivalent to the case when \(P(XY)=P(X)\times P(Y)\) (by Lemma 1b). Then:
\(df(X \rightarrow \bar{Y}) = \text { /* by Proposition 3a */}\)
\(\begin{aligned}&= 0 = \text { /* by Proposition 3a */} \\&= -df(X\rightarrow Y). \end{aligned}\)
Case \(P(X\bar{Y}) < P(X)\times P(\bar{Y})\) and \(P(X)+ P(\bar{Y}) \le 1\):
This case is equivalent to the case when \(P(XY) > P(X) \times P(Y)\) (by Lemma 1a) and \(P(X)\le P(Y)\). Then:
\(df(X \rightarrow \bar{Y}) = \text { /* by Proposition 3a */}\)
Case \(P(X\bar{Y}) < P(X) \times P(\bar{Y})\) and \(P(X)+P(\bar{Y}) > 1\):
This case is equivalent to the case when \(P(XY) > P(X) \times P(Y)\) (by Lemma 1a) and \(P(X)>P(Y)\). Then:
\(df(X \rightarrow \bar{Y}) = \text { /* by Proposition 3a */}\)
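The complement identities and the sign relations from Lemma 1 used throughout the proof can be spot-checked numerically; the probabilities below are arbitrary illustrative values satisfying the usual consistency constraints:

```python
# Spot-check the complement identities and sign relations used in the
# proof, for one consistent assignment of P(X), P(Y), P(XY).
px, py, pxy = 0.6, 0.5, 0.4

p_nx, p_ny = 1 - px, 1 - py        # P(~X), P(~Y)
p_nxy = py - pxy                   # P(~X Y)
p_xny = px - pxy                   # P(X ~Y)
p_nxny = 1 - px - py + pxy         # P(~X ~Y)

# The four cells of the 2x2 contingency table sum to 1.
assert abs(pxy + p_nxy + p_xny + p_nxny - 1) < 1e-12
# P(~X ~Y) = P(~X) - P(~X Y).
assert abs(p_nxny - (p_nx - p_nxy)) < 1e-12

# Deviations from independence: same sign and magnitude for (X, Y)
# and (~X, ~Y), opposite sign for (X, ~Y), as Lemma 1 states.
dev = pxy - px * py
assert abs((p_nxny - p_nx * p_ny) - dev) < 1e-12
assert abs((p_xny - px * p_ny) + dev) < 1e-12
print("all identities hold for P(X)=0.6, P(Y)=0.5, P(XY)=0.4")
```

Running the script with other consistent values of `px`, `py`, `pxy` (i.e., \(\max(0, P(X)+P(Y)-1) \le P(XY) \le \min(P(X), P(Y))\)) exercises the same identities.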
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this chapter
Kryszkiewicz, M. (2016). Dependence Factor as a Rule Evaluation Measure. In: Matwin, S., Mielniczuk, J. (eds) Challenges in Computational Statistics and Data Mining. Studies in Computational Intelligence, vol 605. Springer, Cham. https://doi.org/10.1007/978-3-319-18781-5_12
Print ISBN: 978-3-319-18780-8
Online ISBN: 978-3-319-18781-5