Evaluation and optimization of frequent, closed and maximal association rule based classification

Published in: Statistics and Computing

Abstract

Real-world applications of association rule mining suffer from the well-known problem of discovering a large number of rules, many of which are neither interesting nor useful for the application at hand. Algorithms for closed and maximal itemset mining significantly reduce the volume of rules discovered and the complexity of the task, but the implications of their use, and the important differences in generalization power, precision and recall when used for classification, have not been examined. In this paper we present a systematic evaluation of the association rules discovered from frequent, closed and maximal itemset mining algorithms, combining common data mining and statistical interestingness measures, and outline an appropriate sequence of usage. The experiments are performed on a number of real-world datasets with diverse data/item characteristics, and a detailed evaluation of the rule sets is provided both as a whole and with respect to individual classes. Empirical results confirm that, with a proper combination of data mining and statistical analysis, a large number of non-significant, redundant and contradictory rules can be eliminated while relatively high precision and recall are preserved. More importantly, the results reveal the important characteristics of, and differences between, using frequent, closed and maximal itemsets for the classification task, and the effect of incorporating statistical/heuristic measures to optimize such rule sets. With closed itemset mining already a preferred choice for reducing complexity and redundancy during rule generation, this study further confirms that closed itemset based association rules are also of better quality in terms of classification precision and recall, both overall and on individual class examples. Maximal itemset based association rules, which are a subset of the closed itemset based rules, are shown to be insufficient in this regard and typically have worse recall and generalization power. The empirical results also expose the drawback of applying the confidence measure at the start of rule generation, as is typically done within the association rule framework: removing rules that fall below a certain confidence threshold also removes the knowledge that contradictions to the relatively higher-confidence rules exist in the data. Precision can therefore be increased by discarding contradictory rules prior to the application of the confidence constraint.
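As a quick illustration of the containment relationship the abstract relies on, the following Python sketch enumerates frequent, closed and maximal itemsets for a toy transaction database by brute force. This is not the authors' implementation; the transactions, item names and support threshold are hypothetical, and real mining would use dedicated algorithms (e.g. Apriori for frequent, CHARM for closed, or a maximal-itemset miner) rather than exhaustive enumeration.

```python
# Minimal sketch: maximal itemsets are a subset of closed itemsets,
# which are a subset of all frequent itemsets.
from itertools import combinations

# Hypothetical toy transaction database and support threshold.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"a", "b", "c", "d"},
]
min_support = 2  # absolute support (number of transactions)

def support(itemset):
    """Count transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

# Enumerate all frequent itemsets by brute force (fine only for toy data).
items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = support(set(combo))
        if s >= min_support:
            frequent[frozenset(combo)] = s

# Closed: no proper superset has the same support.
closed = {
    x: s for x, s in frequent.items()
    if not any(x < y and frequent[y] == s for y in frequent)
}

# Maximal: no proper superset is frequent at all.
maximal = {
    x: s for x, s in frequent.items()
    if not any(x < y for y in frequent)
}

print(len(frequent), len(closed), len(maximal))  # here: 7 4 1
```

On this toy data the counts shrink from frequent to closed to maximal, mirroring the abstract's point that maximal itemset based rules, being a subset of the closed itemset based rules, can lose recall and generalization power. The same rule sets could then be screened for contradictions (rules sharing an antecedent but predicting different classes) before any confidence threshold is applied, as the abstract argues.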

Author information

Correspondence to F. Hadzic.

About this article

Cite this article

Shaharanee, I.N.M., Hadzic, F. Evaluation and optimization of frequent, closed and maximal association rule based classification. Stat Comput 24, 821–843 (2014). https://doi.org/10.1007/s11222-013-9404-6
