Summary
Classification Rule Mining (CRM) is a well-known Data Mining technique for extracting hidden Classification Rules (CRs) from a database that is coupled with a set of pre-defined classes, the objective being to build a classifier for “unseen” data-records. One recent approach to CRM, Classification Association Rule Mining (CARM), employs Association Rule Mining (ARM) techniques to identify the desired CRs. Although the accuracy and efficiency advantages of CARM have been established in many papers, one major drawback is the large number of Classification Association Rules (CARs) that may be generated: up to 2^n − n − 1 in the worst case, where n is the number of data-attributes in the database. However, only a limited number of CARs, say at most k̂ in each class, are required to distinguish between classes. The problem addressed in this chapter is how to identify these k̂ CARs efficiently. Given a CAR list generated from a database using the well-established “Support-Confidence” framework, this chapter proposes a rule weighting scheme that assigns each CAR a score evaluating how significantly it contributes to a single pre-defined class. A rule mining approach is then presented that operates in time O(k^2 n^2) in its deterministic form and O(kn) in its randomised form, where k is the number of CARs in each class that are potentially significant for distinguishing between classes and k ≥ k̂. By contrast, computing scores to mine all k̂ CARs in a “one-by-one” manner requires exponential time O(2^n). The experimental results show good classification accuracy when the proposed rule weighting scheme is used with a suggested rule ordering mechanism, and provide evidence that the proposed rule mining approach is computationally efficient.
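As a rough illustration of two quantities the summary relies on, the following Python sketch computes the worst-case count 2^n − n − 1 and the support and confidence of a single rule on a toy data set. It is not the chapter's weighting scheme or mining algorithm. The function names, the toy records, and the reading of the bound as "attribute subsets with two or more members" are assumptions made here for illustration only.

```python
from itertools import combinations


def max_rule_count(n: int) -> int:
    """Worst-case figure quoted in the summary: 2^n - n - 1."""
    return 2 ** n - n - 1


def count_subsets_of_size_at_least_two(n: int) -> int:
    """Brute-force count of attribute subsets with two or more members.

    Assumption: this is one way to read the 2^n - n - 1 bound
    (all subsets, minus the empty set, minus the n singletons).
    """
    return sum(
        1
        for size in range(2, n + 1)
        for _ in combinations(range(n), size)
    )


def support_and_confidence(records, antecedent, label):
    """Support and confidence of the rule `antecedent -> label`.

    records: list of (frozenset_of_items, class_label) pairs (hypothetical layout).
    support    = fraction of all records containing the antecedent with that label
    confidence = fraction of records containing the antecedent that have that label
    """
    antecedent = frozenset(antecedent)
    covered = [cls for items, cls in records if antecedent <= items]
    matched = sum(1 for cls in covered if cls == label)
    support = matched / len(records)
    confidence = matched / len(covered) if covered else 0.0
    return support, confidence


# Toy data, invented for this example.
RECORDS = [
    (frozenset({"a", "b"}), "yes"),
    (frozenset({"a", "b", "c"}), "yes"),
    (frozenset({"a", "c"}), "no"),
    (frozenset({"b", "c"}), "no"),
]

if __name__ == "__main__":
    print(max_rule_count(4), count_subsets_of_size_at_least_two(4))
    print(support_and_confidence(RECORDS, {"a", "b"}, "yes"))
```

Even at n = 20 the bound exceeds one million candidate rules, which is why restricting attention to a small number of significant rules per class matters.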
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wang, Y.J., Xin, Q., Coenen, F. (2008). Mining Efficiently Significant Classification Association Rules. In: Lin, T.Y., Xie, Y., Wasilewska, A., Liau, CJ. (eds) Data Mining: Foundations and Practice. Studies in Computational Intelligence, vol 118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78488-3_26
DOI: https://doi.org/10.1007/978-3-540-78488-3_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78487-6
Online ISBN: 978-3-540-78488-3
eBook Packages: Engineering, Engineering (R0)