Abstract
Huge amounts of data are being collected and analyzed nowadays. When popular rule-learning algorithms are applied to such big datasets, the number of discovered rules can easily exceed several thousand. To produce compact and accurate classifiers, these rules have to be grouped and pruned, so that only a reasonable number of them are presented to the end user for inspection and further analysis. To solve this problem, researchers have proposed several associative classification approaches that combine two important data mining techniques, namely classification and association rule mining.
In this paper, we propose a new method that reduces the number of class association rules produced by classical class association rule classifiers, while maintaining a classification model whose accuracy is comparable to those generated by state-of-the-art classification algorithms. More precisely, we propose a new associative classifier, CMAC, which uses agglomerative hierarchical clustering as a post-processing step to reduce the number of its rules.
Experimental results on selected datasets from the UCI ML repository show that CMAC is able to learn classifiers containing significantly fewer rules than state-of-the-art rule learning algorithms on datasets with a larger number of examples. On the other hand, the classification accuracy of the CMAC classifier is not significantly different from that of state-of-the-art rule learners on most of the datasets.
We can thus conclude that CMAC is able to learn compact (and meaningful) classifiers from “bigger” datasets, retaining an accuracy comparable to state-of-the-art rule learning algorithms.
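The abstract does not spell out how the rules are clustered or how cluster representatives are chosen, but the general idea of the post-processing step can be sketched as follows. This is a minimal illustration, not CMAC itself: the toy rules, the Jaccard distance over rule antecedents, single linkage, and picking the highest-confidence rule per cluster are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical toy rules: (antecedent item set, class label, confidence).
rules = [
    (frozenset({"a", "b"}), "yes", 0.95),
    (frozenset({"a", "b", "c"}), "yes", 0.90),
    (frozenset({"d"}), "no", 0.85),
    (frozenset({"d", "e"}), "no", 0.80),
]

def jaccard_distance(r1, r2):
    """Distance between two rule antecedents (assumed metric)."""
    union = len(r1 | r2)
    return 1.0 - len(r1 & r2) / union if union else 0.0

def agglomerative_cluster(rules, k):
    """Single-linkage agglomerative clustering of rules down to k clusters."""
    clusters = [[i] for i in range(len(rules))]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest single-linkage distance.
        best = None
        for (i, c1), (j, c2) in combinations(enumerate(clusters), 2):
            d = min(jaccard_distance(rules[a][0], rules[b][0])
                    for a in c1 for b in c2)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def representatives(rules, clusters):
    """Keep the highest-confidence rule from each cluster."""
    return [max((rules[i] for i in c), key=lambda r: r[2]) for c in clusters]

clusters = agglomerative_cluster(rules, k=2)
compact = representatives(rules, clusters)
print(len(compact))  # 2 representative rules instead of the original 4
```

The merging loop is the textbook agglomerative procedure; a production implementation would typically use an optimized library routine (e.g. hierarchical clustering from SciPy) rather than the quadratic pairwise search shown here.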
Change history
30 March 2021
The original version of chapter 2 was inadvertently published with wrong RTS values in Table 3: “Results comparison with RTS, S, and SVMlight with standard linear loss with a 10-fold cross validation procedure.”
The RTS values were corrected by replacing the wrong values with the appropriate ones.
A footnote reading “1Code is available at: https://osf.io/fbzsc/” has been added to the last sentence of the abstract.
The original version of chapter 34 was inadvertently published with incorrect assignments of authors to affiliations; in addition, one affiliation was entirely missing.
The affiliations have been corrected and read as follows: 1University of Primorska, Koper, Slovenia; 2Jožef Stefan Institute, Ljubljana, Slovenia; and 3Urgench State University, Urgench, Uzbekistan. The authors' affiliations are: Jamolbek Mattiev1,3 and Branko Kavšek1,2.
Acknowledgement
The authors gratefully acknowledge the European Commission for funding the InnoRenew CoE project (Grant Agreement #739574) under the Horizon 2020 Widespread-Teaming program and the Republic of Slovenia (investment funding of the Republic of Slovenia and the European Union from the European Regional Development Fund). Jamolbek Mattiev is also funded for his Ph.D. by the “El-Yurt-Umidi” foundation under the Cabinet of Ministers of the Republic of Uzbekistan.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Mattiev, J., Kavšek, B. (2020). CMAC: Clustering Class Association Rules to Form a Compact and Meaningful Associative Classifier. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science(), vol 12565. Springer, Cham. https://doi.org/10.1007/978-3-030-64583-0_34
DOI: https://doi.org/10.1007/978-3-030-64583-0_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64582-3
Online ISBN: 978-3-030-64583-0
eBook Packages: Computer Science (R0)