Abstract
In this paper we study a new technique we call post-bagging, which consists of resampling parts of a classification model rather than the data. We apply it to a particular kind of model, large sets of classification association rules, in combination with ordinary best-rule and weighted-voting approaches. We empirically evaluate the effects of the technique in terms of classification accuracy. We also discuss the predictive power of different metrics used for association rule mining, such as confidence, lift, conviction and χ². We conclude that, for the described experimental conditions, post-bagging improves classification results and that the best metric is conviction.
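To make the ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The function `rule_metrics` uses the standard textbook definitions of confidence, lift, conviction and χ² for a rule A → C, computed from raw counts. The rest is an assumed reading of post-bagging: bootstrap samples are drawn from the rule set rather than from the data, each sample predicts with its best matching rule, and the samples vote. The names `Rule`, `best_rule_predict`, `post_bagging_predict`, the `default` class and the `n_models` parameter are hypothetical and introduced only for illustration.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass


def rule_metrics(n, n_a, n_c, n_ac):
    """Standard metrics for a rule A -> C from raw counts.

    n: total cases; n_a: cases matching A; n_c: cases of class C;
    n_ac: cases matching both. Assumes 0 < n_a < n and 0 < n_c < n.
    """
    confidence = n_ac / n_a
    p_c = n_c / n
    lift = confidence / p_c
    # Conviction is unbounded when the rule is never wrong.
    conviction = math.inf if n_ac == n_a else (1 - p_c) / (1 - confidence)
    # Chi-square statistic of the 2x2 contingency table of A versus C.
    a, b = n_ac, n_a - n_ac
    c, d = n_c - n_ac, n - n_a - n_c + n_ac
    chi2 = n * (a * d - b * c) ** 2 / (n_a * (n - n_a) * n_c * (n - n_c))
    return {"confidence": confidence, "lift": lift,
            "conviction": conviction, "chi2": chi2}


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # items that must all be present in a case
    label: str             # predicted class
    score: float           # ranking metric, e.g. conviction


def best_rule_predict(rules, case, default):
    """Predict with the highest-scoring rule whose antecedent matches."""
    matching = [r for r in rules if r.antecedent <= case]
    if not matching:
        return default
    return max(matching, key=lambda r: r.score).label


def post_bagging_predict(rules, case, default, n_models=10, rng=None):
    """Assumed post-bagging: resample the rules, then take a majority vote."""
    rng = rng or random.Random(0)
    votes = Counter()
    for _ in range(n_models):
        # Bootstrap sample of the rule set (with replacement, same size).
        sample = rng.choices(rules, k=len(rules))
        votes[best_rule_predict(sample, case, default)] += 1
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    m = rule_metrics(n=100, n_a=40, n_c=50, n_ac=30)
    rules = [
        Rule(frozenset({"a"}), "yes", m["conviction"]),
        Rule(frozenset({"a", "b"}), "yes", 3.0),
        Rule(frozenset({"c"}), "no", 5.0),
    ]
    print(m)
    print(post_bagging_predict(rules, frozenset({"a", "b"}), default="no"))
```

The sketch votes with equal weight per bootstrap sample; a weighted-voting variant would accumulate rule scores instead of counts.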
Supported by the POSI/SRI/39630/2001/Class Project (Fundação Ciência e Tecnologia), FEDER and Programa de Financiamento Plurianual de Unidades de I&D.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jorge, A.M., Azevedo, P.J. (2005). An Experiment with Association Rules and Classification: Post-Bagging and Conviction. In: Hoffmann, A., Motoda, H., Scheffer, T. (eds) Discovery Science. DS 2005. Lecture Notes in Computer Science, vol 3735. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563983_13
DOI: https://doi.org/10.1007/11563983_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29230-2
Online ISBN: 978-3-540-31698-5
eBook Packages: Computer Science, Computer Science (R0)