Abstract
In practical text classification tasks, the ability to interpret a classification result is as important as the ability to classify accurately. The associative classifier has favorable characteristics: rapid training, good classification accuracy, and excellent interpretability. However, it has some obstacles to overcome when applied to text classification. First, its training process produces a huge number of classification rules, which makes prediction for a new document ineffective. We resolve this by pruning the rules according to their contribution to correct classifications. In addition, since the target text collection generally has high dimensionality, the training process can take a very long time. We propose mutual information between the word and class variables as a feature selection measure to reduce the dimensionality of the feature space. Experimental classification results on the 20-newsgroups dataset show many benefits of associative classification in both training and prediction.
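The abstract does not state how the mutual information is estimated, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each word is treated as a binary variable (present or absent in a document) and that probabilities are estimated from raw document counts without smoothing. The function names `mutual_information` and `select_features` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(docs, labels):
    """Return {word: I(W; C)} in bits.

    Assumption: W is the binary presence of a word in a document, and
    probabilities are maximum-likelihood estimates from document counts.
    docs is a list of token lists; labels is the parallel list of classes.
    """
    n = len(docs)
    class_counts = Counter(labels)
    joint = Counter()        # (word, class) -> number of documents
    word_counts = Counter()  # word -> number of documents containing it
    for tokens, c in zip(docs, labels):
        for w in set(tokens):
            joint[(w, c)] += 1
            word_counts[w] += 1

    scores = {}
    for w, n_w in word_counts.items():
        mi = 0.0
        for c, n_c in class_counts.items():
            n_wc = joint[(w, c)]
            # Sum over both outcomes of W: word present, word absent.
            for n_joint, n_word in ((n_wc, n_w), (n_c - n_wc, n - n_w)):
                if n_joint > 0:  # zero-count terms contribute nothing
                    mi += (n_joint / n) * math.log2(n_joint * n / (n_word * n_c))
        scores[w] = mi
    return scores


def select_features(docs, labels, k):
    """Keep the k words with the highest mutual information (ties: alphabetical)."""
    scores = mutual_information(docs, labels)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

Restricting the vocabulary to the top-scoring words before rule mining shrinks the item space from which candidate rules are generated, which is how a feature selection step of this kind would shorten training.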
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yoon, Y., Lee, G.G. (2005). Practical Application of Associative Classifier for Document Classification. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_36
DOI: https://doi.org/10.1007/11562382_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29186-2
Online ISBN: 978-3-540-32001-2
eBook Packages: Computer Science, Computer Science (R0)