Abstract
Many studies have shown that association-based classification can achieve higher accuracy than traditional rule-based schemes. However, in the text classification domain, high dimensionality, the diversity of text data sets, and class skew make classification tasks more complicated. In this study, we present a new method for associative text categorization. First, we integrate feature selection into the rule pruning process rather than treating it as a separate preprocessing step. Second, we combine several techniques to extract rules efficiently. Third, we use a new score model to handle the problem caused by imbalanced class distribution. A series of experiments on various real text corpora indicates that, with these approaches, associative text classification (ATC) can achieve classification performance competitive with that of the well-known support vector machines (SVM).
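To make the general idea of associative text classification concrete, the following is a minimal sketch and not the method of this paper. It mines class association rules of the form "term set → class" and classifies a document by the rules it matches. The function names `mine_rules` and `classify`, the thresholds, the toy corpus, and the per-class averaging of rule confidences are all illustrative assumptions; the abstract does not specify the paper's pruning strategy or score model.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_rules(docs, labels, min_support=2, min_confidence=0.6, max_len=2):
    """Return {(termset, label): confidence} for term sets of up to max_len terms.

    Hypothetical helper: brute-force enumeration, suitable only for toy data.
    """
    termset_counts = Counter()   # documents containing the term set
    joint_counts = Counter()     # documents containing the term set, per class
    for terms, label in zip(docs, labels):
        unique = sorted(set(terms))
        for size in range(1, max_len + 1):
            for termset in combinations(unique, size):
                termset_counts[termset] += 1
                joint_counts[(termset, label)] += 1

    rules = {}
    for (termset, label), joint in joint_counts.items():
        confidence = joint / termset_counts[termset]
        # Keep only rules that are frequent enough and reliable enough.
        if joint >= min_support and confidence >= min_confidence:
            rules[(termset, label)] = confidence
    return rules


def classify(rules, terms):
    """Pick the class whose matching rules have the highest mean confidence.

    Averaging (an assumption of this sketch) keeps a class with many rules
    from winning on rule count alone, one simple response to class skew.
    """
    present = set(terms)
    fired = defaultdict(list)
    for (termset, label), confidence in rules.items():
        if present.issuperset(termset):
            fired[label].append(confidence)
    if not fired:
        return None  # no rule matches this document
    return max(fired, key=lambda label: sum(fired[label]) / len(fired[label]))


# Toy corpus (invented for illustration): three "crude" documents, two "grain".
docs = [
    ["wheat", "export", "price"],
    ["wheat", "harvest"],
    ["oil", "price", "barrel"],
    ["oil", "export", "barrel"],
    ["oil", "price"],
]
labels = ["grain", "grain", "crude", "crude", "crude"]

rules = mine_rules(docs, labels)
print(classify(rules, ["wheat", "price"]))
print(classify(rules, ["oil", "barrel", "futures"]))
```

On real corpora, brute-force enumeration of term sets is infeasible because of the high dimensionality the abstract mentions, which is why feature selection and efficient rule extraction matter.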
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qian, T., Wang, Y., Xiang, L., Gong, W. (2006). Feature Selection, Rule Extraction, and Score Model: Making ATC Competitive with SVM. In: Wang, GY., Peters, J.F., Skowron, A., Yao, Y. (eds) Rough Sets and Knowledge Technology. RSKT 2006. Lecture Notes in Computer Science, vol 4062. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11795131_69
DOI: https://doi.org/10.1007/11795131_69
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36297-5
Online ISBN: 978-3-540-36299-9
eBook Packages: Computer Science, Computer Science (R0)