Abstract
Text classification techniques mostly rely on single-term analysis of the document data set, while many concepts, especially specific ones, are usually conveyed by sets of terms. To build a more accurate text classifier, more informative features, including words that frequently co-occur in the same sentence and their weights, are particularly important. In this paper, we propose a novel approach to text classification using sentential frequent itemsets, a concept that comes from association rule mining. The approach views a sentence rather than a document as a transaction and uses a variable precision rough set based method to evaluate each sentential frequent itemset’s contribution to the classification. Experiments on the Reuters and newsgroup corpora validate the practicability of the proposed system.
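To illustrate the idea of treating a sentence as a transaction, the following minimal sketch counts word sets that co-occur within a sentence and keeps those that reach a minimum support. It is not the authors' implementation: the abstract does not specify the mining algorithm, so the brute-force enumeration, the regex-based tokenization, and the names `sentence_transactions`, `sentential_frequent_itemsets`, `min_support`, and `max_size` are all assumptions made for illustration. The variable precision rough set weighting step is not shown.

```python
import re
from collections import Counter
from itertools import combinations


def sentence_transactions(document):
    """Split a document into sentences; return each as a set of lowercase words.

    Assumption: sentences end at '.', '!' or '?', and words are runs of letters.
    """
    transactions = []
    for sentence in re.split(r"[.!?]+", document):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words:
            transactions.append(words)
    return transactions


def sentential_frequent_itemsets(documents, min_support=2, max_size=2):
    """Count word sets co-occurring in one sentence; keep those with enough support.

    Support here is the number of sentences (transactions) containing the itemset.
    This enumerates all subsets up to max_size, which is only practical for
    small inputs; a real system would likely use Apriori or FP-growth.
    """
    counts = Counter()
    for document in documents:
        for words in sentence_transactions(document):
            ordered = sorted(words)  # canonical order so each itemset has one key
            for size in range(1, max_size + 1):
                for itemset in combinations(ordered, size):
                    counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


docs = [
    "Oil prices rose sharply. Crude oil prices hit a record.",
    "Prices of wheat fell. Oil exports grew.",
]
itemsets = sentential_frequent_itemsets(docs, min_support=2, max_size=2)
for itemset, support in sorted(itemsets.items()):
    print(itemset, support)
```

In the second toy document, "prices" and "oil" appear in different sentences, so that document adds nothing to the support of the pair; only the two sentences of the first document count toward it.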
Cite this article
Liu, SZ., Hu, HP. Text Classification Using Sentential Frequent Itemsets. J Comput Sci Technol 22, 334–337 (2007). https://doi.org/10.1007/s11390-007-9041-7