Text Classification Using Sentential Frequent Itemsets

Liu, Shi-Zhu; Hu, He-Ping

doi:10.1007/s11390-007-9041-7

Short Paper
Published: 17 April 2007

Volume 22, pages 334–337, (2007)

Journal of Computer Science and Technology

Shi-Zhu Liu¹ & He-Ping Hu¹


Abstract

Text classification techniques mostly rely on single-term analysis of the document data set, yet many concepts, especially specific ones, are conveyed by sets of terms. To build a more accurate text classifier, more informative features, such as words that frequently co-occur in the same sentence, together with their weights, are particularly important. In this paper, we propose a novel approach to text classification using sentential frequent itemsets, a concept from association rule mining. The approach views a sentence rather than a document as a transaction and uses a variable precision rough set based method to evaluate each sentential frequent itemset’s contribution to the classification. Experiments on the Reuters and newsgroup corpora validate the practicability of the proposed system.
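
The core idea of the abstract, treating each sentence as a transaction and keeping word sets that co-occur often enough, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the regex-based sentence splitting, the brute-force enumeration (in place of a real Apriori-style miner), and the `inclusion_degree` measure are all assumptions made for illustration. The inclusion degree is only a simple stand-in for a variable precision rough set style evaluation, in which an itemset would count toward a class when the share of its supporting sentences from that class reaches a chosen threshold.

```python
from collections import Counter
from itertools import combinations
import re


def sentences(text):
    """Split text into sentences; each sentence becomes a set of lowercase words."""
    parts = re.split(r"[.!?]+", text)
    return [set(re.findall(r"[a-z]+", p.lower())) for p in parts if p.strip()]


def sentential_frequent_itemsets(docs, min_support, max_size=2):
    """Count word sets that co-occur within a single sentence.

    docs: list of (text, label) pairs (hypothetical input format).
    Returns {itemset: Counter({label: sentence_count})} for every itemset
    whose total sentence count is at least min_support.
    """
    counts = {}
    for text, label in docs:
        for words in sentences(text):  # one sentence = one transaction
            for size in range(1, max_size + 1):
                for combo in combinations(sorted(words), size):
                    counts.setdefault(frozenset(combo), Counter())[label] += 1
    return {s: c for s, c in counts.items() if sum(c.values()) >= min_support}


def inclusion_degree(label_counts, label):
    """Share of an itemset's supporting sentences that come from `label`."""
    return label_counts[label] / sum(label_counts.values())


# Toy corpus, invented for this example only.
docs = [
    ("Oil prices rose. Crude oil prices fell.", "energy"),
    ("Wheat prices rose. Farmers sold wheat.", "grain"),
]
itemsets = sentential_frequent_itemsets(docs, min_support=2)
for itemset, label_counts in itemsets.items():
    print(sorted(itemset), dict(label_counts),
          inclusion_degree(label_counts, "energy"))
```

In this toy corpus the pair {oil, prices} is supported only by "energy" sentences, while {prices, rose} is split evenly between the two classes, so a threshold of, say, 0.8 on the inclusion degree would keep the first pair as evidence for "energy" and discard the second. Stop-word removal and stemming are omitted for brevity.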

Author information

Authors and Affiliations

College of Computer Science, Huazhong University of Science and Technology, Wuhan, 430074, China
Shi-Zhu Liu & He-Ping Hu

Corresponding author

Correspondence to Shi-Zhu Liu.

Electronic supplementary material

Supplementary material - Chinese Abstract (PDF 56.3 KB)

About this article

Cite this article

Liu, SZ., Hu, HP. Text Classification Using Sentential Frequent Itemsets. J Comput Sci Technol 22, 334–337 (2007). https://doi.org/10.1007/s11390-007-9041-7

Received: 22 May 2005
Revised: 05 September 2006
Published: 17 April 2007
Issue Date: March 2007
DOI: https://doi.org/10.1007/s11390-007-9041-7
