Text Classification Using Sentential Frequent Itemsets

  • Short Paper
Journal of Computer Science and Technology

Abstract

Text classification techniques mostly rely on single-term analysis of the document data set, whereas many concepts, especially specific ones, are conveyed by sets of terms. To build a more accurate text classifier, more informative features, including words that frequently co-occur in the same sentence and their weights, are particularly important. In this paper, we propose a novel approach to text classification using sentential frequent itemsets, a concept that comes from association rule mining. The approach views a sentence, rather than a document, as a transaction and uses a variable precision rough set based method to evaluate each sentential frequent itemset’s contribution to the classification. Experiments on the Reuters and newsgroup corpora validate the practicability of the proposed system.
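
The core idea of treating each sentence as a transaction can be illustrated with a minimal sketch. It is not the authors' implementation: the regex-based sentence splitting, the brute-force enumeration of itemsets (rather than an Apriori-style miner), the function names `sentence_transactions` and `frequent_itemsets`, and the parameters `min_support` and `max_size` are all assumptions made for illustration. The variable precision rough set weighting step is not covered here.

```python
import re
from collections import Counter
from itertools import combinations


def sentence_transactions(document):
    """Split a document into sentences; each sentence becomes a set of lowercase words.

    Assumption: a naive split on '.', '!' and '?' stands in for real sentence segmentation.
    """
    transactions = []
    for sentence in re.split(r"[.!?]+", document):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words:
            transactions.append(words)
    return transactions


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count word sets of up to max_size words and keep those that occur
    in at least min_support transactions (sentences).

    Assumption: brute-force enumeration, adequate only for small examples.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)  # sorted so each itemset has one canonical tuple
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical three-sentence document used only for illustration.
document = (
    "Interest rates rose sharply. "
    "The bank raised interest rates again. "
    "Wheat exports fell."
)
transactions = sentence_transactions(document)
frequent = frequent_itemsets(transactions, min_support=2)
print(frequent)
```

In this toy document the pair ("interest", "rates") is counted once per sentence in which both words appear, which is what distinguishes a sentential itemset from one mined with whole documents as transactions.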

Author information

Corresponding author

Correspondence to Shi-Zhu Liu.

About this article

Cite this article

Liu, SZ., Hu, HP. Text Classification Using Sentential Frequent Itemsets. J Comput Sci Technol 22, 334–337 (2007). https://doi.org/10.1007/s11390-007-9041-7

  • DOI: https://doi.org/10.1007/s11390-007-9041-7