Skip to main content

Classifying Chinese Texts in Two Steps

  • Conference paper
Natural Language Processing – IJCNLP 2005 (IJCNLP 2005)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3651))

Included in the following conference series:

Abstract

This paper proposes a two-step method for Chinese text categorization (TC). In the first step, a Naïve Bayesian classifier is used to fix the fuzzy area between two categories, and, in the second step, the classifier with more subtle and powerful features is used to deal with documents in the fuzzy area, which are thought of being unreliable in the first step. The preliminary experiment validated the soundness of this method. Then, the method is extended from two-class TC to multi-class TC. In this two-step framework, we try to further improve the classifier by taking the dependences among features into consideration in the second step, resulting in a Causality Naïve Bayesian Classifier.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Sebastiani, F.: Machine Learning in Automated Text Categorization. ACM Computing Surveys 34(1), 1–47 (2002)

    Article  Google Scholar 

  2. Lewis, D.: Naive Bayes at Forty: The Independence Assumption in Information Retrieval. In: Proceedings of ECML-1998, pp. 4–15 (1998)

    Google Scholar 

  3. Salton, G.: Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading (1989)

    Google Scholar 

  4. Mitchell, T.M.: Machine Learning. McCraw Hill, New York (1996)

    MATH  Google Scholar 

  5. Yang, Y., Liu, X.: A Re-examination of Text Categorization Methods. In: Proceedings of SIGIR-1999, pp. 42–49 (1999)

    Google Scholar 

  6. Fan, X.: Causality Reasoning and Text Categorization, Postdoctoral Research Report of Tsinghua University, P.R. China (April 2004) (in Chinese)

    Google Scholar 

  7. Dumais, S.T., Platt, J., Hecherman, D., Sahami, M.: Inductive Learning Algorithms and Representation for Text Categorization. In: Proceedings of CIKM-1998, Bethesda, MD, pp. 148–155 (1998)

    Google Scholar 

  8. Sahami, M., Dumais, S., Hecherman, D., Horvitz, E.A.: Bayesian Approach to Filtering Junk E-Mail. In: Learning for Text Categorization: Papers from the AAAI Workshop, 55-62, Madison Wisconsin. AAAI Technical Report WS-98-05 (1998)

    Google Scholar 

  9. Fan, X.: Causality Diagram Theory Research and Applying It to Fault Diagnosis of Complexity System, Ph.D. Dissertation of Chongqing University, P.R. China (April 2002) (In Chinese)

    Google Scholar 

  10. Fan, X., Qin, Z., Maosong, S., Xiyue, H.: Reasoning Algorithm in Multi-Valued Causality Diagram. Chinese Journal of Computers 26(3), 310–322 (2003) (in Chinese)

    Google Scholar 

  11. Sahami, M.: Learning Limited Dependence Bayesian Classifiers. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, pp. 335–338 (1996)

    Google Scholar 

  12. Rajashekar, T.B., Croft, W.B.: Combining Automatic and Manual Index Representations in Probabilistic Retrieval. Journal of the American society for information science 6(4), 272–283 (1995)

    Article  Google Scholar 

  13. Yang, Y., Ault, T., Pierce, T.: Combining Multiple Learning Strategies for Effective Cross Validation. In: Proceedings of ICML 2000, pp. 1167–1174 (2000)

    Google Scholar 

  14. Hull, D.A., Pedersen, J.O., Schutze, H.: Method Combination for Document Filtering. In: Proceedings of SIGIR-1996, pp. 279–287 (1996)

    Google Scholar 

  15. Larkey, L.S., Croft, W.B.: Combining Classifiers in Text Categorization. In: Proceedings of SIGIR-1996, pp. 289–297 (1996)

    Google Scholar 

  16. Li, Y.H., Jain, A.K.: Classification of Text Documents. The Computer Journal 41(8), 537–546 (1998)

    Article  MATH  Google Scholar 

  17. Lam, W., Lai, K.Y.: A Meta-learning Approach for Text Categorization. In: Proceedings of SIGIR-2001, pp. 303–309 (2001)

    Google Scholar 

  18. Bennett, P.N., Dumais, S.T., Horvitz, E.: Probabilistic Combination of Text Classifiers Using Reliability Indicators: Models and Results. In: Proceedings of SIGIR-2002, pp. 11–15 (2002)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fan, X., Sun, M., Choi, Ks., Zhang, Q. (2005). Classifying Chinese Texts in Two Steps. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science(), vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_27

Download citation

  • DOI: https://doi.org/10.1007/11562214_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29172-5

  • Online ISBN: 978-3-540-31724-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics