
A Linear Text Classification Algorithm Based on Category Relevance Factors

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2555)

Abstract

In this paper, we present a linear text classification algorithm called CRF. Using category relevance factors, CRF computes the feature vectors of the training documents that belong to the same category. From these feature vectors, CRF induces a profile vector for each category. For new, unlabelled documents, CRF uses a modified cosine measure to compute the similarity between each document and each category, and assigns the document to the category with the highest similarity score. In CRF, only the category profile vectors, rather than the vectors of all training documents, are used to compute the similarities between documents and categories. We evaluated our algorithm on a subset of the Reuters-21578 and 20_newsgroups text collections and compared it against k-NN and SVM. Experimental results show that CRF outperforms k-NN and is competitive with SVM.
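The abstract does not give the formulas for the category relevance factors or for the modified cosine measure, so the following is only a minimal sketch of the general profile-vector scheme it describes, under loudly stated assumptions: documents are represented by raw term frequencies over lowercase whitespace tokens, each category profile is the normalized sum of that category's normalized training vectors (a Rocchio-style centroid, swapped in for CRF's relevance-factor weighting), and similarity is the ordinary cosine rather than the paper's modified measure. The function names `tf_vector`, `normalize`, `cosine`, `build_profiles`, and `classify` and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Assumed representation: raw term frequencies of lowercase whitespace tokens."""
    return Counter(text.lower().split())


def normalize(vec):
    """Scale a sparse vector (term -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors, i.e. their dot product."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def build_profiles(train):
    """Build one unit-length profile vector per category from (text, label) pairs.

    Assumption: a plain centroid of normalized TF vectors stands in for
    CRF's category-relevance-factor weighting, which the abstract does not define.
    """
    sums = defaultdict(Counter)
    for text, label in train:
        sums[label].update(normalize(tf_vector(text)))
    return {label: normalize(vec) for label, vec in sums.items()}


def classify(text, profiles):
    """Assign the document to the category whose profile is most similar."""
    doc = normalize(tf_vector(text))
    return max(profiles, key=lambda label: cosine(doc, profiles[label]))


# Toy usage example (hypothetical data).
train = [
    ("wheat corn grain harvest", "grain"),
    ("corn exports grain prices", "grain"),
    ("oil crude barrel opec", "crude"),
    ("crude oil prices rise", "crude"),
]
profiles = build_profiles(train)
print(classify("grain harvest corn", profiles))  # grain
```

Because a new document is compared only against one profile per category, the per-document classification cost in this sketch grows with the number of categories rather than with the number of training documents, which is the contrast the abstract draws with k-NN.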

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Deng, ZH., Tang, SW., Yang, DQ., Zhang, M., Wu, XB., Yang, M. (2002). A Linear Text Classification Algorithm Based on Category Relevance Factors. In: Lim, E.P., et al. Digital Libraries: People, Knowledge, and Technology. ICADL 2002. Lecture Notes in Computer Science, vol 2555. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36227-4_9

  • DOI: https://doi.org/10.1007/3-540-36227-4_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00261-1

  • Online ISBN: 978-3-540-36227-2

  • eBook Packages: Springer Book Archive