
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5755)


Abstract

This research proposes an alternative to machine-learning-based approaches for categorizing online news articles in Reuter21578. To apply machine-learning-based approaches to any text mining or information retrieval task, documents must be encoded into numerical vectors, and this encoding causes two problems: huge dimensionality and sparse distribution. Although text mining covers various tasks, such as text categorization, text clustering, and text summarization, the scope of this research is restricted to text categorization. The idea is to avoid the two problems by encoding a document or documents into a table instead of numerical vectors. The goal of this research is therefore to improve the performance of text categorization by avoiding the two problems.
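The contrast between the two encodings can be illustrated with a minimal sketch. The abstract does not say how the table is built or how tables are compared, so everything here is an assumption made for illustration: the table is taken to be a mapping from words to frequencies, and `table_similarity` is a hypothetical overlap measure, not the paper's method. The two sample sentences are made up and are not drawn from the corpus.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real preprocessing."""
    return text.lower().split()


def to_vector(text, vocabulary):
    """Encode a document as a count vector over a fixed, corpus-wide vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def to_table(text):
    """Encode a document as a word -> frequency table (assumed table format)."""
    return dict(Counter(tokenize(text)))


def table_similarity(a, b):
    """Hypothetical similarity: weight of shared words over total weight."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = set(a) & set(b)
    return sum(a[w] + b[w] for w in shared) / total


# Made-up sample documents, for illustration only.
docs = [
    "oil prices rise as crude supply falls",
    "wheat and corn exports rise",
]
vocabulary = sorted({word for doc in docs for word in tokenize(doc)})

vector = to_vector(docs[1], vocabulary)  # one slot per vocabulary word
table = to_table(docs[1])                # one entry per word in the document

print(len(vector), vector.count(0))      # vector length and number of zero slots
print(len(table))                        # table size
print(table_similarity(to_table(docs[0]), table))
```

In this sketch the vector grows with the vocabulary of the whole collection and is mostly zeros, while the table holds only the words that occur in the document, which is the sense in which a table-style encoding sidesteps dimensionality and sparsity.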




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jo, T. (2009). Profile Based Algorithm to Topic Spotting in Reuter21578. In: Huang, DS., Jo, KH., Lee, HH., Kang, HJ., Bevilacqua, V. (eds) Emerging Intelligent Computing Technology and Applications. With Aspects of Artificial Intelligence. ICIC 2009. Lecture Notes in Computer Science, vol 5755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04020-7_27


  • DOI: https://doi.org/10.1007/978-3-642-04020-7_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04019-1

  • Online ISBN: 978-3-642-04020-7

  • eBook Packages: Computer Science, Computer Science (R0)
