Abstract
This research proposes an alternative to machine learning based approaches for categorizing online news articles in Reuter21578. To apply machine learning based approaches to any task of text mining or information retrieval, documents must be encoded into numerical vectors, and this encoding causes two problems: huge dimensionality and sparse distribution. Although text mining covers various tasks, such as text categorization, text clustering, and text summarization, the scope of this research is restricted to text categorization. The idea is to avoid the two problems by encoding a document or documents into a table instead of a numerical vector. The goal of this research is therefore to improve the performance of text categorization by avoiding these two problems.
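The contrast between the two encodings can be illustrated with a minimal sketch. The abstract does not specify the table layout, so the word-to-frequency table below is an assumption made for illustration only, as are the toy vocabulary, the sample sentence, and the function names `encode_as_vector` and `encode_as_table`.

```python
from collections import Counter

def encode_as_vector(tokens, vocabulary):
    """Encode a document as a fixed-length count vector over the whole vocabulary.

    The vector has one slot per vocabulary word, so most slots are zero
    when the vocabulary is large (huge dimensionality, sparse distribution).
    """
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

def encode_as_table(tokens):
    """Encode a document as a table of (word -> frequency) entries.

    Hypothetical layout: only words that occur in the document are stored,
    so the table grows with the document rather than with the vocabulary.
    """
    return dict(Counter(tokens))

# Toy vocabulary and document, both invented for this illustration.
vocabulary = ["bank", "crude", "grain", "oil", "price", "rate", "trade", "wheat"]
doc = "oil price rises as crude oil trade slows".split()

vector = encode_as_vector(doc, vocabulary)
table = encode_as_table(doc)

print(vector)  # one entry per vocabulary word, including zeros
print(table)   # one entry per distinct word in the document
```

In a real collection the vocabulary holds many thousands of words, so the vector form is dominated by zeros, while the table form keeps only the entries that actually occur.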
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jo, T. (2009). Profile Based Algorithm to Topic Spotting in Reuter21578. In: Huang, DS., Jo, KH., Lee, HH., Kang, HJ., Bevilacqua, V. (eds) Emerging Intelligent Computing Technology and Applications. With Aspects of Artificial Intelligence. ICIC 2009. Lecture Notes in Computer Science, vol 5755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04020-7_27
DOI: https://doi.org/10.1007/978-3-642-04020-7_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04019-1
Online ISBN: 978-3-642-04020-7
eBook Packages: Computer Science, Computer Science (R0)