Abstract
This research addresses the use of self-organising maps for the classification of text documents. The aim was to classify documents into separate classes according to their topics. We therefore constructed self-organising maps that were effective for this task and tested them with German newspaper documents. We compared the results with those of k-nearest-neighbour searching and k-means clustering. For five and ten classes, the self-organising maps performed better, yielding average classification accuracies as high as 88-89%, whereas nearest-neighbour searching gave 74-83% and k-means clustering 72-79% as their highest accuracies.
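The abstract does not give the map size, the document features, or the way map nodes were assigned to classes, so the following is only a minimal sketch of how a self-organising map can be used as a classifier. It assumes documents are already represented as term-frequency vectors, that each node is labelled by a majority vote of the training documents mapped to it, and that the learning rate and neighbourhood radius decay linearly. All function names, parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter, defaultdict


def best_matching_unit(weights, x):
    """Return the grid position of the node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0  # assumed initial neighbourhood radius
    # Assumption: initialise each node with a copy of a random training vector.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # linear learning-rate decay
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # radius decay with a floor
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood
                for i, xi in enumerate(x):
                    w[i] += lr * h * (xi - w[i])
            step += 1
    return weights


def label_nodes(weights, data, labels):
    """Give each node the majority class of the training documents mapped to it."""
    votes = defaultdict(Counter)
    for x, y in zip(data, labels):
        votes[best_matching_unit(weights, x)][y] += 1
    return {node: counts.most_common(1)[0][0] for node, counts in votes.items()}


def classify(weights, node_labels, x):
    """Predict the class of x from its closest labelled node."""
    labelled = {n: w for n, w in weights.items() if n in node_labels}
    return node_labels[best_matching_unit(labelled, x)]


# Toy term-frequency vectors for two invented topics.
docs = [
    [3, 2, 0, 0], [2, 3, 0, 1], [4, 1, 0, 0],  # "sport"
    [0, 0, 3, 2], [0, 1, 2, 3], [1, 0, 4, 2],  # "politics"
]
topics = ["sport"] * 3 + ["politics"] * 3

weights = train_som(docs, rows=2, cols=2)
node_labels = label_nodes(weights, docs, topics)
prediction = classify(weights, node_labels, [3, 3, 0, 0])
print(prediction)
```

Nodes that attract no training documents stay unlabelled in this sketch, so `classify` falls back to the closest labelled node; other conventions are possible and the paper may have used a different one.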
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Saarikoski, J., Järvelin, K., Laurikkala, J., Juhola, M. (2009). On Document Classification with Self-Organising Maps. In: Kolehmainen, M., Toivanen, P., Beliczynski, B. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2009. Lecture Notes in Computer Science, vol 5495. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04921-7_15
DOI: https://doi.org/10.1007/978-3-642-04921-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04920-0
Online ISBN: 978-3-642-04921-7
eBook Packages: Computer Science, Computer Science (R0)