
Self-Organising Maps in Document Classification: A Comparison with Six Machine Learning Methods

  • Conference paper
Adaptive and Natural Computing Algorithms (ICANNGA 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6593)

Abstract

This paper examines the use of self-organising maps, also known as Kohonen maps, for classifying text documents. The aim is to classify documents effectively and automatically into separate classes based on their topics. Classification with the self-organising map was tested on three data sets, and the results were compared with those of six well-known baseline methods: k-means clustering, Ward’s clustering, k-nearest-neighbour searching, discriminant analysis, the Naïve Bayes classifier and the classification tree. The self-organising map yielded the highest accuracies among the tested unsupervised methods on the Reuters news collection and the Spanish CLEF 2003 news collection, and accuracies comparable to those of some of the supervised methods on all three data sets.
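
The abstract does not spell out how a trained map is turned into a classifier, so the following is only a minimal sketch of one common approach, not necessarily the procedure used in the paper: train a small map on document vectors, label each node by the majority topic of the training documents that map to it, and classify a new document by its nearest labelled node. The toy term-count vectors, the topic names, the map size and all function names are assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map with a Gaussian neighbourhood (online updates)."""
    rng = random.Random(seed)
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    # Assumption: initialise each node from a randomly chosen training vector.
    weights = [list(rng.choice(data)) for _ in coords]
    sigma0 = max(rows, cols) / 2.0
    order = list(range(len(data)))
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            br, bc = coords[best_matching_unit(weights, x)]
            for i, (r, c) in enumerate(coords):
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Move node i towards x, scaled by its grid distance to the BMU.
                weights[i] = [w + lr * h * (v - w) for w, v in zip(weights[i], x)]
    return weights


def label_nodes(weights, data, labels):
    """Give each node the majority label of the training documents it wins."""
    votes = defaultdict(Counter)
    for x, y in zip(data, labels):
        votes[best_matching_unit(weights, x)][y] += 1
    return {node: counts.most_common(1)[0][0] for node, counts in votes.items()}


def classify(weights, node_labels, x):
    """Label of the nearest node that received a label during training."""
    nearest = min(sorted(node_labels), key=lambda i: sq_dist(weights[i], x))
    return node_labels[nearest]


# Toy term counts over a hypothetical vocabulary: wheat, corn, bank, rate.
DOCS = [
    [5, 4, 0, 0], [4, 5, 0, 0], [5, 5, 1, 0], [4, 4, 0, 1],
    [0, 0, 5, 4], [0, 0, 4, 5], [1, 0, 5, 5], [0, 1, 4, 4],
]
TOPICS = ["grain"] * 4 + ["money"] * 4

if __name__ == "__main__":
    som = train_som(DOCS, rows=2, cols=2)
    node_topics = label_nodes(som, DOCS, TOPICS)
    for doc in ([5, 5, 0, 0], [0, 0, 5, 5]):
        print(doc, "->", classify(som, node_topics, doc))
```

Training itself never looks at the topic labels; they are used only afterwards to name the nodes, which is why a map of this kind can be grouped with unsupervised methods such as k-means while still being scored on classification accuracy.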

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Saarikoski, J., Laurikkala, J., Järvelin, K., Juhola, M. (2011). Self-Organising Maps in Document Classification: A Comparison with Six Machine Learning Methods. In: Dobnikar, A., Lotrič, U., Šter, B. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2011. Lecture Notes in Computer Science, vol 6593. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20282-7_27

  • DOI: https://doi.org/10.1007/978-3-642-20282-7_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20281-0

  • Online ISBN: 978-3-642-20282-7

  • eBook Packages: Computer Science, Computer Science (R0)
