Text Retrieval Using Self-Organized Document Maps

Neural Processing Letters

Abstract

A map of text documents arranged using the Self-Organizing Map (SOM) algorithm (1) is organized in a meaningful manner, so that items with similar content appear at nearby locations on the 2-dimensional map display, and (2) clusters the data, resulting in an approximate model of the data distribution in the high-dimensional document space. This article describes how a document map that is automatically organized for browsing and visualization can also be used to speed up document retrieval. Furthermore, experiments on the well-known CISI collection [3] show significantly improved performance compared to Salton's vector space model, measured by average precision (AP) when retrieving a small, fixed number of best documents. The comparison with Latent Semantic Indexing is inconclusive.
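
The abstract does not spell out the retrieval procedure, so the following is only a minimal sketch of one plausible reading: the query is first compared with the model vector of each map unit, and only the documents attached to the best-matching units are then ranked. The function names, the toy vectors, the use of cosine similarity, and the normalization of average precision by min(number of relevant documents, cutoff) are all assumptions made for illustration, not details taken from the article.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_via_map(query, unit_models, unit_docs, doc_vectors, n_units, n_docs):
    """Hypothetical two-stage search: pick the best map units, then rank their documents."""
    # Stage 1: compare the query with one model vector per map unit.
    best_units = sorted(
        range(len(unit_models)),
        key=lambda u: cosine(query, unit_models[u]),
        reverse=True,
    )[:n_units]
    # Stage 2: rank only the documents that belong to the selected units.
    candidates = [d for u in best_units for d in unit_docs[u]]
    candidates.sort(key=lambda d: cosine(query, doc_vectors[d]), reverse=True)
    return candidates[:n_docs]


def average_precision(ranked, relevant, cutoff):
    """Average precision over the top `cutoff` results (one common variant, assumed here)."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked[:cutoff], start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    denom = min(len(relevant), cutoff)
    return total / denom if denom else 0.0


# Toy data, invented for illustration: five documents on a three-unit map.
doc_vectors = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.9, 0.1, 0.0],
    "d3": [0.0, 1.0, 0.0],
    "d4": [0.0, 0.8, 0.2],
    "d5": [0.0, 0.0, 1.0],
}
unit_models = [[1.0, 0.05, 0.0], [0.0, 1.0, 0.1], [0.0, 0.0, 1.0]]
unit_docs = [["d1", "d2"], ["d3", "d4"], ["d5"]]

query = [1.0, 0.2, 0.0]
top_docs = retrieve_via_map(query, unit_models, unit_docs, doc_vectors, n_units=2, n_docs=3)
ap = average_precision(top_docs, relevant={"d1", "d3"}, cutoff=3)
print(top_docs, ap)
```

In this sketch the saving comes from stage 1: the query is compared with one vector per map unit rather than with every document, at the risk of missing relevant documents that lie in units not selected.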

References

  1. Baeza-Yates, R. and Ribeiro-Neto, B. (eds): Modern Information Retrieval. Addison Wesley Longman, 1999.

  2. Chen, H., Houston, A. L., Sewell, R. R. and Schatz, B. R.: Internet browsing and searching: User evaluations of category map and concept space techniques. Journal of the American Society for Information Science (JASIS), 49(7) (1998), 582–603.

  3. CISI-collection. The CISI reference collection for information retrieval. 1460 documents and 76 queries. <http://local.dcs.gla.ac.uk/idom/ir_resources/test_collections/cisi/>, 1981.

  4. Deerwester, S., Dumais, S. T., Furnas, G. W. and Landauer, T. K.: Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41 (1990), 391–407.

  5. Hearst, M. A.: Modern Information Retrieval, chapter 10. User Interfaces and Visualization, pp. 257–324. Addison Wesley Longman, 1999.

  6. Honkela, T., Kaski, S., Lagus, K. and Kohonen, T.: Newsgroup exploration with WEBSOM method and browsing interface. Technical Report A32, Helsinki University of Technology, Laboratory of Computer and Information Science, Espoo, Finland, 1996.

  7. Kaski, S., Honkela, T., Lagus, K. and Kohonen, T.: WEBSOM – self-organizing maps of document collections. Neurocomputing, 21 (1998), 101–117.

  8. Kaski, S., Kangas, J. and Kohonen, T.: Bibliography of self-organizing map (SOM) papers: 1981–1997. Neural Computing Surveys, 1(3/4) (1998), 1–176. Available in electronic form at http://www.icsi.berkeley.edu/~jagota/NCS/: Vol 1, pp. 102–350.

  9. Kohonen, T.: Self-organizing formation of topologically correct feature maps. Biological Cybernetics, 43(1) (1982), 59–69.

  10. Kohonen, T.: Self-Organizing Maps, volume 30 of Springer Series in Information Sciences. Springer, Berlin, 1995. Second, extended edition, 1997.

  11. Kohonen, T., Kaski, S., Lagus, K., Salojärvi, J., Paatero, V. and Saarela, A.: Organization of a massive document collection. IEEE Transactions on Neural Networks, Special Issue on Neural Networks for Data Mining and Knowledge Discovery, 11(3) (2000), 574–585.

  12. Koskenniemi, K.: Two-Level Morphology: A General Computational Model for Word-Form Recognition and Production. PhD thesis, University of Helsinki, Department of General Linguistics, 1983.

  13. Lagus, K., Honkela, T., Kaski, S. and Kohonen, T.: WEBSOM for textual data mining. Artificial Intelligence Review, 13(5/6) (1999), 345–364.

  14. Salton, G. and Buckley, C.: Term weighting approaches in automatic text retrieval. Technical Report 87–881, Cornell University, Department of Computer Science, Ithaca, NY, 1987.

  15. Salton, G. and McGill, M. J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983.

  16. Salton, G., Wong, A. and Yang, C. S.: A vector space model for automatic indexing. Communications of the ACM, 18(11) (1975), 613–620.

  17. Voorhees, E. M. and Harman, D. K.: Appendix: Evaluation techniques and measures. In: Proceedings of The Eighth Text REtrieval Conference (TREC 8). NIST, 2000. http://trec.nist.gov/pubs/trec8/t8_proceedings.html.

Cite this article

Lagus, K. Text Retrieval Using Self-Organized Document Maps. Neural Processing Letters 15, 21–29 (2002). https://doi.org/10.1023/A:1013853012954

  • DOI: https://doi.org/10.1023/A:1013853012954
