Abstract
A map of text documents arranged using the Self-Organizing Map (SOM) algorithm (1) is organized in a meaningful manner, so that items with similar content appear at nearby locations on the 2-dimensional map display, and (2) clusters the data, resulting in an approximate model of the data distribution in the high-dimensional document space. This article describes how a document map that is automatically organized for browsing and visualization can also be used to speed up document retrieval. Furthermore, experiments on the well-known CISI collection [3] show significantly improved performance compared to Salton's vector space model, measured by average precision (AP) when retrieving a small, fixed number of best documents. In comparison with Latent Semantic Indexing, the results are inconclusive.
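The retrieval idea summarized in the abstract can be illustrated with a minimal sketch. It is not the article's implementation. It assumes that the map units' model vectors are already trained, that documents and queries are plain term-weight vectors compared by cosine similarity, and that AP follows the common non-interpolated definition. All function names, the parameters n_units and n_docs, and the toy data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def assign_to_units(docs, units):
    """Store each document index under its best-matching map unit."""
    members = {u: [] for u in range(len(units))}
    for d, vec in enumerate(docs):
        best = max(range(len(units)), key=lambda u: cosine(vec, units[u]))
        members[best].append(d)
    return members

def retrieve(query, docs, units, members, n_units=1, n_docs=3):
    """Rank only the documents stored in the n_units units closest to the query."""
    closest = sorted(range(len(units)),
                     key=lambda u: cosine(query, units[u]),
                     reverse=True)[:n_units]
    candidates = [d for u in closest for d in members[u]]
    candidates.sort(key=lambda d: cosine(query, docs[d]), reverse=True)
    return candidates[:n_docs]

def average_precision(ranked, relevant):
    """Sum of precision at each rank holding a relevant document,
    divided by the total number of relevant documents (assumed definition)."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Toy data (hypothetical): five documents over three terms, three map units.
docs = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.8, 0.2], [0, 0, 1]]
units = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # stand-ins for trained model vectors

members = assign_to_units(docs, units)
ranked = retrieve([1, 0.2, 0], docs, units, members)
print(ranked, average_precision(ranked, {0, 1}))
```

In this sketch the speed-up comes from comparing the query with a few map units first and then with only the documents stored in the best-matching units, instead of with every document in the collection. How many units to search is a trade-off between speed and recall.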
References
Baeza-Yates, R. and Ribeiro-Neto, B. (eds): Modern Information Retrieval. Addison Wesley Longman, 1999.
Chen, H., Houston, A. L., Sewell, R. R. and Schatz, B. R.: Internet browsing and searching: User evaluations of category map and concept space techniques. Journal of the American Society for Information Science (JASIS), 49(7) (1998), 582–603.
CISI-collection. The CISI reference collection for information retrieval. 1460 documents and 76 queries. <http://local.dcs.gla.ac.uk/idom/ir_resources/test_collections/cisi/>, 1981.
Deerwester, S., Dumais, S. T., Furnas, G. W. and Landauer, T. K.: Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41 (1990), 391–407.
Hearst, M. A.: Modern Information Retrieval, chapter 10. User Interfaces and Visualization, pp. 257–324. Addison Wesley Longman, 1999.
Honkela, T., Kaski, S., Lagus, K. and Kohonen, T.: Newsgroup exploration with WEBSOM method and browsing interface. Technical Report A32, Helsinki University of Technology, Laboratory of Computer and Information Science, Espoo, Finland, 1996.
Kaski, S., Honkela, T., Lagus, K. and Kohonen, T.: WEBSOM – self-organizing maps of document collections. Neurocomputing, 21 (1998), 101–117.
Kaski, S., Kangas, J. and Kohonen, T.: Bibliography of self-organizing map (SOM) papers: 1981–1997. Neural Computing Surveys, 1(3/4) (1998), 1–176. Available in electronic form at http://www.icsi.berkeley.edu/~jagota/NCS/: Vol 1, pp. 102–350.
Kohonen, T.: Self-organizing formation of topologically correct feature maps. Biological Cybernetics, 43(1) (1982), 59–69.
Kohonen, T.: Self-Organizing Maps, volume 30 of Springer Series in Information Sciences. Springer, Berlin, 1995. Second, extended edition, 1997.
Kohonen, T., Kaski, S., Lagus, K., Salojärvi, J., Paatero, V. and Saarela, A.: Organization of a massive document collection. IEEE Transactions on Neural Networks, Special Issue on Neural Networks for Data Mining and Knowledge Discovery, 11(3) (2000), 574–585.
Koskenniemi, K.: Two-Level Morphology: A General Computational Model for Word-Form Recognition and Production. PhD thesis, University of Helsinki, Department of General Linguistics, 1983.
Lagus, K., Honkela, T., Kaski, S. and Kohonen, T.: WEBSOM for textual data mining. Artificial Intelligence Review, 13(5/6) (1999), 345–364.
Salton, G. and Buckley, C.: Term weighting approaches in automatic text retrieval. Technical Report 87–881, Cornell University, Department of Computer Science, Ithaca, NY, 1987.
Salton, G. and McGill, M. J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983.
Salton, G., Wong, A. and Yang, C. S.: A vector space model for automatic indexing. Communications of the ACM, 18(11) (1975), 613–620.
Voorhees, E. M. and Harman, D. K.: Appendix: Evaluation techniques and measures. In: Proceedings of The Eighth Text REtrieval Conference (TREC 8). NIST, 2000. http://trec.nist.gov/pubs/trec8/t8_proceedings.html.
Cite this article
Lagus, K. Text Retrieval Using Self-Organized Document Maps. Neural Processing Letters 15, 21–29 (2002). https://doi.org/10.1023/A:1013853012954
DOI: https://doi.org/10.1023/A:1013853012954