
LSISOM — A Latent Semantic Indexing Approach to Self-Organizing Maps of Document Collections

Neural Processing Letters

Abstract

The Self-Organizing Map (SOM) algorithm has been used with much success in a variety of applications for the automatic organization of full-text document collections. A great advantage of the SOM method is that a document collection can be ordered so that documents with similar content are positioned at nearby locations of the two-dimensional SOM lattice. The resulting ordered map thus presents a general view of the document collection, which helps the exploration of the information contained in the whole document space. The most notable example of such an application is the WEBSOM method, in which the document collection is ordered onto a map by using word category histograms to represent the document data vectors. In this paper, we introduce the LSISOM method, which resembles WEBSOM in the sense that the document maps are generated from word category histograms rather than from simple histograms of the words. However, a major difference between the two methods is that in WEBSOM the word category histograms are formed using statistical information about short word contexts, whereas in LSISOM these histograms are obtained from the SOM clustering of the Latent Semantic Indexing (LSI) representation of the document terms.
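The LSISOM pipeline outlined in the abstract can be illustrated with a minimal Python sketch that uses only NumPy. It builds a term-document count matrix, computes a rank-k LSI representation of the terms with an SVD, clusters the term vectors with a small SOM so that each map unit acts as a word category, forms a word category histogram for every document, and finally trains a second SOM on those histograms to obtain the document map. The abstract does not specify term weighting, the number of retained LSI dimensions, the scaling of the term vectors, or the SOM parameters. The raw counts, the U_k S_k scaling, the tiny 2×2 maps, the toy corpus, and the helper names (`lsi_term_vectors`, `train_som`, `best_matching_units`, `word_category_histograms`) are therefore hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of an LSISOM-style pipeline; names and parameters are illustrative.

def lsi_term_vectors(A, k):
    """Rank-k LSI coordinates of the terms (rows of A).
    Assumption: each term is represented by its row of U_k * S_k."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k]


def train_som(X, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Minimal online SOM with a Gaussian neighborhood on a rows x cols grid."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.integers(0, n, rows * cols)].astype(float)  # initialize units from random samples
    sigma0 = max(rows, cols) / 2.0
    total, t = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = t / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood width
            bmu = np.argmin(((W - X[i]) ** 2).sum(axis=1))
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
            W += lr * h[:, None] * (X[i] - W)        # pull the BMU and its neighbors toward X[i]
            t += 1
    return W


def best_matching_units(X, W):
    """Index of the closest map unit (Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def word_category_histograms(A, term_units, n_units):
    """For each document (column of A), sum the term counts per word category (map unit)."""
    H = np.zeros((A.shape[1], n_units))
    for term, unit in enumerate(term_units):
        H[:, unit] += A[term, :]
    return H


# Toy corpus (hypothetical data for illustration only).
docs = [
    "neural network learning map",
    "self organizing map neural",
    "matrix singular value decomposition",
    "latent semantic indexing matrix decomposition",
]
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

term_vecs = lsi_term_vectors(A, k=2)                  # LSI representation of the terms
W_terms = train_som(term_vecs, rows=2, cols=2)        # word category map
categories = best_matching_units(term_vecs, W_terms)  # word category of each term
H = word_category_histograms(A, categories, n_units=4)
W_docs = train_som(H, rows=2, cols=2)                 # document map
doc_units = best_matching_units(H, W_docs)            # map position of each document
```

Because every term is assigned to exactly one word category, each document's histogram sums to its total term count. The histogram therefore has one dimension per category rather than one per vocabulary word. In a realistic setting the word category map would be far smaller than the vocabulary, which is what makes the document vectors compact.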




About this article

Cite this article

Ampazis, N., Perantonis, S.J. LSISOM — A Latent Semantic Indexing Approach to Self-Organizing Maps of Document Collections. Neural Processing Letters 19, 157–173 (2004). https://doi.org/10.1023/B:NEPL.0000023449.95030.8f


  • DOI: https://doi.org/10.1023/B:NEPL.0000023449.95030.8f
