Abstract
This contribution discusses the use of self-organizing maps to arrange documents according to a similarity measure. To this end, we briefly review the concepts of self-organizing systems, give an overview of methods for the required document pre-processing and encoding, and discuss applications of self-organizing maps in document retrieval. We then present a prototypical implementation of a software tool for interactive search in document databases, which combines conventional keyword search with the ability to explore a document collection interactively. Sample searches illustrate the usability of the approach.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Nürnberger, A., Klose, A., Kruse, R. (2003). Self-Organizing Maps for Interactive Search in Document Databases. In: Szczepaniak, P.S., Segovia, J., Kacprzyk, J., Zadeh, L.A. (eds) Intelligent Exploration of the Web. Studies in Fuzziness and Soft Computing, vol 111. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1772-0_8
DOI: https://doi.org/10.1007/978-3-7908-1772-0_8
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-7908-2519-0
Online ISBN: 978-3-7908-1772-0
eBook Packages: Springer Book Archive