Abstract
In this paper we propose an effective method to cluster documents into a dynamically built taxonomy of topics, extracted directly from the documents. We take into account short contextual information within the text corpus, which is weighted by importance and used as input to a set of independently spun growing Self-Organising Maps (SOMs). Using these indexing units, this work shows an increase in precision and labelling quality in the hierarchy of topics. Using a tree structure rather than sets of conventional two-dimensional maps creates topic hierarchies that are easy to browse and understand, in which the documents are stored according to their content similarity.
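As a rough illustration of the pipeline the abstract describes, the sketch below weights short contextual units and feeds them to a small SOM. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: adjacent word pairs stand in for the "short contextual information", a smoothed TF-IDF score stands in for the importance weighting, and a fixed-size one-dimensional chain of nodes stands in for a single map (growth and the spawning of child maps are omitted). The function names `word_pair_vectors`, `best_matching_unit` and `train_chain_som` are hypothetical.

```python
import math
import random
from collections import Counter


def word_pair_vectors(docs):
    """Turn each document into a unit-length vector of weighted adjacent word pairs."""
    counts = []
    for doc in docs:
        words = doc.lower().split()
        counts.append(Counter(zip(words, words[1:])))
    # Number of documents in which each word pair occurs.
    doc_freq = Counter(pair for c in counts for pair in c)
    vocab = sorted(doc_freq)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        # Assumed weighting: pair frequency times a smoothed inverse document frequency.
        vec = [c[p] * (math.log((1 + n_docs) / (1 + doc_freq[p])) + 1.0) for p in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def train_chain_som(vectors, n_nodes=3, epochs=50, seed=0):
    """Train a fixed-size 1-D SOM; learning rate and neighbourhood radius decay linearly."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)
        radius = max(1.0 - frac, 0.01)
        # Present the documents in a fresh random order each epoch.
        for x in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood along the chain, centred on the winning node.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


docs = [
    "self organising maps cluster documents",
    "self organising maps cluster documents",
    "neural gas performs vector quantization",
    "latent semantic analysis indexes documents",
]
vocab, vectors = word_pair_vectors(docs)
weights = train_chain_som(vectors, n_nodes=3)
assignment = [best_matching_unit(weights, v) for v in vectors]
print(assignment)
```

Each entry of `assignment` is the chain node a document falls on. In a hierarchical variant, the documents on a node could be used to train a further map of their own, which is one way a browsable topic tree might be built up.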
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Freeman, R., Yin, H. (2002). Self-Organising Maps for Hierarchical Tree View Document Clustering Using Contextual Information. In: Yin, H., Allinson, N., Freeman, R., Keane, J., Hubbard, S. (eds) Intelligent Data Engineering and Automated Learning — IDEAL 2002. IDEAL 2002. Lecture Notes in Computer Science, vol 2412. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45675-9_21
DOI: https://doi.org/10.1007/3-540-45675-9_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44025-3
Online ISBN: 978-3-540-45675-9
eBook Packages: Springer Book Archive