Abstract
The paper presents a clustering technique based on dynamic self-organizing neural networks and its application to a large-scale, highly multidimensional WWW-newsgroup-document clustering problem. The subject of clustering is a collection of 19 997 documents (e-mail messages from different Usenet-News newsgroups) available on the WWW server of the School of Computer Science, Carnegie Mellon University (www.cs.cmu.edu/ TextLearning/datasets.html). A broad comparative analysis with nine alternative clustering techniques has also been carried out, demonstrating the superiority of the proposed approach in the considered problem.
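To make the setting concrete, the following is a minimal sketch of clustering documents with a plain one-dimensional Kohonen self-organizing map over bag-of-words vectors. It is not the dynamic network of the paper: the neuron chain here has a fixed size and never grows, splits, or reconnects. The function names (`bag_of_words`, `train_som`, `nearest`), the toy documents, and all parameter values are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def bag_of_words(docs):
    """Turn raw texts into L2-normalized term-frequency vectors."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w, c in Counter(d.lower().split()).items():
            v[index[w]] = float(c)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def nearest(weights, x):
    """Index of the neuron whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(weights[j], x)),
    )


def train_som(vectors, n_neurons=2, epochs=50, seed=0):
    """Train a fixed-size 1-D neuron chain (plain SOM, assumed settings)."""
    rng = random.Random(seed)
    # Initialize neurons from randomly chosen input vectors.
    weights = [list(v) for v in rng.sample(vectors, n_neurons)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)  # linearly decaying learning rate
        radius = max(1e-3, (n_neurons / 2.0) * (1.0 - frac))
        for x in rng.sample(vectors, len(vectors)):  # shuffled pass
            win = nearest(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighborhood along the chain around the winner.
                h = math.exp(-((j - win) ** 2) / (2.0 * radius ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Hypothetical toy messages standing in for newsgroup documents.
docs = [
    "hockey team wins the game",
    "the hockey game was close",
    "graphics card renders the image",
    "image rendering on a graphics card",
]
vectors = bag_of_words(docs)
weights = train_som(vectors)
labels = [nearest(weights, v) for v in vectors]
print(labels)
```

Each document is assigned to the neuron nearest to its vector, so the neuron index acts as a cluster label. On a collection the size of the one described above, a sparse vector representation and some form of dimensionality reduction would likely be needed before a loop like this becomes practical.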
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gorzałczany, M.B., Rudziński, F. (2008). WWW-Newsgroup-Document Clustering by Means of Dynamic Self-organizing Neural Networks. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing – ICAISC 2008. Lecture Notes in Computer Science, vol 5097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69731-2_5
DOI: https://doi.org/10.1007/978-3-540-69731-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69572-1
Online ISBN: 978-3-540-69731-2
eBook Packages: Computer Science, Computer Science (R0)