Abstract
This paper proposes a variant of the self-organizing map (SOM) algorithm for document organization and retrieval. The available documents are encoded as bigrams, and signed ranks are assigned to these bigrams according to their frequencies. A novel metric based on the Wilcoxon signed-rank test uses these ranks to assess the contextual similarity between documents. This metric replaces the Euclidean distance that the standard SOM algorithm uses to identify the winner neuron. Experiments with both algorithms show that the proposed variant outperforms the standard SOM algorithm in terms of average recall-precision curves.
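The abstract does not give the exact form of the metric, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that documents and neurons are represented as bigram-frequency vectors over a shared vocabulary, and it uses the absolute value of the standard Wilcoxon signed-rank statistic as a stand-in dissimilarity. The function names (`bigram_counts`, `average_ranks`, `signed_rank_dissimilarity`, `find_winner`) and the toy vectors are hypothetical and chosen for illustration.

```python
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs (bigrams) in a lowercased, whitespace-split text."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_dissimilarity(x, y):
    """Absolute Wilcoxon signed-rank statistic for two paired frequency vectors.

    Assumption: this stands in for the paper's metric, whose exact form the
    abstract does not state. Zero differences are dropped, the remaining
    absolute differences are ranked, and the ranks are summed with the sign
    of their difference. Identical vectors give 0.0.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    if not diffs:
        return 0.0
    ranks = average_ranks([abs(d) for d in diffs])
    signed_sum = sum(r if d > 0 else -r for d, r in zip(diffs, ranks))
    return abs(signed_sum)


def find_winner(doc_vector, neurons):
    """Index of the neuron with the smallest dissimilarity to the document.

    This takes the place of the Euclidean nearest-neighbour search in the
    standard SOM winner selection.
    """
    return min(
        range(len(neurons)),
        key=lambda i: signed_rank_dissimilarity(doc_vector, neurons[i]),
    )


if __name__ == "__main__":
    # Toy bigram-frequency vectors over a hypothetical four-bigram vocabulary.
    document = [3, 1, 0, 2]
    neurons = [[0, 0, 0, 0], [3, 1, 1, 2], [5, 4, 2, 2]]
    print("winner neuron index:", find_winner(document, neurons))
```

Note that a signed-rank statistic measures a systematic shift between paired values rather than a distance in the usual sense: differences of opposite sign cancel, so this stand-in can score two quite different vectors as similar. The metric proposed in the paper may treat the ranks differently.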
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Georgakis, A., Kotropoulos, C., Pitas, I. (2002). A SOM Variant Based on the Wilcoxon Test for Document Organization and Retrieval. In: Dorronsoro, J.R. (eds) Artificial Neural Networks — ICANN 2002. ICANN 2002. Lecture Notes in Computer Science, vol 2415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46084-5_161
DOI: https://doi.org/10.1007/3-540-46084-5_161
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44074-1
Online ISBN: 978-3-540-46084-8
eBook Packages: Springer Book Archive