Abstract
We present an approach for comparing human-made and automatically generated semantic representations under the assumption that neither has primary status over the other. In the experimental part, we compare the results obtained by applying independent component analysis and the self-organizing map algorithm to word context analysis against a semantically labeled dictionary called BLESS. Data-driven methods are useful for assessing the quality of hand-created semantic resources, and these resources can in turn be used to evaluate the outcome of the automated process. We present a number of specific findings that go beyond typical quantitative evaluations of data-driven methods, in which manually created resources are usually taken as a gold standard.
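The two-way comparison described above can be illustrated with a minimal sketch. It is not the method of the paper: instead of independent component analysis or the self-organizing map it uses plain cosine similarity on raw context counts, and the words, counts, and class labels are all hypothetical toy values rather than BLESS data. The sketch checks whether each word's nearest neighbor in context space carries the same hand-assigned label, and lists the disagreeing pairs for inspection, since a disagreement may point to a weakness in either the data-driven representation or the manual labeling.

```python
import math

# Hypothetical toy context counts: word -> counts over four context features.
# A real experiment would derive these from a large corpus.
context_counts = {
    "dog":   [8, 6, 0, 1],
    "cat":   [7, 7, 1, 0],
    "horse": [6, 4, 1, 1],
    "car":   [0, 1, 9, 6],
    "truck": [1, 0, 8, 7],
    "bus":   [0, 1, 7, 8],
}

# Hypothetical hand-assigned class labels (a stand-in for a labeled resource).
manual_labels = {
    "dog": "animal", "cat": "animal", "horse": "animal",
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle",
}


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbor(word, vectors):
    """Return the other word whose context vector is most similar to `word`."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))


def label_agreement(vectors, labels):
    """Share of words whose nearest neighbor has the same manual label,
    plus the (word, neighbor) pairs where the two sources disagree."""
    disagreements = []
    for word in vectors:
        neighbor = nearest_neighbor(word, vectors)
        if labels[word] != labels[neighbor]:
            disagreements.append((word, neighbor))
    agreement = 1 - len(disagreements) / len(vectors)
    return agreement, disagreements


agreement, disagreements = label_agreement(context_counts, manual_labels)
print("agreement:", agreement)
print("pairs to inspect:", disagreements)
```

Treating the disagreeing pairs as items to inspect, rather than as errors of the automatic method, reflects the assumption that neither source is the gold standard.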
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lindh-Knuutila, T., Honkela, T. (2013). Exploratory Text Analysis: Data-Driven versus Human Semantic Similarity Judgments. In: Tomassini, M., Antonioni, A., Daolio, F., Buesser, P. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2013. Lecture Notes in Computer Science, vol 7824. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37213-1_44
DOI: https://doi.org/10.1007/978-3-642-37213-1_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37212-4
Online ISBN: 978-3-642-37213-1
eBook Packages: Computer Science, Computer Science (R0)