Exploratory Text Analysis: Data-Driven versus Human Semantic Similarity Judgments

Lindh-Knuutila, Tiina; Honkela, Timo

doi:10.1007/978-3-642-37213-1_44


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7824)

Included in the following conference series:

International Conference on Adaptive and Natural Computing Algorithms


Abstract

We present an approach for comparing human-made and automatically generated semantic representations under the assumption that neither has primary status over the other. In the experimental part, we compare the results obtained by applying independent component analysis (ICA) and the self-organizing map (SOM) algorithm to word context analysis with a semantically labeled dictionary called BLESS. Data-driven methods are useful for assessing the quality of hand-created semantic resources and, conversely, these resources can be used to evaluate the outcome of the automated process. We present a number of specific findings that go beyond typical quantitative evaluations of data-driven methods, in which manually created resources are usually taken as the gold standard.
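The core idea of the abstract, letting word-context statistics and human category labels check each other, can be illustrated with a minimal sketch. It is not the authors' pipeline: instead of ICA or the self-organizing map it uses raw co-occurrence counts and cosine similarity, and the helpers (`context_vectors`, `cosine`, `nearest_neighbor`, `category_agreement`), the toy corpus and the category labels are all hypothetical stand-ins for a real corpus and the BLESS concept classes. The score is the fraction of labeled words whose nearest neighbor in context space carries the same human label.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Count how often each word co-occurs with words within +/- `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbor(word, vectors, candidates):
    """Return the candidate (other than `word`) whose context vector is most similar."""
    others = [w for w in candidates if w != word]
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))


def category_agreement(vectors, labels):
    """Fraction of labeled words whose nearest neighbor carries the same human label."""
    words = [w for w in labels if w in vectors]
    hits = sum(labels[nearest_neighbor(w, vectors, words)] == labels[w] for w in words)
    return hits / len(words)


# Hypothetical toy corpus and labels (stand-ins for a real corpus and BLESS classes).
sentences = ["feed the cat", "feed the dog", "eat the apple", "eat the pear"]
labels = {"cat": "animal", "dog": "animal", "apple": "fruit", "pear": "fruit"}
noisy = {"cat": "animal", "dog": "fruit", "apple": "fruit", "pear": "fruit"}

vectors = context_vectors(sentences)
print(category_agreement(vectors, labels))  # 1.0
print(category_agreement(vectors, noisy))   # 0.5
```

In the `noisy` labeling, "dog" is deliberately mislabeled, and the agreement drops because the cat/dog pair no longer matches. The score only flags the disagreement; deciding whether the labels or the context vectors are at fault still requires inspecting the flagged words, which is in the spirit of treating neither representation as primary.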


References

Baroni, M., Lenci, A.: How we BLESSed distributional semantic evaluation. In: Padó, S., Peirsman, Y. (eds.) Proc. of EMNLP 2011, GEometrical Models of Natural Language Semantics (GEMS 2011) Workshop, pp. 1–10. ACL, East Stroudsburg (2011)
Bullinaria, J., Levy, J.: Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods 39, 510–526 (2007)
Comon, P.: Independent component analysis, a new concept? Signal Processing 36, 287–314 (1994)
Goldstone, R.: The role of similarity in categorization: Providing a groundwork. Cognition 52, 125–157 (1994)
Honkela, T., Hyvärinen, A., Väyrynen, J.: WordICA: emergence of linguistic representations for words by independent component analysis. Natural Language Engineering 16, 277–308 (2010)
Honkela, T., Pulkki, V., Kohonen, T.: Contextual relations of words in Grimm tales, analyzed by self-organizing map. In: Fogelman-Soulié, F., Gallinari, P. (eds.) Proc. of ICANN 1995, pp. 3–7. EC2, Nanterre (1995)
Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. John Wiley & Sons (2001)
Hyvärinen, A., Oja, E.: A fast fixed-point algorithm for independent component analysis. Neural Computation 9(7), 1483–1492 (1997)
Kohonen, T.: Self-Organizing Maps. Springer, Heidelberg (2001)
Kohonen, T., Honkela, T.: Kohonen network. Scholarpedia 2(1), 1568 (2007)
Landauer, T., Dumais, S.: A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychological Review 104(2), 211–240 (1997)
Lindh-Knuutila, T., Väyrynen, J., Honkela, T.: Semantic analysis in word vector spaces with ICA and feature selection. In: Jancsary, J. (ed.) Proc. of The 11th Conference on Natural Language Processing (KONVENS), pp. 98–107. ÖGAI (2012)
Miller, G., Charles, W.: Contextual correlates of semantic similarity. Language and Cognitive Processes 6(1), 1–28 (1991)
Niwa, Y., Nitta, Y.: Co-occurrence vectors from corpora vs. distance vectors from dictionaries. In: Proc. of COLING 1994, pp. 304–309 (1994)
Ritter, H., Kohonen, T.: Self-organizing semantic maps. Biological Cybernetics 61, 241–254 (1989)
Sahlgren, M.: The word-space model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. Ph.D. thesis, Stockholm University, Department of Linguistics (2006)
Schütze, H.: Word space. In: Advances in Neural Information Processing Systems, vol. 5, pp. 895–902. Morgan Kaufmann (1993)
Schwering, A.: Approaches to semantic similarity measurement for geo-spatial data: A survey. Transactions in GIS 12(1), 5–29 (2008)
Turney, P., Pantel, P.: From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research 37, 141–188 (2010)
Väyrynen, J.J., Lindqvist, L., Honkela, T.: Sparse distributed representations for words with thresholded independent component analysis. In: Si, J., Sun, R. (eds.) Proc. of IJCNN 2007, pp. 1031–1036. IEEE (2007)
Venna, J., Kaski, S.: Local multidimensional scaling. Neural Networks 19(6), 889–899 (2006)


Author information

Authors and Affiliations

Department of Information and Computer Science, Aalto University School of Science, P.O. Box 15400, FI-00076, Aalto, Finland
Tiina Lindh-Knuutila & Timo Honkela


Editor information

Editors and Affiliations

Département des Systèmes d’Information, Quartier UNIL-Dorigny, Bâtiment Internef, Université de Lausanne, 105, Lausanne, Switzerland
Marco Tomassini, Alberto Antonioni, Fabio Daolio & Pierre Buesser


About this paper

Cite this paper

Lindh-Knuutila, T., Honkela, T. (2013). Exploratory Text Analysis: Data-Driven versus Human Semantic Similarity Judgments. In: Tomassini, M., Antonioni, A., Daolio, F., Buesser, P. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2013. Lecture Notes in Computer Science, vol 7824. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37213-1_44


DOI: https://doi.org/10.1007/978-3-642-37213-1_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37212-4
Online ISBN: 978-3-642-37213-1
eBook Packages: Computer Science, Computer Science (R0)
