Exploratory Text Analysis: Data-Driven versus Human Semantic Similarity Judgments

  • Conference paper
Adaptive and Natural Computing Algorithms (ICANNGA 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7824)

Abstract

We present an approach for comparing human-made and automatically generated semantic representations under the assumption that neither has primary status over the other. In the experimental part, we compare the results obtained by applying independent component analysis and the self-organizing map algorithm to word context analysis with a semantically labeled dictionary called BLESS. The data-driven methods are useful in assessing the quality of hand-created semantic resources, and these resources can in turn be used to evaluate the outcome of the automated process. We present a number of specific findings that go beyond typical quantitative evaluations of data-driven methods, in which the manually created resources are usually taken as a gold standard.
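
As a rough illustration of the kind of comparison described above, the following sketch builds word context vectors from a toy corpus and checks how often a word's nearest neighbour (by cosine similarity) shares its hand-assigned category. It is only a minimal sketch under stated assumptions: the corpus, the labels, and all function names are hypothetical, the labels are not taken from BLESS, and plain co-occurrence counts stand in for the independent component analysis and self-organizing map steps used in the paper.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy corpus and hand-made labels (NOT BLESS data).
CORPUS = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "the dog bit the mouse",
    "the car passed the truck",
    "the truck passed the bus",
    "the bus hit the car",
]
HUMAN_LABELS = {
    "dog": "animal", "cat": "animal", "mouse": "animal",
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle",
}


def context_vectors(sentences, targets, window=2):
    """Count the words occurring within `window` positions of each target."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            if word not in targets:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbour_agreement(vectors, labels):
    """Fraction of words whose most similar other word has the same label."""
    hits = 0
    for word in labels:
        others = [o for o in labels if o != word]
        best = max(others, key=lambda o: cosine(vectors[word], vectors[o]))
        hits += labels[best] == labels[word]
    return hits / len(labels)


if __name__ == "__main__":
    vecs = context_vectors(CORPUS, set(HUMAN_LABELS))
    print(nearest_neighbour_agreement(vecs, HUMAN_LABELS))
```

Disagreements between the two views can be read in either direction: a low score may point to weak context vectors, or to labels that do not reflect how the words are actually used. On this toy data the two views agree for every word, which real corpora and resources would rarely do.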

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lindh-Knuutila, T., Honkela, T. (2013). Exploratory Text Analysis: Data-Driven versus Human Semantic Similarity Judgments. In: Tomassini, M., Antonioni, A., Daolio, F., Buesser, P. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2013. Lecture Notes in Computer Science, vol 7824. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37213-1_44

  • DOI: https://doi.org/10.1007/978-3-642-37213-1_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37212-4

  • Online ISBN: 978-3-642-37213-1

  • eBook Packages: Computer Science, Computer Science (R0)
