Random Indexing Distributional Semantic Models for Croatian Language

  • Conference paper
Text, Speech and Dialogue (TSD 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6836)

Abstract

Distributional semantic models (DSMs) model semantic relations between expressions by comparing the contexts in which these expressions occur. This paper presents an extensive evaluation of distributional semantic models for the Croatian language. We focus on random indexing models, an efficient and scalable approach to building DSMs. We build a number of models with different parameters (dimension, context type, and similarity measure) and compare them against human semantic similarity judgments. Our results indicate that even low-dimensional random indexing models may outperform raw frequency models, and that, among the parameters considered, the choice of similarity measure matters most.
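
The abstract does not describe how the models are constructed, so the following is only a minimal sketch of random indexing as it is commonly presented, not the authors' implementation. It assumes sparse ternary index vectors, a symmetric word window as the context type, and cosine as the similarity measure. The function names (`index_vector`, `build_model`, `cosine`) and all default values are hypothetical, chosen for illustration.

```python
import math
import random
from collections import defaultdict


def index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: `nonzeros` entries set to +1 or -1, rest 0."""
    vec = [0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzeros)):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def build_model(sentences, dim=300, nonzeros=6, window=2, seed=0):
    """Accumulate, for each word, the index vectors of its window neighbours."""
    rng = random.Random(seed)
    index = {}  # word -> fixed random index vector
    context = defaultdict(lambda: [0] * dim)  # word -> context vector
    for tokens in sentences:
        for word in tokens:
            if word not in index:
                index[word] = index_vector(dim, nonzeros, rng)
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            ctx = context[target]
            for j in range(lo, hi):
                if j == i:
                    continue  # a word is not its own context
                for k, value in enumerate(index[tokens[j]]):
                    ctx[k] += value
    return dict(context)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (assumed, not from the paper): "cat" and "dog" share all contexts.
corpus = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"]]
model = build_model(corpus)
print(cosine(model["cat"], model["dog"]))
```

In this toy corpus "cat" and "dog" occur with exactly the same neighbours, so their context vectors coincide and the cosine comes out at approximately 1.0. The dimension of the vectors is fixed in advance and does not grow with the vocabulary, which is the property that makes the approach scale; the dimension, window, and similarity function are the kinds of parameters the abstract says were varied.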

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Janković, V., Šnajder, J., Dalbelo Bašić, B. (2011). Random Indexing Distributional Semantic Models for Croatian Language. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science, vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_52

  • DOI: https://doi.org/10.1007/978-3-642-23538-2_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23537-5

  • Online ISBN: 978-3-642-23538-2

  • eBook Packages: Computer Science, Computer Science (R0)
