Random Indexing Distributional Semantic Models for Croatian Language

Janković, Vedrana; Šnajder, Jan; Dalbelo Bašić, Bojana

doi:10.1007/978-3-642-23538-2_52


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6836)

Included in the following conference series:

International Conference on Text, Speech and Dialogue


Abstract

Distributional semantic models (DSMs) model semantic relations between expressions by comparing the contexts in which these expressions occur. This paper presents an extensive evaluation of distributional semantic models for the Croatian language. We focus on random indexing models, an efficient and scalable approach to building DSMs. We build a number of models with different parameters (dimension, context type, and similarity measure) and compare them against human semantic similarity judgments. Our results indicate that even low-dimensional random indexing models may outperform raw frequency models, and that, of the parameters considered, the choice of similarity measure matters most.
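As a rough illustration of the random indexing idea mentioned in the abstract, the following minimal sketch builds word vectors by summing sparse random index vectors of neighbouring words and compares them with cosine similarity. It is not the authors' implementation: the function names, the window-based context, the cosine measure, the toy English corpus, and the default values (dim=300, nonzeros=6, window=2) are all assumptions chosen for illustration, since the abstract does not specify the context types, similarity measures, or dimensions that were evaluated.

```python
import math
import random
from collections import defaultdict


def index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: `nonzeros` entries are +1 or -1, the rest 0."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec


def build_model(sentences, dim=300, nonzeros=6, window=2, seed=0):
    """For each word, accumulate the index vectors of its window neighbours.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    index = {}                                # word -> fixed random index vector
    context = defaultdict(lambda: [0] * dim)  # word -> accumulated context vector
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                neighbour = tokens[j]
                if neighbour not in index:
                    index[neighbour] = index_vector(dim, nonzeros, rng)
                ctx = context[target]
                for k, value in enumerate(index[neighbour]):
                    ctx[k] += value
    return dict(context)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus, used only to show how the functions fit together.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
]
model = build_model(corpus)
print(cosine(model["cat"], model["dog"]))
```

The vector dimension is fixed in advance and does not grow with the vocabulary, which is what makes the approach scalable. A different similarity measure can be tried by swapping out `cosine`.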



Author information

Authors and Affiliations

Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia
Vedrana Janković, Jan Šnajder & Bojana Dalbelo Bašić


Editor information

Editors and Affiliations

Department of Computer Sciences, University of West Bohemia, Univerzitní 22, 306 14, Pilsen, Czech Republic
Ivan Habernal
Faculty of Applied Sciences, Dept. of Computer Science and Engineering, University of West Bohemia, Univerzitni 8, 306 14, Pilsen, Czech Republic
Václav Matoušek

Cite this paper

Janković, V., Šnajder, J., Dalbelo Bašić, B. (2011). Random Indexing Distributional Semantic Models for Croatian Language. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science, vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_52


DOI: https://doi.org/10.1007/978-3-642-23538-2_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23537-5
Online ISBN: 978-3-642-23538-2
eBook Packages: Computer Science, Computer Science (R0)
