Testing Word Similarity: Language Independent Approach with Examples from Romance

Alexandrov, Mikhail; Blanco, Xavier; Makagonov, Pavel

doi:10.1007/978-3-540-27779-8_20

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3136)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

Identification of words with the same basic meaning (stemming) has important applications in Information Retrieval, above all for constructing word frequency lists. The usual morphology-based approaches (including the Porter stemmers) rely on language-dependent linguistic resources or knowledge, which causes problems when working with multilingual data and multi-thematic document collections. We suggest several empirical formulae with easy-to-adjust parameters and demonstrate how to construct such formulae for a given language using an inductive method of model self-organization. This method considers a set of models (formulae) of a given class and selects the best ones using training and test samples. We describe the method and give detailed examples for French, Italian, Portuguese, and Spanish. The formulae are examined on real domain-oriented document collections. Our approach can be easily applied to other European languages.
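As a rough illustration of the selection procedure the abstract describes, the Python sketch below fits several candidate formulae on a training sample and keeps the one that makes the fewest errors on a separate test sample. Everything specific in it is an assumption made for this example and is not taken from the chapter: the form of the similarity test (the share of letters outside the common prefix compared with a polynomial threshold in the prefix length), the coefficient grid, the plain hold-out error criterion, and the toy Spanish word pairs. The abstract does not state the actual formulae or the selection criterion the authors use.

```python
from itertools import product


def common_prefix_len(a, b):
    """Number of leading letters the two words share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def features(a, b):
    """Return (y, ratio): prefix length and share of letters outside the prefix."""
    y = common_prefix_len(a, b)
    s = len(a) + len(b)      # total letters in both words
    n = s - 2 * y            # letters not covered by the common prefix
    return y, n / s


def is_similar(a, b, coeffs):
    """Hypothetical test: ratio <= F(y), where F is a polynomial in y."""
    y, ratio = features(a, b)
    threshold = sum(c * y ** k for k, c in enumerate(coeffs))
    return ratio <= threshold


def error_rate(pairs, coeffs):
    """Fraction of labelled pairs that the formula classifies wrongly."""
    wrong = sum(is_similar(a, b, coeffs) != label for a, b, label in pairs)
    return wrong / len(pairs)


def fit(pairs, degree, grid):
    """Brute-force search of the coefficient grid for one polynomial degree."""
    return min(product(grid, repeat=degree + 1),
               key=lambda c: error_rate(pairs, c))


def select_model(train, test, degrees=(0, 1, 2),
                 grid=(-0.1, -0.05, 0.0, 0.05, 0.1, 0.3, 0.5, 0.7)):
    """Fit every candidate on train; keep the one with the lowest test error."""
    fitted = [fit(train, d, grid) for d in degrees]
    return min(fitted, key=lambda c: error_rate(test, c))


# Toy labelled pairs, invented for illustration (True = same basic meaning).
train = [
    ("casa", "casas", True), ("libro", "libros", True),
    ("hablar", "hablamos", True), ("mesa", "mes", False),
    ("pero", "perro", False), ("carta", "carro", False),
    ("sol", "solo", False), ("cantar", "cantamos", True),
]
test = [
    ("gato", "gatos", True), ("comer", "comemos", True),
    ("mar", "marzo", False), ("papel", "papa", False),
]

best = select_model(train, test)
print("selected coefficients:", best)
print("test error:", error_rate(test, best))
```

Keeping the test sample separate from the sample used to fit the coefficients is what lets the procedure compare formulae of different complexity: a higher-degree threshold can always match the training pairs at least as well, so only an external sample can show whether the extra parameters help.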

This work was partially supported by the Mexican Government (CONACyT and CGEPI-IPN).

References

Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley, Reading (1999)
Cramer, H.: Mathematical Methods of Statistics, Cambridge (1946)
Gelbukh, A.: Exact and approximate prefix search under access locality requirements for morphological analysis and spelling correction. Computación y Sistemas 6(3), 167–182 (2003)
Gelbukh, A., Sidorov, G.: Zipf and Heaps Laws’ Coefficients Depend on Language. In: Gelbukh, A. (ed.) CICLing 2001. LNCS, vol. 2004, pp. 332–335. Springer, Heidelberg (2001)
Gelbukh, A., Sidorov, G.: Morphological Analysis of Inflective Languages through Generation. Procesamiento de Lenguaje Natural (29), 105–112 (2002)
Gelbukh, A., Sidorov, G.: Approach to construction of automatic morphological analysis systems for inflective languages with little effort. In: Gelbukh, A. (ed.) CICLing 2003. LNCS, vol. 2588, pp. 215–220. Springer, Heidelberg (2003)
Ivahnenko, A.: Manual on typical algorithms of modeling. Tehnika Publ., Kiev (1980) (in Russian)
Makagonov, P., Alexandrov, M.: Constructing empirical formulas for testing word similarity by the inductive method of model self-organization. In: Ranchhod, Mamede (eds.) Advances in Natural Language Processing. LNCS (LNAI), vol. 2379, pp. 239–247. Springer, Heidelberg (2002)
Porter, M.: An algorithm for suffix stripping. Program 14, 130–137 (1980)

Author information

Authors and Affiliations

Center for Computing Research, National Polytechnic Institute (IPN), Mexico
Mikhail Alexandrov
Department of French and Romance Philology, Autonomous University of Barcelona,
Xavier Blanco
Mixteca University of Technology, Mexico
Pavel Makagonov

Editor information

Editors and Affiliations

School of Computing, Science and Engineering, Newton Building, University of Salford, M5 4WT, Greater Manchester, UK
Farid Meziane
Lab. CEDRIC, CNAM, Paris, France
Elisabeth Métais

Cite this paper

Alexandrov, M., Blanco, X., Makagonov, P. (2004). Testing Word Similarity: Language Independent Approach with Examples from Romance. In: Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2004. Lecture Notes in Computer Science, vol 3136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27779-8_20

DOI: https://doi.org/10.1007/978-3-540-27779-8_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22564-5
Online ISBN: 978-3-540-27779-8
eBook Packages: Springer Book Archive
