Abstract
The aim of the present work is to find clues for comparing how difficult five languages are for morphosyntactic analysis and for the development of lexicographic resources, by running a comparative corpus and lexicon study on multilingual parallel corpora and morphosyntactic lexicons. First, we ran corpus-based experiments without any other type of knowledge, following classical measures used in lexical statistics. Then we carried out further experiments on the corpora using morphosyntactic lexicons. Finally, we plotted diagrams combining the different clues to offer an overview of the difficulty of a language for the development of morphosyntactic resources.
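As a rough illustration of the first, knowledge-free step, the sketch below computes a few measures commonly used in lexical statistics: type and token counts, type-token ratio, hapax legomena, and a vocabulary growth curve. The abstract does not name the measures actually used, so this selection is an assumption; the function names, the naive whitespace tokenisation and the toy sentences are all hypothetical.

```python
from collections import Counter


def lexical_stats(tokens):
    """Return basic lexical statistics for a list of tokens."""
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)
    # Hapax legomena: word forms that occur exactly once.
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "ttr": n_types / n_tokens if n_tokens else 0.0,
        "hapaxes": hapaxes,
    }


def vocabulary_growth(tokens, step):
    """Return (tokens_seen, types_seen) pairs sampled every `step` tokens."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        # Always record the final point so the curve ends at the corpus size.
        if i % step == 0 or i == len(tokens):
            curve.append((i, len(seen)))
    return curve


# Toy stand-ins for aligned texts (assumption: whitespace tokenisation).
samples = {
    "en": "the cat sees the dog and the dog sees the cat".split(),
    "es": "el gato ve al perro y el perro ve al gato".split(),
}

for lang, toks in samples.items():
    print(lang, lexical_stats(toks), vocabulary_growth(toks, step=4))
```

On parallel corpora the token streams express the same content, so differences in type counts and in the slope of the growth curve can be read as a first, coarse hint of how richly each language inflects.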
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Blancafort, H., de Loupy, C. (2011). Clues to Compare Languages for Morphosyntactic Analysis: A Study Run on Parallel Corpora and Morphosyntactic Lexicons. In: Vetulani, Z. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2009. Lecture Notes in Computer Science, vol 6562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20095-3_31
DOI: https://doi.org/10.1007/978-3-642-20095-3_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20094-6
Online ISBN: 978-3-642-20095-3
eBook Packages: Computer Science, Computer Science (R0)