Clues to Compare Languages for Morphosyntactic Analysis: A Study Run on Parallel Corpora and Morphosyntactic Lexicons

  • Conference paper
Human Language Technology. Challenges for Computer Science and Linguistics (LTC 2009)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 6562))

Abstract

The aim of the present work is to find clues for comparing how difficult five languages are for morphosyntactic analysis and for the development of lexicographic resources, by running a comparative study on multilingual parallel corpora and morphosyntactic lexicons. First, we ran corpus-based experiments without any other type of knowledge, following classical measures used in lexical statistics. Then we carried out further experiments on the corpora using morphosyntactic lexicons. Finally, we plotted diagrams using different clues to offer an overview of the difficulty of a language for the development of morphosyntactic resources.
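
The abstract does not name the lexical-statistics measures used. As a rough illustration only, the sketch below computes three measures that are common in that field: type/token ratio, a vocabulary growth curve, and the number of hapax legomena. The choice of measures, the naive tokenizer, and all function names are assumptions made for this example, not the authors' method.

```python
from collections import Counter


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase, split on whitespace,
    strip surrounding punctuation. Real experiments would need
    language-specific tokenization."""
    tokens = []
    for raw in text.lower().split():
        tok = raw.strip(".,;:!?\"'()")
        if tok:
            tokens.append(tok)
    return tokens


def type_token_ratio(tokens):
    """Number of distinct word forms divided by number of running words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def vocabulary_growth(tokens, step=1):
    """Return (tokens_read, distinct_forms_seen) pairs, sampled every `step` tokens."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve


def hapax_count(tokens):
    """Number of word forms that occur exactly once."""
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1)


if __name__ == "__main__":
    sample = tokenize("The cat sat. The dog sat, too!")
    print(type_token_ratio(sample))
    print(vocabulary_growth(sample, step=2))
    print(hapax_count(sample))
```

On parallel texts, a language with richer inflection would be expected to show a steeper vocabulary growth curve and a higher type/token ratio for the same content, which is the kind of corpus-only clue the abstract alludes to.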

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Blancafort, H., de Loupy, C. (2011). Clues to Compare Languages for Morphosyntactic Analysis: A Study Run on Parallel Corpora and Morphosyntactic Lexicons. In: Vetulani, Z. (ed.) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2009. Lecture Notes in Computer Science, vol 6562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20095-3_31

  • DOI: https://doi.org/10.1007/978-3-642-20095-3_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20094-6

  • Online ISBN: 978-3-642-20095-3

  • eBook Packages: Computer Science, Computer Science (R0)