Word-Order Analysis Based Upon Treebank Data

  • Conference paper
Advances in Artificial Intelligence and Soft Computing (MICAI 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9413)

Abstract

The paper describes an experiment that attempts to quantify word-order properties of three Indo-European languages (Czech, English and Farsi). The investigation is driven by the effort to find an objective way to compare natural languages with respect to their degree of word-order freedom. Unlike similar studies, which concentrate on either a purely linguistic or a purely statistical approach, our experiment tries to combine both: the observations are verified against large samples of sentences from available treebanks and, at the same time, we exploit the ability of our tools to analyze selected important phenomena (e.g., the differences between the word order of a main clause and that of a subordinate clause) more deeply.

The quantitative results of our research are collected from the syntactically annotated treebanks available for all three languages. Thanks to the HamleDT project, it is possible to search all treebanks in a uniform way using the universal query tool PML-TQ. A secondary goal of this paper is to demonstrate the research potential of language resources that are, to a certain extent, unified.
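As a rough illustration of the kind of measurement the abstract describes, the following minimal sketch counts the relative order of subject, verb and object in dependency-annotated clauses. It is not the authors' PML-TQ queries: the token tuple format, the relation labels `Sb`, `Obj` and `Pred`, the function names, and the toy sentences are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical simplified token format: (position, form, head_position, relation).
# head_position 0 marks the root; the labels "Sb", "Obj", "Pred" are assumed here
# and would have to be mapped to whatever a real treebank uses.

def order_of_clause(tokens, verb_pos):
    """Return a pattern such as 'SVO' for the verb at verb_pos,
    or None if the verb lacks a subject or an object dependent."""
    slots = {verb_pos: "V"}
    for pos, _form, head, rel in tokens:
        if head != verb_pos:
            continue
        # Keep only the first subject and the first object of each verb.
        if rel == "Sb" and "S" not in slots.values():
            slots[pos] = "S"
        elif rel == "Obj" and "O" not in slots.values():
            slots[pos] = "O"
    if len(slots) < 3:
        return None
    # Reading the slots in surface order yields the word-order pattern.
    return "".join(slots[p] for p in sorted(slots))

def count_orders(sentences):
    """Count word-order patterns over all predicates in all sentences."""
    counts = Counter()
    for tokens in sentences:
        for pos, _form, _head, rel in tokens:
            if rel == "Pred":
                order = order_of_clause(tokens, pos)
                if order is not None:
                    counts[order] += 1
    return counts

# Toy input invented for illustration only; not taken from any treebank.
toy = [
    [(1, "Mary", 2, "Sb"), (2, "reads", 0, "Pred"), (3, "books", 2, "Obj")],
    [(1, "Knihy", 2, "Obj"), (2, "čte", 0, "Pred"), (3, "Marie", 2, "Sb")],
    [(1, "Mary", 2, "Sb"), (2, "sleeps", 0, "Pred")],  # no object: skipped
]
orders = count_orders(toy)
```

Comparing the resulting distributions across languages, for example the share of the most frequent pattern, gives one crude proxy for word-order freedom. Distinguishing main from subordinate clauses would additionally require clause-level information that this sketch does not model.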

Notes

  1. https://lindat.mff.cuni.cz/.

  2. http://ufal.mff.cuni.cz/hamledt.

  3. http://ufal.mff.cuni.cz/pdt3.0.

  4. https://www.cis.upenn.edu/~treebank.

  5. http://dadegan.ir/en/perdt.

  6. https://lindat.mff.cuni.cz/services/pmltq/#!/treebanks.

  7. https://lindat.mff.cuni.cz/services/pmltq/pdt30/.

  8. https://lindat.mff.cuni.cz/services/pmltq/hamledt_en/.

  9. The number of subject-less subordinate clauses is unduly high for the same reasons as for main clauses: the annotation scheme for coordination and analytical verb forms.

  10. https://lindat.mff.cuni.cz/services/pmltq/hamledt_fa/.

Acknowledgments

This work used language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education of the Czech Republic (project LM2010013).

Author information

Correspondence to Vladislav Kuboň or Markéta Lopatková.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kuboň, V., Lopatková, M. (2015). Word-Order Analysis Based Upon Treebank Data. In: Sidorov, G., Galicia-Haro, S. (eds) Advances in Artificial Intelligence and Soft Computing. MICAI 2015. Lecture Notes in Computer Science, vol 9413. Springer, Cham. https://doi.org/10.1007/978-3-319-27060-9_4

  • DOI: https://doi.org/10.1007/978-3-319-27060-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27059-3

  • Online ISBN: 978-3-319-27060-9

  • eBook Packages: Computer Science (R0)
