DeepBankPT and Companion Portuguese Treebanks in a Multilingual Collection of Treebanks Aligned with the Penn Treebank

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2014)

Abstract

We present a new collection of treebanks for the Portuguese language, comprising five datasets that cover major types of grammatically annotated corpora: TreeBankPT, PropBankPT, DependencyBankPT, LogicalFormBankPT and DeepBankPT. This collection is the Portuguese part of a broader multilingual collection of aligned treebanks that are developed for different languages, including English, under the same methodological principles and guidelines, and whose raw text versions are translations of the Penn Treebank, a de facto standard dataset for research on language technology.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Branco, A. et al. (2014). DeepBankPT and Companion Portuguese Treebanks in a Multilingual Collection of Treebanks Aligned with the Penn Treebank. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_23

  • DOI: https://doi.org/10.1007/978-3-319-09761-9_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09760-2

  • Online ISBN: 978-3-319-09761-9

  • eBook Packages: Computer Science, Computer Science (R0)
