Syntax Deep Explorer

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9727)

Abstract

The analysis of co-occurrence patterns between words allows for a better understanding of the use (and meaning) of words; its most straightforward applications are lexicography and linguistic description in general. Some tools already produce co-occurrence information about words taken from Portuguese corpora, but few can use lemmata or syntactic dependency information. Syntax Deep Explorer is a new tool that uses several association measures to quantify several co-occurrence types, defined on the syntactic dependencies (e.g. subject, complement, modifier) between a target word lemma and its co-locates. The resulting co-occurrence statistics are represented in lex-grams, each a synopsis of the syntactically-based co-occurrence patterns of a word's distribution within a given corpus. These lex-grams are obtained from a large Portuguese corpus processed by STRING [19] and are presented in a user-friendly way through a graphical interface. Syntax Deep Explorer will allow the development of finer lexical resources and the improvement of STRING processing in general, and will provide public access to co-occurrence information derived from parsed corpora.
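
To make the idea of dependency-based co-occurrence statistics concrete, here is a minimal sketch. It is not the tool's implementation. The abstract does not name the association measures or the data layout, so the sketch assumes two standard measures that appear in the reference list, pointwise mutual information [7] and the Dice coefficient [9]. The triples, the dependency labels, and the function names `association_scores` and `lex_gram` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy input: (dependency, target lemma, co-locate lemma) triples,
# such as a parser might emit. The data is invented for illustration only.
triples = [
    ("SUBJ", "ladrar", "cão"),
    ("SUBJ", "ladrar", "cão"),
    ("SUBJ", "correr", "cão"),
    ("SUBJ", "correr", "atleta"),
    ("MOD", "cão", "grande"),
    ("MOD", "casa", "grande"),
]

def association_scores(triples):
    """Score each (dependency, target, co-locate) triple.

    Frequencies are counted separately for each dependency type, so a
    lemma's count as a subject is independent of its count as a modifier.
    """
    pair = Counter(triples)
    target = Counter((d, t) for d, t, _ in triples)
    colocate = Counter((d, c) for d, _, c in triples)
    total = Counter(d for d, _, _ in triples)
    scores = {}
    for (d, t, c), f_tc in pair.items():
        f_t, f_c, n = target[(d, t)], colocate[(d, c)], total[d]
        scores[(d, t, c)] = {
            "freq": f_tc,
            # Pointwise mutual information: log2(P(t,c) / (P(t) * P(c)))
            "pmi": math.log2(f_tc * n / (f_t * f_c)),
            # Dice coefficient: 2 * f(t,c) / (f(t) + f(c))
            "dice": 2 * f_tc / (f_t + f_c),
        }
    return scores

def lex_gram(scores, lemma, measure="dice"):
    """Group the co-locates of `lemma` by dependency, best-scoring first."""
    grouped = {}
    for (d, t, c), s in scores.items():
        if t == lemma:
            grouped.setdefault(d, []).append((c, s[measure]))
    for d in grouped:
        grouped[d].sort(key=lambda item: item[1], reverse=True)
    return grouped

scores = association_scores(triples)
summary = lex_gram(scores, "correr")
```

In this toy data, `summary` lists the subject co-locates of "correr" ranked by Dice score. A real lex-gram would be computed over a full parsed corpus and would likely offer several measures side by side.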

Notes

  1. string.l2f.inesc-id.pt/demo/deepExplorer (last visit 29/02/2016).

  2. www.l2f.inesc-id.pt (last visit 29/02/2016).

  3. http://www.linguateca.pt/ACDC/ (last visit 29/02/2016).

  4. http://grammarsoft.com/ (last visit 29/02/2016).

  5. http://gramtrans.com/gramtrans (last visit 29/02/2016).

  6. http://www.sketchengine.co.uk/ (last visit 29/02/2016).

  7. http://www.sqlite.org/about (last visit 29/02/2016).

  8. https://bitbucket.org/xerial/sqlite-jdbc (last visit 29/02/2016).

  9. https://angularjs.org (last visit 29/02/2016).

References

  1. Aït-Mokhtar, S., Chanod, J.P., Roux, C.: Robustness beyond shallowness: incremental deep parsing. Nat. Lang. Eng. 8, 121–144 (2002)

  2. Bick, E.: The Parsing System PALAVRAS. Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Aarhus University Press, Aarhus (2000)

  3. Bick, E.: DeepDict - a graphical corpus-based dictionary of word relations. In: Proceedings of NODALIDA 2009. NEALT Proceedings Series, vol. 4, pp. 268–271. Tartu University Library, Tartu (2009)

  4. Biemann, C., Bordag, S., Heyer, G., Quasthoff, U., Wolff, C.: Language-independent methods for compiling monolingual lexical data. In: Gelbukh, A. (ed.) CICLing 2004. LNCS, vol. 2945, pp. 217–228. Springer, Heidelberg (2004)

  5. Carapinha, F.: Extração Automática de Conteúdos Documentais. Master’s thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, June 2013

  6. Chen, P.: The entity-relationship model—toward a unified view of data. ACM Trans. Database Syst. 1(1), 9–36 (1976)

  7. Church, K., Hanks, P.: Word association norms, mutual information, and lexicography. Comput. Linguist. 16(1), 22–29 (1990)

  8. Codd, E.: A relational model of data for large shared data banks. Commun. ACM 26(6), 64–69 (1983)

  9. Dice, L.: Measures of the amount of ecologic association between species. Ecology 26(3), 297–302 (1945)

  10. Diniz, C., Mamede, N., Pereira, J.: RuDriCo2 - a faster disambiguator and segmentation modifier. In: INFORUM II, pp. 573–584, September 2010

  11. Diniz, C., Mamede, N., Pereira, J.D.: RuDriCo2 - a faster disambiguator and segmentation modifier. In: Simpósio de Informática - INForum, pp. 573–584. Universidade do Minho, Portugal (2010)

  12. Dunning, T.: Accurate methods for the statistics of surprise and coincidence. Comput. Linguist. 19(1), 61–74 (1993)

  13. Hagège, C., Baptista, J., Mamede, N.: Identificação, Classificação e Normalização de Expressões Temporais em Português: a Experiência do Segundo HAREM e o Futuro. In: Mota, C., Santos, D. (eds.) Desafios na Avaliação Conjunta do Reconhecimento de Entidades Mencionadas: o Segundo HAREM, chap. 2, pp. 33–54. Linguateca (2008). http://www.inesc-id.pt/ficheiros/publicacoes/5758.pdf/

  14. Hagège, C., Baptista, J., Mamede, N.: Portuguese temporal expressions recognition: from TE characterization to an effective TER module implementation. In: 7th Brazilian Symposium in Information and Human Language Technology, STIL 2009, pp. 1–5. Sociedade Brasileira de Computação, São Carlos (2009)

  15. Hagège, C., Baptista, J., Mamede, N.J.: Reconhecimento de entidades mencionadas com o xip: Uma colaboração entre o inesc-l2f e a xerox. In: Mota, C., Santos, D. (eds.) Desafios na avaliação conjunta do reconhecimento de entidades mencionadas: Actas do Encontro do Segundo HAREM (Aveiro, 11 de Setembro de 2008). Linguateca (2009)

  16. Hagège, C., Baptista, J., Mamede, N.J.: Caracterização e processamento de expressões temporais em português. Linguamática 2(1), 63–76 (2010)

  17. Kilgarriff, A., et al.: The sketch engine: ten years on. Lexicography 1(1), 7–36 (2014)

  18. Kilgarriff, A., Rychly, P., Tugwell, D., Smrz, P.: The sketch engine. In: Proceedings of Euralex. vol. Demo Session, pp. 105–116. Lorient, France, July 2004

  19. Mamede, N., Baptista, J., Diniz, C., Cabarrão, V.: STRING: an hybrid statistical and rule-based natural language processing chain for Portuguese. In: PROPOR 2012, vol. Demo Session, April 2012

  20. Mamede, N.J., Baptista, J.: Nomenclature of chunks and dependencies in Portuguese XIP Grammar 4.5. Technical report, L2F-Spoken Language Laboratory, INESC-ID Lisboa, Lisboa, January 2016

  21. Manning, C., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)

  22. Marques, J.S.: Anaphora Resolution. Master’s thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, Lisboa (2013)

  23. Maurício, A.: Identificação, Classificação e Normalização de Expressões Temporais. Master’s thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, Lisboa, November 2011

  24. Nobre, N.: Resolução de Expressões Anafóricas. Master’s thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, June 2011

  25. Oliveira, D.: Extraction and Classification of Named Entities. Master’s thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa (2010)

  26. Pereira, S.: Linguistics Parameters for Zero Anaphora Resolution. Master’s thesis, Universidade do Algarve and University of Wolverhampton (2010)

  27. Quasthoff, U., Richter, M., Biemann, C.: Corpus portal for search in monolingual corpora. In: Proceedings of the 5th LREC, pp. 1799–1802 (2006)

  28. Ribeiro, R.: Anotação Morfossintática Desambiguada do Português. Master’s thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, March 2003

  29. Rychly, P.: Manatee/Bonito - a modular corpus manager. In: Sojka, P., Horák, A. (eds.) RASLAN 2008, pp. 65–70. Masaryk University, Brno (2007)

  30. Rychly, P.: A lexicographer-friendly association score. In: RASLAN 2008, pp. 6–9. Masarykova Univerzita, Brno (2008)

  31. Santos, D., Rocha, P.: Evaluating CETEMPúblico, a free resource for Portuguese. In: Proceedings of the 39th Annual Meeting of ACL, ACL 2001, pp. 450–457. Association for Computational Linguistics, Stroudsburg (2001)

  32. Silberschatz, A., Korth, H., Sudarshan, S.: Database System Concepts. Connect, learn, succeed. McGraw-Hill Education (2010)

  33. Sinclair, J.: Corpus, Concordance, Collocation. Oxford University Press, Oxford (1991)

  34. Smadja, F., McKeown, K., Hatzivassiloglou, V.: Translating collocations for bilingual lexicons: a statistical approach. Comput. Linguist. 22(1), 1–38 (1996)

  35. Vicente, A.M.F.: LexMan: um Segmentador e Analisador Morfológico com Transdutores. Master’s thesis, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, June 2013

Acknowledgment

This work was supported by national funds through FCT–Fundação para a Ciência e a Tecnologia, ref. UID/CEC/50021/2013. Thanks to Neuza Costa (UAlg) for revising the final version of this paper.

Author information

Corresponding author

Correspondence to Jorge Baptista.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Correia, J., Baptista, J., Mamede, N. (2016). Syntax Deep Explorer. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_19

  • DOI: https://doi.org/10.1007/978-3-319-41552-9_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41551-2

  • Online ISBN: 978-3-319-41552-9

  • eBook Packages: Computer Science, Computer Science (R0)
