
Improving Portuguese Term Extraction

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7243)

Abstract

This paper evaluates a set of heuristics designed to improve the quality of terms extracted from an annotated domain corpus written in Portuguese. The heuristics start from the part-of-speech and grammatical-function annotation of the texts and identify the nouns and noun phrases that are the best candidates for domain terms. These candidates are then submitted to a set of approximate rules (heuristics) that may discard some of them, accept others (with or without removing some of their words), or infer implicit terms from them. The effectiveness of the heuristics is assessed in a corpus experiment that compares the extracted terms against a reference list and computes the usual evaluation metrics.
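The abstract names three kinds of heuristic action on candidate noun phrases (discard, accept with optional word removal, infer implicit terms), followed by an evaluation against a reference list. The following is a minimal Python sketch of that pipeline shape under clearly hypothetical assumptions. The simplified tag names (`DET`, `N`, `ADJ`, `KC`, `NUM`, `V`), the three concrete rules, and every function name are illustrative inventions. They are not the authors' actual heuristics and not the output format of any real parser. The sketch also assumes that the "usual metrics" are precision, recall, and F1.

```python
# Hypothetical illustration only: the tags and rules below are invented for this
# sketch and do not reproduce the paper's heuristics or any parser's tag set.
Token = tuple[str, str]  # (word, simplified tag)

EDGE_TAGS = {"DET", "PRP", "KC"}  # non-content words stripped from phrase edges
REJECT_TAGS = {"NUM", "V"}        # any such token discards the whole candidate


def strip_edges(phrase: list[Token]) -> list[Token]:
    """'Accept with word removal': drop non-content tokens at both ends."""
    start, end = 0, len(phrase)
    while start < end and phrase[start][1] in EDGE_TAGS:
        start += 1
    while end > start and phrase[end - 1][1] in EDGE_TAGS:
        end -= 1
    return phrase[start:end]


def expand_coordination(phrase: list[Token]) -> list[list[Token]]:
    """'Infer implicit terms': N ADJ e ADJ -> [N ADJ1], [N ADJ2]."""
    if [tag for _, tag in phrase] == ["N", "ADJ", "KC", "ADJ"]:
        head = phrase[0]
        return [[head, phrase[1]], [head, phrase[3]]]
    return [phrase]


def apply_heuristics(noun_phrases: list[list[Token]]) -> set[str]:
    """Run the hypothetical rules over annotated noun phrases."""
    terms: set[str] = set()
    for phrase in noun_phrases:
        phrase = strip_edges(phrase)
        if not phrase or any(tag in REJECT_TAGS for _, tag in phrase):
            continue  # 'discard' rule
        for term in expand_coordination(phrase):
            terms.add(" ".join(word.lower() for word, _ in term))
    return terms


def evaluate(extracted: set[str], reference: set[str]) -> tuple[float, float, float]:
    """Precision, recall and F1 of extracted terms against a reference list."""
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    phrases = [
        [("a", "DET"), ("insuficiência", "N"), ("renal", "ADJ"),
         ("e", "KC"), ("hepática", "ADJ")],
        [("o", "DET"), ("aleitamento", "N"), ("materno", "ADJ")],
        [("3", "NUM"), ("meses", "N")],
    ]
    terms = apply_heuristics(phrases)
    print(sorted(terms))
    print(evaluate(terms, {"insuficiência renal", "aleitamento materno", "febre"}))
```

In this toy run the determiners are stripped. The coordinated phrase yields two implicit terms ("insuficiência renal" and "insuficiência hepática"), and the phrase containing a numeral is discarded. Against the three-item reference list, two of the three extracted terms match, so precision, recall, and F1 all come out to 2/3. Real heuristics would rely on the richer morphosyntactic annotation that the paper describes rather than on these toy patterns.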




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lopes, L., Vieira, R. (2012). Improving Portuguese Term Extraction. In: Caseli, H., Villavicencio, A., Teixeira, A., Perdigão, F. (eds) Computational Processing of the Portuguese Language. PROPOR 2012. Lecture Notes in Computer Science, vol 7243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28885-2_9


  • DOI: https://doi.org/10.1007/978-3-642-28885-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28884-5

  • Online ISBN: 978-3-642-28885-2

  • eBook Packages: Computer Science, Computer Science (R0)
