Extracting and Structuring Open Relations from Portuguese Text

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2016)

Abstract

Open Relation Extraction from text is challenging because of the linguistic knowledge it requires and the sophistication of the language processing techniques involved. This paper presents the extraction and structuring of open relations between named entities in Portuguese texts. We apply a Conditional Random Fields (CRF) model to extract relation descriptors between named entities of the Person, Place and Organisation categories, reaching an F-measure of 0.64. To make better sense of the output, we structure the extracted relation descriptors using mining configurations.
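As a rough illustration of how relation-descriptor extraction can be framed for a sequence model such as a CRF, the sketch below labels each token with a BIO tag and builds a small feature dictionary per token. The sentence, the tag names (`B-REL`, `I-REL`), the part-of-speech tags and the feature set are all assumptions made for this example; they are not taken from the paper, and no CRF is trained here.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]      # half-open token span [start, end)
Entity = Tuple[Span, str]   # (span, category), e.g. ((0, 2), "PER")


def bio_labels(n_tokens: int, descriptor: Span) -> List[str]:
    """Mark the relation descriptor with B-REL/I-REL and everything else with O."""
    start, end = descriptor
    labels = ["O"] * n_tokens
    for i in range(start, end):
        labels[i] = "B-REL" if i == start else "I-REL"
    return labels


def token_features(tokens: List[str], pos_tags: List[str],
                   ent1: Entity, ent2: Entity, i: int) -> Dict[str, object]:
    """Build a hypothetical feature dictionary for token i."""
    def entity_tag(j: int) -> str:
        for (start, end), category in (ent1, ent2):
            if start <= j < end:
                return category
        return "O"

    return {
        "word": tokens[i].lower(),
        "pos": pos_tags[i],
        "entity": entity_tag(i),
        # True when the token lies strictly between the two entities
        "between": ent1[0][1] <= i < ent2[0][0],
        "prev_pos": pos_tags[i - 1] if i > 0 else "BOS",
        "next_pos": pos_tags[i + 1] if i < len(tokens) - 1 else "EOS",
    }


# Invented sentence with the contraction "da" already split into "de" + "a".
tokens = ["Ana", "Silva", "é", "diretora", "de", "a", "Acme"]
pos_tags = ["PROP", "PROP", "V", "N", "PRP", "DET", "PROP"]
person: Entity = ((0, 2), "PER")
organisation: Entity = ((6, 7), "ORG")

labels = bio_labels(len(tokens), descriptor=(3, 5))   # "diretora de"
features = [token_features(tokens, pos_tags, person, organisation, i)
            for i in range(len(tokens))]

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Pairs of `features` and `labels` in this shape are what a CRF toolkit would typically consume as one training sequence; the exact input format depends on the toolkit.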

Notes

  1. Recognition of Relation between Named Entities.

  2. http://www.nltk.org/.

  3. http://mallet.cs.umass.edu/.

  4. Note that preposition-article contractions are split (“da” and “do” become “de + a” and “de + o”).

  5. http://www.linguateca.pt/harem/.
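The contraction splitting mentioned in note 4 can be sketched as a simple lookup over tokens. This is only an illustration: the table holds just the two contractions named in the note, the function name is hypothetical, and a real preprocessing step would need a fuller table and would have to handle ambiguous forms.

```python
from typing import Dict, List, Tuple

# Only the two contractions given in the note; a fuller table is assumed
# to be needed in practice.
CONTRACTIONS: Dict[str, Tuple[str, str]] = {
    "da": ("de", "a"),
    "do": ("de", "o"),
}


def split_contractions(tokens: List[str]) -> List[str]:
    """Replace each known contraction with its preposition and article."""
    result: List[str] = []
    for token in tokens:
        parts = CONTRACTIONS.get(token.lower())
        if parts is None:
            result.append(token)
        else:
            # The expansion is emitted in lower case; original casing is not kept.
            result.extend(parts)
    return result


print(split_contractions(["presidente", "da", "empresa"]))
```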

Acknowledgments

We thank the CNPQ, CAPES and FAPERGS for their financial support.

Corresponding author

Correspondence to Sandra Collovini.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Collovini, S., Machado, G., Vieira, R. (2016). Extracting and Structuring Open Relations from Portuguese Text. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_16

  • DOI: https://doi.org/10.1007/978-3-319-41552-9_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41551-2

  • Online ISBN: 978-3-319-41552-9

  • eBook Packages: Computer Science (R0)
