Extracting and Structuring Open Relations from Portuguese Text

Collovini, Sandra; Machado, Gabriel; Vieira, Renata

doi:10.1007/978-3-319-41552-9_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9727)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

Open Relation Extraction from text faces many challenges, given the linguistic knowledge it requires and the sophistication of the language processing techniques involved. This paper presents the extraction and structuring of open relations between named entities in Portuguese texts. We apply a Conditional Random Fields model to extract relation descriptors between named entities of the Person, Place and Organisation categories, reaching an F-measure of 0.64. To make better sense of the output, we structure the extracted relation descriptors using mining configurations.
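
As a rough illustration of what extracting a relation descriptor between two named entities can look like once a sequence labeller (such as a CRF) has tagged each token, the sketch below collects the tokens tagged as part of the descriptor and returns an (entity, descriptor, entity) triple. The tag names (`REL`, `O`), the function name `descriptor_triple`, and the example sentence are assumptions made for illustration; they are not taken from the paper, and no CRF is trained here.

```python
from typing import List, Optional, Tuple


def descriptor_triple(
    tokens: List[str],
    tags: List[str],
    entity1: str,
    entity2: str,
) -> Optional[Tuple[str, str, str]]:
    """Build (entity1, descriptor, entity2) from per-token tags.

    Tokens tagged "REL" are treated as the relation descriptor
    (hypothetical tag scheme). Returns None when no token is tagged "REL".
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    # Keep descriptor tokens in their original order.
    descriptor = [tok for tok, tag in zip(tokens, tags) if tag == "REL"]
    if not descriptor:
        return None
    return (entity1, " ".join(descriptor), entity2)


# Hypothetical sentence and tags, standing in for a labeller's output.
tokens = ["Maria", "é", "diretora", "de", "a", "Acme"]
tags = ["O", "O", "REL", "REL", "O", "O"]
triple = descriptor_triple(tokens, tags, "Maria", "Acme")
print(triple)
```

In a real pipeline the tags would come from the trained model rather than being written by hand, and the entity pair would come from a named entity recogniser.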

Notes

1. Recognition of Relation between Named Entities.
2. http://www.nltk.org/
3. http://mallet.cs.umass.edu/
4. It is worth noting that preposition-article contractions are split (“da” and “do” become “de + a” and “de + o”).
5. http://www.linguateca.pt/harem/
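
The contraction splitting mentioned in note 4 can be sketched as a simple lookup over tokens. This is a minimal sketch under stated assumptions: the function name `split_contractions` and the table `CONTRACTIONS` are hypothetical, only the two contractions named in the note are covered, and the authors' actual preprocessing may differ.

```python
from typing import Dict, List

# Only "da" and "do" come from the note; a fuller table is left out on purpose.
CONTRACTIONS: Dict[str, List[str]] = {
    "da": ["de", "a"],
    "do": ["de", "o"],
}


def split_contractions(tokens: List[str]) -> List[str]:
    """Replace each known contraction with its preposition and article.

    Matching is case-insensitive; replaced tokens are emitted in lower case,
    so the original capitalisation of a contraction is not preserved.
    """
    result: List[str] = []
    for token in tokens:
        parts = CONTRACTIONS.get(token.lower())
        if parts is None:
            result.append(token)  # not a known contraction: keep as-is
        else:
            result.extend(parts)  # e.g. "da" -> "de", "a"
    return result


# Hypothetical example phrase.
example = split_contractions(["presidente", "da", "empresa"])
print(example)
```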

Acknowledgments

We thank CNPq, CAPES and FAPERGS for their financial support.

Author information

Authors and Affiliations

Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, Brazil
Sandra Collovini & Gabriel Machado
Universidade Federal de Ciências da Saúde de Porto Alegre, Porto Alegre, Rio Grande do Sul, Brazil
Renata Vieira

Corresponding author

Correspondence to Sandra Collovini.

Editor information

Editors and Affiliations

Universidade de Lisboa, Portugal
João Silva
ISCTE-IUL, Lisbon, Portugal
Ricardo Ribeiro
Universidade de Évora, Évora, Portugal
Paulo Quaresma
Universidade de Caxias do Sul, Caxias do Sul, Brazil
André Adami
Universidade de Lisboa, Lisboa, Portugal
António Branco

Cite this paper

Collovini, S., Machado, G., Vieira, R. (2016). Extracting and Structuring Open Relations from Portuguese Text. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_16

DOI: https://doi.org/10.1007/978-3-319-41552-9_16
Published: 21 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41551-2
Online ISBN: 978-3-319-41552-9
eBook Packages: Computer Science, Computer Science (R0)
