Extraction of Relation Descriptors for Portuguese Using Conditional Random Fields

Collovini, Sandra; Pugens, Lucas; Vanin, Aline A.; Vieira, Renata

doi:10.1007/978-3-319-12027-0_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8864)

Included in the following conference series:

Ibero-American Conference on Artificial Intelligence

Abstract

Relation Extraction (RE), an important task in Information Extraction, consists of detecting and characterizing the semantic relations between entities in text. This work proposes a new process for extracting relation descriptors of any type between Named Entities (NEs) in the Organization domain, for the Portuguese language, using the Conditional Random Fields (CRF) model. For example, from the sentence fragment “Microsoft headquartered in Redmond, […]”, we can extract the relation descriptor “headquartered-in”, which relates the NEs “Microsoft” and “Redmond”. We evaluated different feature configurations for the CRF; the best results were obtained by including the semantic feature based on the NE category, since this feature better expresses the kind of relationship between the pair of NEs we want to identify. The proposed process achieved F-measures of 45% and 53% under complete and partial matching, respectively.
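
As a rough illustration of the setup described in the abstract, the sketch below frames descriptor extraction as token-level sequence labeling, the usual input format for a CRF. It is not the authors' implementation. The BIO label names, the POS tags, the NE category names ("ORG", "PLACE"), the feature keys, and the overlap-based definition of a partial match are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# A token is (word, POS tag, NE category or "O"); tag names are illustrative.
Token = Tuple[str, str, str]


def token_features(sent: List[Token], i: int,
                   ne_pair: Tuple[str, str]) -> Dict[str, str]:
    """Build a feature dict for token i, as a CRF toolkit would consume it."""
    word, pos, ne = sent[i]
    feats = {
        "word.lower": word.lower(),
        "pos": pos,
        "ne": ne,
        # Semantic feature: categories of the NE pair being related
        # (assumed encoding of the NE-category feature from the abstract).
        "ne_pair": f"{ne_pair[0]}-{ne_pair[1]}",
    }
    # Context features from the neighbouring tokens.
    feats["prev.pos"] = sent[i - 1][1] if i > 0 else "BOS"
    feats["next.pos"] = sent[i + 1][1] if i < len(sent) - 1 else "EOS"
    return feats


def bio_labels(n: int, span: Tuple[int, int]) -> List[str]:
    """Label tokens in [start, end) as the relation descriptor, BIO style."""
    start, end = span
    return ["B-REL" if i == start else "I-REL" if start < i < end else "O"
            for i in range(n)]


def match_type(gold: List[str], pred: List[str]) -> str:
    """Compare descriptors: 'complete' if identical token sets,
    'partial' if they overlap (an assumed definition), else 'none'."""
    g = {i for i, label in enumerate(gold) if label != "O"}
    p = {i for i, label in enumerate(pred) if label != "O"}
    if not g or not p:
        return "none"
    if g == p:
        return "complete"
    return "partial" if g & p else "none"


# The abstract's example fragment, with hypothetical POS and NE tags.
sentence: List[Token] = [
    ("Microsoft", "PROP", "ORG"),
    ("headquartered", "V", "O"),
    ("in", "PRP", "O"),
    ("Redmond", "PROP", "PLACE"),
]
gold = bio_labels(len(sentence), (1, 3))  # "headquartered in"
features = [token_features(sentence, i, ("ORG", "PLACE"))
            for i in range(len(sentence))]
```

A CRF trained on such feature dictionaries and label sequences would predict one label per token. Counting "complete" and "partial" outcomes separately is one way to obtain two F-measure figures like those reported above.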

Author information

Authors and Affiliations

Pontifícia Universidade Católica do Rio Grande do Sul - PUCRS, Faculdade de Informática, Av. Ipiranga, 6681, Porto Alegre, RS, Brazil
Sandra Collovini, Lucas Pugens, Aline A. Vanin & Renata Vieira

Corresponding author

Correspondence to Sandra Collovini.

Editor information

Editors and Affiliations

Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
Ana L.C. Bazzan
Pontificia Universidad Católica (PUC), Santiago de Chile, Chile
Karim Pichara

About this paper

Cite this paper

Collovini, S., Pugens, L., Vanin, A.A., Vieira, R. (2014). Extraction of Relation Descriptors for Portuguese Using Conditional Random Fields. In: Bazzan, A., Pichara, K. (eds) Advances in Artificial Intelligence -- IBERAMIA 2014. IBERAMIA 2014. Lecture Notes in Computer Science (LNAI), vol 8864. Springer, Cham. https://doi.org/10.1007/978-3-319-12027-0_9

DOI: https://doi.org/10.1007/978-3-319-12027-0_9
Published: 12 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12026-3
Online ISBN: 978-3-319-12027-0
eBook Packages: Computer Science, Computer Science (R0)
