Abstract
Relation Extraction (RE) is an important task in Information Extraction: it detects and characterizes the semantic relations between entities in text. This work proposes a new process for extracting any relation descriptor between Named Entities (NEs) in the Organization domain, for the Portuguese language, using the Conditional Random Fields (CRF) model. For example, from the sentence fragment “Microsoft headquartered in Redmond, […]”, we can extract the relation descriptor “headquartered-in”, which relates the NEs “Microsoft” and “Redmond”. We evaluated different feature configurations for CRF; the best results were obtained by including the semantic feature based on the NE category, since this feature better expresses the kind of relationship between the pair of NEs we want to identify. The proposed process achieved F-measures of 45% and 53% for complete and partial matching, respectively.
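To make the sequence-labeling view of this task concrete, the sketch below shows one way a sentence fragment could be turned into per-token features and labels for a CRF. It is an illustration only, not the feature set used in the paper: the BIO-style labels (`B-REL`, `I-REL`, `O`), the category names `ORG` and `PLACE`, and the helper names `bio_labels` and `token_features` are all assumptions introduced here.

```python
from typing import Dict, List, Tuple


def bio_labels(n_tokens: int, descriptor_span: Tuple[int, int]) -> List[str]:
    """Label the tokens of the relation descriptor with B-REL/I-REL, others with O.

    descriptor_span is a half-open (start, end) token range (assumed convention).
    """
    start, end = descriptor_span
    labels = ["O"] * n_tokens
    for i in range(start, end):
        labels[i] = "B-REL" if i == start else "I-REL"
    return labels


def token_features(tokens: List[str], i: int, ne1_cat: str, ne2_cat: str) -> Dict[str, str]:
    """Build a feature dictionary for token i.

    The 'ne_pair' entry is a hypothetical stand-in for a semantic feature
    based on the categories of the two named entities.
    """
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "is_title": str(word.istitle()),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
        "ne_pair": f"{ne1_cat}-{ne2_cat}",
    }


# Fragment from the abstract; the descriptor covers "headquartered in".
tokens = ["Microsoft", "headquartered", "in", "Redmond"]
labels = bio_labels(len(tokens), (1, 3))
features = [token_features(tokens, i, "ORG", "PLACE") for i in range(len(tokens))]

for tok, lab in zip(tokens, labels):
    print(tok, lab)
```

A list of feature dictionaries paired with a list of labels is a common input shape for CRF toolkits; the actual features, label scheme, and NE categories should be taken from the full paper.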
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Collovini, S., Pugens, L., Vanin, A.A., Vieira, R. (2014). Extraction of Relation Descriptors for Portuguese Using Conditional Random Fields. In: Bazzan, A., Pichara, K. (eds) Advances in Artificial Intelligence -- IBERAMIA 2014. IBERAMIA 2014. Lecture Notes in Computer Science(), vol 8864. Springer, Cham. https://doi.org/10.1007/978-3-319-12027-0_9
DOI: https://doi.org/10.1007/978-3-319-12027-0_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12026-3
Online ISBN: 978-3-319-12027-0
eBook Packages: Computer Science (R0)