Abstract
Extracting relations from unstructured text is essential for a wide range of applications. Minimal human effort, scalability and high precision are desirable characteristics. We introduce a distantly supervised closed relation extraction approach based on distributional semantics and a tree generalization. Our approach uses training data obtained from a reference knowledge base to derive dependency parse trees that might express a relation. It then uses a novel generalization algorithm to construct dependency tree patterns for the relation. Distributional semantics are used to eliminate false candidate patterns. We evaluate the performance in experiments on a large corpus using ninety target relations. Our evaluation results suggest that our approach achieves a higher precision than two state-of-the-art systems. Moreover, our results support the scalability of our approach. Our open source implementation can be found at https://github.com/dice-group/Ocelot.
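The filtering step mentioned in the abstract, where distributional semantics eliminate false candidate patterns, can be illustrated with a minimal sketch. This is not the implementation from the Ocelot repository: the function names, the toy three-dimensional vectors, the averaging of word vectors and the similarity threshold are all assumptions made for illustration. The sketch keeps a candidate pattern only if the mean vector of its words is close, by cosine similarity, to the mean vector of words describing the target relation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(words, embeddings):
    """Average the vectors of all words that have an embedding; None if none do."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def filter_patterns(candidates, relation_words, embeddings, threshold=0.5):
    """Keep candidate patterns whose words are similar to the relation words.

    The threshold is a hypothetical parameter, not a value from the paper.
    """
    relation_vec = mean_vector(relation_words, embeddings)
    if relation_vec is None:
        return []
    kept = []
    for pattern_words in candidates:
        pattern_vec = mean_vector(pattern_words, embeddings)
        if pattern_vec is not None and cosine(pattern_vec, relation_vec) >= threshold:
            kept.append(pattern_words)
    return kept


# Toy vectors invented for this example; a real system would load
# pretrained word embeddings instead.
TOY_EMBEDDINGS = {
    "spouse": [0.9, 0.1, 0.0],
    "marry": [0.8, 0.2, 0.1],
    "wife": [0.85, 0.15, 0.05],
    "visit": [0.1, 0.9, 0.2],
    "meet": [0.2, 0.8, 0.3],
}

# Each candidate is the list of content words of one dependency tree pattern.
candidates = [["marry"], ["wife"], ["visit"], ["meet"]]
kept = filter_patterns(candidates, ["spouse"], TOY_EMBEDDINGS, threshold=0.9)
print(kept)
```

With these toy vectors, the patterns built around "marry" and "wife" pass the filter for a spouse relation, while "visit" and "meet" are discarded even though they may co-occur with married couples in the training sentences.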
Notes
- 1. \(A:= \{ label, lemma, pos, ner, domain, range, general\}\) are the vertex attributes used throughout this paper.
- 2. The shortest sentence with a relation has at least two tokens for the named entity arguments, one token for the relation mention and one for the end punctuation.
- 3. Seven types are applied (Place, Person, Organization, Money, Percent, Date, Time).
- 4.
- 5.
- 6.
- 7.
- 8.
- 9. In our approach we utilize Organization, Person and Place.
- 10. We provided an example of an RDF serialisation of the framework in Listing 1.1.
Acknowledgement
This work has been supported by the H2020 project HOBBIT (no. 688227), the BMWI projects GEISER (no. 01MD16014E) and OPAL (no. 19F2028A), and the EuroStars projects DIESEL (no. 01QE1512C) and QAMEL (no. 01QE1549C).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Speck, R., Ngomo Ngonga, AC. (2018). On Extracting Relations Using Distributional Semantics and a Tree Generalization. In: Faron Zucker, C., Ghidini, C., Napoli, A., Toussaint, Y. (eds) Knowledge Engineering and Knowledge Management. EKAW 2018. Lecture Notes in Computer Science, vol 11313. Springer, Cham. https://doi.org/10.1007/978-3-030-03667-6_27
DOI: https://doi.org/10.1007/978-3-030-03667-6_27
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03666-9
Online ISBN: 978-3-030-03667-6
eBook Packages: Computer Science, Computer Science (R0)