Abstract
Recently, more and more structured data in the form of RDF triples have been published and integrated into Linked Open Data (LOD). While the current LOD contains hundreds of data sources with billions of triples, it has only a small number of distinct relations compared with its large number of entities. On the other hand, Web pages are growing rapidly, resulting in a much larger volume of textual content to be exploited. With the popularity and wide adoption of open information extraction technology, extracting entities and the relations among them from text at Web scale has become possible. In this paper, we present an approach that extracts, for each given relation, its subject individuals and their object counterparts from text, and then uses the EM algorithm to determine the most appropriate domain and range and the most confident dependency path patterns for that relation. As a preliminary result, we built a knowledge base of relations extracted from Chinese encyclopedias. The experimental results show that our approach is effective at extracting relations with reasonable domain, range, and path pattern restrictions, as well as high-quality triples.
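The abstract says that domain, range, and path patterns are chosen with the EM algorithm but gives no model details. The Python sketch below shows one generic way EM could pick a domain and range for a single relation. It assumes that each extracted subject and object comes with a set of candidate classes (for example, from an entity-linking step) and that each instance is explained by exactly one latent (domain, range) pair. The function name `em_domain_range`, the input format, and the example classes are hypothetical, and this is not presented as the authors' exact formulation.

```python
from collections import defaultdict
from itertools import product


def em_domain_range(instances, iterations=50, tol=1e-8):
    """Estimate P(domain, range) for one relation with EM (illustrative sketch).

    `instances` is a list of (subject_types, object_types) pairs, where each
    element is a set of candidate classes for the extracted subject/object.
    Assumption: each instance was generated by one latent (domain, range)
    pair, drawn from `pi`, that is compatible with its candidate types.
    """
    # Ignore instances whose subject or object has no candidate class.
    instances = [(s, o) for s, o in instances if s and o]
    if not instances:
        return {}
    # Every (domain, range) pair compatible with at least one instance.
    pairs = sorted({p for s, o in instances for p in product(s, o)})
    pi = {p: 1.0 / len(pairs) for p in pairs}  # uniform initialization

    for _ in range(iterations):
        expected = defaultdict(float)
        for s, o in instances:
            candidates = list(product(s, o))
            z = sum(pi[p] for p in candidates)
            # E-step: posterior over the pairs compatible with this instance.
            for p in candidates:
                expected[p] += pi[p] / z
        # M-step: re-estimate pi from the expected counts.
        new_pi = {p: expected[p] / len(instances) for p in pairs}
        delta = max(abs(new_pi[p] - pi[p]) for p in pairs)
        pi = new_pi
        if delta < tol:  # stop once the estimates no longer change
            break
    return pi


if __name__ == "__main__":
    # Hypothetical candidate classes for three extracted (subject, object) pairs.
    instances = [
        ({"Person", "Writer"}, {"City"}),
        ({"Person"}, {"City", "Place"}),
        ({"Person", "Writer"}, {"City", "Place"}),
    ]
    pi = em_domain_range(instances)
    best = max(pi, key=pi.get)
    print("most likely (domain, range):", best, round(pi[best], 3))
```

In this toy input, (Person, City) is the only pair compatible with all three instances, so EM shifts nearly all of the probability mass onto it while the ambiguous candidates Writer and Place fade out. Selecting the most confident dependency path patterns could plausibly be treated in a similar way, with patterns as the latent variable, but that extension is not shown here.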
Notes
1. All sites mentioned in this figure have rights and marks held by their respective owners.
2. FudanNLP, an open-source Chinese natural language processing toolkit: http://code.google.com/p/fudannlp/.
3.
4. The matching results would likely be better if more powerful entity linking were involved.
5. Unfortunately, Zhishi.me has been down in recent weeks, so the links do not currently work. They will work again once Zhishi.me is back online.
Copyright information
© 2013 Springer Science+Business Media New York
About this paper
Cite this paper
Pan, J., Wang, H., Yu, Y. (2013). Building Large-Scale Knowledge Base for Relations from Text. In: Li, J., Qi, G., Zhao, D., Nejdl, W., Zheng, HT. (eds) Semantic Web and Web Science. Springer Proceedings in Complexity. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6880-6_9
DOI: https://doi.org/10.1007/978-1-4614-6880-6_9
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6879-0
Online ISBN: 978-1-4614-6880-6
eBook Packages: Computer Science, Computer Science (R0)