Building Large-Scale Knowledge Base for Relations from Text

  • Conference paper
Semantic Web and Web Science

Part of the book series: Springer Proceedings in Complexity (SPCOM)

Abstract

In recent years, a growing amount of structured data in the form of RDF triples has been published and integrated into Linked Open Data (LOD). Although the current LOD contains hundreds of data sources with billions of triples, it has a small number of distinct relations compared with the large number of entities. Meanwhile, the number of Web pages is growing rapidly, providing a much larger body of textual content to exploit. With the popularity and wide adoption of open information extraction technology, extracting entities and the relations among them from text at Web scale has become possible. In this paper, we present an approach that extracts the subject individuals and their object counterparts for relations from text and, based on the EM algorithm, determines the most appropriate domain and range and the most confident dependency path patterns for a given relation. As a preliminary result, we built a knowledge base of relations extracted from Chinese encyclopedias. The experimental results show that our approach is effective at extracting relations with reasonable domain, range, and path pattern restrictions, as well as high-quality triples.

Notes

  1. All sites mentioned in this figure have rights and marks held by their respective owners.

  2. FudanNLP, an open-source Chinese natural language processing toolkit, http://code.google.com/p/fudannlp/.

  3. http://verbs.colorado.edu/chinese/cpb/.

  4. The matching results would likely improve with a more powerful entity linking method.

  5. Unfortunately, Zhishi.me has been down in recent weeks, so the links do not currently work. They will be fixed when Zhishi.me comes back.

Author information

Corresponding author

Correspondence to Junfeng Pan.

Copyright information

© 2013 Springer Science+Business Media New York

About this paper

Cite this paper

Pan, J., Wang, H., Yu, Y. (2013). Building Large-Scale Knowledge Base for Relations from Text. In: Li, J., Qi, G., Zhao, D., Nejdl, W., Zheng, HT. (eds) Semantic Web and Web Science. Springer Proceedings in Complexity. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6880-6_9

  • DOI: https://doi.org/10.1007/978-1-4614-6880-6_9

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-6879-0

  • Online ISBN: 978-1-4614-6880-6

  • eBook Packages: Computer Science, Computer Science (R0)
