Building Large-Scale Knowledge Base for Relations from Text

Pan, Junfeng; Wang, Haofen; Yu, Yong

doi:10.1007/978-1-4614-6880-6_9

Part of the book series: Springer Proceedings in Complexity (SPCOM)

Abstract

Recently, more and more structured data in the form of RDF triples have been published and integrated into Linked Open Data (LOD). While the current LOD contains hundreds of data sources with billions of triples, it has a small number of distinct relations compared with its large number of entities. Meanwhile, the number of Web pages is growing rapidly, yielding a much larger volume of textual content to exploit. With the popularity and wide adoption of open information extraction technology, it has become possible to extract entities and the relations among them from text at Web scale. In this paper, we present an approach that extracts the subject individuals and their object counterparts for relations from text and, based on the EM algorithm, determines the most appropriate domain and range and the most confident dependency path patterns for a given relation. As a preliminary result, we built a knowledge base of relations extracted from Chinese encyclopedias. The experimental results show that our approach effectively extracts relations with reasonable domain, range, and path pattern restrictions, as well as high-quality triples.

Notes

1. All sites mentioned in this figure have rights and marks held by their respective owners.
2. FudanNLP, an open source Chinese natural language processing toolkit, http://code.google.com/p/fudannlp/.
3. http://verbs.colorado.edu/chinese/cpb/.
4. The matching results should improve if more powerful entity linking is involved.
5. Unfortunately, Zhishi.me has been down in recent weeks, so the links do not currently work. They will be fixed when Zhishi.me comes back.

Author information

Authors and Affiliations

APEX Data & Knowledge Management Lab, Shanghai Jiao Tong University, 800 Dongchuan Rd., Shanghai, 200240, China
Junfeng Pan, Haofen Wang & Yong Yu

Corresponding author

Correspondence to Junfeng Pan.

Editor information

Editors and Affiliations

Dept. of Computer Science and Technology, Tsinghua University, Room 10-206, East main building, Beijing, 100084, People's Republic of China
Juanzi Li
School of Comp. Sci. & Eng., Southeast University, Dongda Road 2, Nanjing, 211189, Jiangsu, People's Republic of China
Guilin Qi
Peking University, Inst. of Computer Science & Tech., North Zhongguancun Street 128, Beijing, 100871, People's Republic of China
Dongyan Zhao
L3S Research Center, Leibniz University Hannover, Appelstr. 4, Hannover, 30167, Germany
Wolfgang Nejdl
Tsinghua Campus H202B, Shenzhen City, 518055, People's Republic of China
Hai-Tao Zheng

About this paper

Cite this paper

Pan, J., Wang, H., Yu, Y. (2013). Building Large-Scale Knowledge Base for Relations from Text. In: Li, J., Qi, G., Zhao, D., Nejdl, W., Zheng, HT. (eds) Semantic Web and Web Science. Springer Proceedings in Complexity. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6880-6_9

DOI: https://doi.org/10.1007/978-1-4614-6880-6_9
Published: 02 May 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6879-0
Online ISBN: 978-1-4614-6880-6
eBook Packages: Computer Science, Computer Science (R0)
