Large Scale Semantic Relation Discovery: Toward Establishing the Missing Link Between Wikipedia and Semantic Network

Han, Xianpei; Song, Xiliang; Sun, Le

doi:10.1007/978-981-10-3168-7_6

Part of the book series: Communications in Computer and Information Science (CCIS, volume 650)

Included in the following conference series:

China Conference on Knowledge Graph and Semantic Computing

Abstract

Wikipedia is the largest knowledge repository on the Web. However, most of its semantic knowledge is expressed in natural language, which humans can read but computers cannot readily process. To establish the missing link between Wikipedia and a semantic network, this paper proposes a relation discovery method that can: (1) discover and characterize a large collection of relations from Wikipedia by exploiting relation pattern regularity, relation distribution regularity and relation instance redundancy; and (2) annotate the hyperlinks between Wikipedia articles with the discovered semantic relations. In total, we discover 14,299 relations, 105,661 relation patterns and 5,214,175 relation instances from Wikipedia, which can serve as a valuable resource for many NLP and AI tasks.
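The abstract only names the ingredients of the method. The snippet below is a minimal, hypothetical sketch of one of them, relation instance redundancy: a surface pattern found between two hyperlinked articles is trusted only if it connects several distinct entity pairs, and the surviving patterns are then used to label hyperlinks. The input format, the `min_pairs` threshold and all function names are assumptions made for illustration; this is not the authors' implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical input record: (source article, text between the two mentions, linked target article).
Mention = Tuple[str, str, str]
Pair = Tuple[str, str]


def normalize_pattern(between: str) -> str:
    """Lower-case and collapse whitespace so trivially different surface strings match."""
    return " ".join(between.lower().split())


def discover_patterns(mentions: List[Mention], min_pairs: int = 2) -> Dict[str, Set[Pair]]:
    """Keep only patterns that connect at least `min_pairs` distinct entity pairs.

    A simple stand-in for exploiting relation instance redundancy:
    a pattern seen with many different pairs is less likely to be noise.
    """
    pairs_by_pattern: Dict[str, Set[Pair]] = defaultdict(set)
    for src, between, dst in mentions:
        pairs_by_pattern[normalize_pattern(between)].add((src, dst))
    return {p: pairs for p, pairs in pairs_by_pattern.items() if len(pairs) >= min_pairs}


def annotate_links(mentions: List[Mention], patterns: Dict[str, Set[Pair]]) -> Dict[Pair, str]:
    """Label each hyperlink (src, dst) with its most frequent surviving pattern."""
    votes: Dict[Pair, Counter] = defaultdict(Counter)
    for src, between, dst in mentions:
        pattern = normalize_pattern(between)
        if pattern in patterns:
            votes[(src, dst)][pattern] += 1
    return {pair: counts.most_common(1)[0][0] for pair, counts in votes.items()}


# Toy usage with made-up sentences (not data from the paper).
mentions: List[Mention] = [
    ("Barack_Obama", "was born in", "Honolulu"),
    ("Barack_Obama", "  Was born   in", "Honolulu"),
    ("Albert_Einstein", "was born in", "Ulm"),
    ("Albert_Einstein", "worked at", "Princeton_University"),
    ("Barack_Obama", "studied at", "Harvard_Law_School"),
]
patterns = discover_patterns(mentions)
labels = annotate_links(mentions, patterns)
print(labels)
```

Here "worked at" and "studied at" each link only one pair, so they are filtered out, while "was born in" survives and labels two hyperlinks. Raising `min_pairs` trades recall for precision. The relation pattern regularity and relation distribution regularity named in the abstract (for example, grouping synonymous patterns into one relation) are not modeled in this sketch.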

Notes

1. http://nlp.stanford.edu/software/corenlp.shtml

Author information

Authors and Affiliations

State Key Laboratory of Computer Sciences, Institute of Software, Chinese Academy of Sciences, Beijing, 100190, China
Xianpei Han, Xiliang Song & Le Sun

Corresponding author

Correspondence to Xianpei Han.

Editor information

Editors and Affiliations

Zhejiang University, Zhejiang, China
Huajun Chen
Rensselaer Polytechnic Institute, Troy, New York, USA
Heng Ji
Chinese Academy of Sciences, Beijing, China
Le Sun
Google Research, Mountain View, California, USA
Haixun Wang
Wuhan University, Wuhan, Hubei, China
Tieyun Qian
East China University of Science and Technology, Shanghai, China
Tong Ruan

About this paper

Cite this paper

Han, X., Song, X., Sun, L. (2016). Large Scale Semantic Relation Discovery: Toward Establishing the Missing Link Between Wikipedia and Semantic Network. In: Chen, H., Ji, H., Sun, L., Wang, H., Qian, T., Ruan, T. (eds) Knowledge Graph and Semantic Computing: Semantic, Knowledge, and Linked Big Data. CCKS 2016. Communications in Computer and Information Science, vol 650. Springer, Singapore. https://doi.org/10.1007/978-981-10-3168-7_6

DOI: https://doi.org/10.1007/978-981-10-3168-7_6
Published: 23 November 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3167-0
Online ISBN: 978-981-10-3168-7
eBook Packages: Computer Science, Computer Science (R0)