Large Scale Semantic Relation Discovery: Toward Establishing the Missing Link Between Wikipedia and Semantic Network

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 650)

Abstract

Wikipedia has become the largest knowledge repository on the Web. However, most of the semantic knowledge in Wikipedia is documented in natural language, which is readable by humans but largely inaccessible to computer processing. To establish the missing link from Wikipedia to a semantic network, this paper proposes a relation discovery method that can: (1) discover and characterize a large collection of relations from Wikipedia by exploiting relation pattern regularity, relation distribution regularity, and relation instance redundancy; and (2) annotate the hyperlinks between Wikipedia articles with the discovered semantic relations. In total, we discover 14,299 relations, 105,661 relation patterns, and 5,214,175 relation instances from Wikipedia, which we expect to be a valuable resource for many NLP and AI tasks.

Notes

  1. http://nlp.stanford.edu/software/corenlp.shtml

Author information

Corresponding author

Correspondence to Xianpei Han.

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Han, X., Song, X., Sun, L. (2016). Large Scale Semantic Relation Discovery: Toward Establishing the Missing Link Between Wikipedia and Semantic Network. In: Chen, H., Ji, H., Sun, L., Wang, H., Qian, T., Ruan, T. (eds) Knowledge Graph and Semantic Computing: Semantic, Knowledge, and Linked Big Data. CCKS 2016. Communications in Computer and Information Science, vol 650. Springer, Singapore. https://doi.org/10.1007/978-981-10-3168-7_6

  • DOI: https://doi.org/10.1007/978-981-10-3168-7_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3167-0

  • Online ISBN: 978-981-10-3168-7

  • eBook Packages: Computer Science, Computer Science (R0)
