Catriple: Extracting Triples from Wikipedia Categories

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5367)

Abstract

As an important step towards bootstrapping the Semantic Web, many efforts have been made to extract triples from Wikipedia because of its wide coverage, good organization and rich knowledge. One important kind of triple relates a Wikipedia article to one of its non-isa properties, e.g. (Beijing, country, China). Previous work has tried to extract such triples from Wikipedia infoboxes, article text and categories. The infobox-based and text-based extraction methods depend on infoboxes and therefore suffer from low article coverage. In contrast, the category-based extraction methods exploit the widespread categories. However, they rely on predefined properties, which demands considerable manual effort and exploits only a small part of the knowledge in the categories. This paper automatically extracts both properties and triples from the less explored Wikipedia categories, so as to achieve wider article coverage with less manual effort. We achieve this goal by utilizing the syntax and semantics of super-sub category pairs in Wikipedia. Our prototype implementation outputs about 10M triples at 12 confidence levels ranging from 47.0% to 96.4%, which together cover 78.2% of Wikipedia articles. Among them, 1.27M triples have a confidence of 96.4%. Applications can select triples at whatever confidence level suits their needs.
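To make the idea of super-sub category pairs concrete, the following is a minimal sketch and not the authors' implementation. It assumes one hypothetical naming pattern, a super-category of the form "<class> by <property>" with a sub-category of the form "<value> <class>", and the category and article names are illustrative only. The abstract does not specify which patterns, parsers or confidence rules the prototype actually uses.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Triple(NamedTuple):
    subject: str
    prop: str
    value: str


# Assumed pattern (hypothetical): super-category named "<class> by <property>".
BY_PATTERN = re.compile(r"^(?P<cls>.+?) by (?P<prop>.+)$")


def extract_property_value(super_cat: str, sub_cat: str) -> Optional[Tuple[str, str]]:
    """Return (property, value) implied by a super-sub category pair, or None."""
    match = BY_PATTERN.match(super_cat)
    if match is None:
        return None
    cls, prop = match.group("cls"), match.group("prop")
    # Assumed sub-category shape: "<value> <class>", e.g. "The Beatles songs".
    suffix = " " + cls.lower()
    if not sub_cat.lower().endswith(suffix):
        return None
    value = sub_cat[: -len(suffix)].strip()
    if not value:
        return None
    return prop, value


def triples_for(super_cat: str, sub_cat: str, articles: List[str]) -> List[Triple]:
    """Attach the derived (property, value) to every article in the sub-category."""
    pair = extract_property_value(super_cat, sub_cat)
    if pair is None:
        return []
    prop, value = pair
    return [Triple(article, prop, value) for article in articles]


if __name__ == "__main__":
    for triple in triples_for("Songs by artist", "The Beatles songs", ["Hey Jude"]):
        print(triple)
```

Under these assumptions, the property name comes from the super-category and the value from the sub-category, so no property has to be predefined by hand, which is the contrast with earlier category-based methods drawn above.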

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, Q., Xu, K., Zhang, L., Wang, H., Yu, Y., Pan, Y. (2008). Catriple: Extracting Triples from Wikipedia Categories. In: Domingue, J., Anutariya, C. (eds) The Semantic Web. ASWC 2008. Lecture Notes in Computer Science, vol 5367. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89704-0_23

  • DOI: https://doi.org/10.1007/978-3-540-89704-0_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89703-3

  • Online ISBN: 978-3-540-89704-0

  • eBook Packages: Computer Science, Computer Science (R0)
