Catriple: Extracting Triples from Wikipedia Categories

Liu, Qiaoling; Xu, Kaifeng; Zhang, Lei; Wang, Haofen; Yu, Yong; Pan, Yue

doi:10.1007/978-3-540-89704-0_23

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5367)

Included in the following conference series:

Asian Semantic Web Conference

Abstract

As an important step towards bootstrapping the Semantic Web, many efforts have been made to extract triples from Wikipedia because of its wide coverage, good organization and rich knowledge. One important kind of triple relates Wikipedia articles to their non-isa properties, e.g. (Beijing, country, China). Previous work has tried to extract such triples from Wikipedia infoboxes, article text and categories. The infobox-based and text-based extraction methods depend on infoboxes and therefore suffer from low article coverage. In contrast, the category-based extraction methods exploit the widespread categories. However, they rely on predefined properties, which takes considerable manual effort and exploits only a small part of the knowledge in the categories. This paper automatically extracts properties and triples from the less explored Wikipedia categories so as to achieve wider article coverage with less manual effort. We realize this goal by utilizing the syntax and semantics of super-sub category pairs in Wikipedia. Our prototype implementation outputs about 10M triples at 12 confidence levels ranging from 47.0% to 96.4%, covering 78.2% of Wikipedia articles. Among them, 1.27M triples have a confidence of 96.4%. Applications can use the triples at whatever confidence level suits them.
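To make the idea of reading a property and a value off a super-sub category pair concrete, the following is a minimal sketch, not the authors' implementation. It assumes one simple pattern: a supercategory named "<Class> by <property>" and a subcategory named "<value> <Class>". The category names, the pattern, and the function names `property_and_value` and `triples_for_pair` are all hypothetical and chosen only for illustration; the abstract does not specify the patterns or the confidence scoring the prototype uses.

```python
import re
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Assumed pattern for supercategory names, e.g. "Songs by artist".
BY_PATTERN = re.compile(r"^(?P<cls>.+?) by (?P<prop>.+)$")


def property_and_value(super_cat: str, sub_cat: str) -> Optional[Tuple[str, str]]:
    """Return (property, value) for a super-sub category pair, or None.

    The property is the text after " by " in the supercategory name.
    The value is what is left of the subcategory name once the shared
    class name is removed from its end.
    """
    match = BY_PATTERN.match(super_cat)
    if match is None:
        return None
    cls, prop = match.group("cls"), match.group("prop")
    # The subcategory must end with the class name (case-insensitive).
    if not sub_cat.lower().endswith(" " + cls.lower()):
        return None
    value = sub_cat[: len(sub_cat) - len(cls) - 1].strip()
    if not value:
        return None
    return prop, value


def triples_for_pair(super_cat: str, sub_cat: str, articles: Iterable[str]) -> List[Triple]:
    """Emit one (article, property, value) triple per article in the subcategory."""
    pair = property_and_value(super_cat, sub_cat)
    if pair is None:
        return []
    prop, value = pair
    return [(article, prop, value) for article in articles]


if __name__ == "__main__":
    for triple in triples_for_pair(
        "Songs by artist", "The Beatles songs", ["Hey Jude", "Yesterday"]
    ):
        print(triple)
```

Under these assumptions, every article filed under the subcategory inherits the same property and value. A real system would need more patterns and some way to score how reliable each derived triple is, which is the role the confidence levels play in the abstract.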

Author information

Authors and Affiliations

Apex Data and Knowledge Management Lab, Shanghai Jiao Tong University, Shanghai, 200240, China
Qiaoling Liu, Kaifeng Xu, Haofen Wang & Yong Yu
IBM China Research Lab, Beijing, 100094, China
Lei Zhang & Yue Pan

Editor information

Editors and Affiliations

The Open University Knowledge Media Institute, Walton Hall, MK6 7AA, Milton Keynes, United Kingdom
John Domingue
Shinawatra University 99 Moo 10 Bangtoey, Samkok, 12160, Pathum Thani, Thailand
Chutiporn Anutariya

About this paper

Cite this paper

Liu, Q., Xu, K., Zhang, L., Wang, H., Yu, Y., Pan, Y. (2008). Catriple: Extracting Triples from Wikipedia Categories. In: Domingue, J., Anutariya, C. (eds) The Semantic Web. ASWC 2008. Lecture Notes in Computer Science, vol 5367. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89704-0_23

DOI: https://doi.org/10.1007/978-3-540-89704-0_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89703-3
Online ISBN: 978-3-540-89704-0
eBook Packages: Computer Science, Computer Science (R0)
