Evaluating and Comparing Web-Scale Extracted Knowledge Bases in Chinese and English

  • Conference paper
Semantic Technology (JIST 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9544)

Abstract

DBpedia and YAGO are the two main data sources serving as hubs of Linking Open Data (LOD), and both contain Chinese data. Zhishi.me and SSCO extract Chinese knowledge from Wikipedia and from other Chinese encyclopedic websites such as Baidu-Baike and Hudong-Baike. The quality of these knowledge bases (KBs) has not been well investigated, although it is key to smart applications. In this paper, we evaluate three large Chinese KBs, namely DBpedia Chinese, zhishi.me and SSCO, and further compare them with English KBs. Since traditional methods for evaluating Web ontologies cannot be easily adapted to web-scale extracted KBs, we design two metric sets, covering Richness and Correctness, based on a quasi-formal conceptual representation to measure and compare these KBs. We also design a novel metric set on the overlapped instances of different KBs to make the metric results comparable. Finally, we employ random sampling to reduce the human effort needed to assess correctness. Our findings give a detailed status report on the current state of extracted KBs in both Chinese and English.
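
The abstract does not spell out the sampling procedure. As a minimal sketch only, assuming simple random sampling of triples and the standard Cochran sample-size formula with finite population correction (an assumption for illustration, not necessarily the authors' computation), the size of the sample to hand to human assessors could be estimated as follows. The function names `sample_size` and `estimate_correctness`, the `is_correct` callback, and the default 5% margin at roughly 95% confidence are all hypothetical.

```python
import math
import random


def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Cochran sample size with finite population correction.

    population: number of triples in the KB (or in one stratum of it)
    margin:     tolerated margin of error on the estimated proportion
    z:          normal quantile (1.96 is roughly 95% confidence)
    p:          assumed proportion; 0.5 is the most conservative choice
    """
    if population <= 0:
        return 0
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)        # finite population correction
    return min(population, math.ceil(n))


def estimate_correctness(triples, is_correct, margin=0.05, seed=0):
    """Draw a simple random sample and return (estimated correctness, n).

    is_correct stands in for the human judgment of a single triple
    (hypothetical callback, not part of the paper).
    """
    rng = random.Random(seed)                   # fixed seed: repeatable draw
    n = sample_size(len(triples), margin)
    if n == 0:
        return 0.0, 0
    sample = rng.sample(triples, n)             # sampling without replacement
    correct = sum(1 for t in sample if is_correct(t))
    return correct / n, n
```

Under these assumptions the required sample grows very slowly with KB size, which is why sampling keeps the manual assessment effort manageable even for web-scale KBs.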

This work was partially supported by the National Science Foundation of China (project No: 61402173), and Software and Integrated Circuit Industry Development Special Funds of Shanghai Economic and Information Commission (project No: 140304).

Notes

  1. http://wiki.dbpedia.org/.

  2. http://www.mpi-inf.mpg.de/yago/.

  3. http://zhishi.me/.

  4. http://ssco.zhishimofang.com.

  5. http://lists.w3.org/Archives/Public/public-lod/2011Apr/0145.html/.

  6. http://kbeval.nlp-bigdatalab.com/results201505.rar.

  7. http://linkeddata.informatik.hu-berlin.de/LDSrcAss/datenquelle.php.

  8. http://stattrek.com/sample-size/simple-random-sample.aspx.

  9. http://stattrek.com/hypothesis-test/difference-in-proportions.aspx.

Correspondence to Tong Ruan.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ruan, T., Dong, X., Wang, H., Li, Y. (2016). Evaluating and Comparing Web-Scale Extracted Knowledge Bases in Chinese and English. In: Qi, G., Kozaki, K., Pan, J., Yu, S. (eds) Semantic Technology. JIST 2015. Lecture Notes in Computer Science, vol 9544. Springer, Cham. https://doi.org/10.1007/978-3-319-31676-5_12

  • DOI: https://doi.org/10.1007/978-3-319-31676-5_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31675-8

  • Online ISBN: 978-3-319-31676-5

  • eBook Packages: Computer Science (R0)
