Abstract
DBpedia and YAGO are the two main data sources serving as the hub of Linking Open Data (LOD), and both contain Chinese data. Zhishi.me and SSCO extract Chinese knowledge from Wikipedia and from other Chinese encyclopedic websites such as Baidu-Baike and Hudong-Baike. The quality of these Knowledge Bases (KBs) has not been well investigated, even though it is key to smart applications. In this paper, we evaluate three large Chinese KBs (DBpedia Chinese, zhishi.me, and SSCO) and further compare them with English KBs. Since traditional methods for evaluating Web ontologies cannot be easily adapted to web-scale extracted KBs, we design two metric sets, covering Richness and Correctness, based on a quasi-formal conceptual representation to measure and compare these KBs. We also design a novel metric set on the overlapping instances of different KBs to make the metric results comparable. Finally, we employ random sampling to reduce the human effort needed to assess correctness. The findings give a detailed status report on the current state of extracted KBs in both Chinese and English.
This work was partially supported by the National Science Foundation of China (project No: 61402173) and the Software and Integrated Circuit Industry Development Special Funds of Shanghai Economic and Information Commission (project No: 140304).
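The random-sampling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `estimate_correctness`, the `is_correct` callback (standing in for a human judgment), the sample size, and the use of a 95% Wilson score interval are all assumptions made for illustration.

```python
import math
import random


def estimate_correctness(facts, is_correct, sample_size, seed=0):
    """Estimate the fraction of correct facts from a random sample.

    facts       -- a list of facts (e.g. triples); hypothetical input format
    is_correct  -- callback standing in for a human assessor (assumption)
    sample_size -- number of facts to assess manually
    Returns (proportion, (lower, upper)) with a 95% Wilson score interval.
    """
    if not facts or sample_size <= 0:
        raise ValueError("need at least one fact and a positive sample size")

    rng = random.Random(seed)          # fixed seed so the sample is reproducible
    n = min(sample_size, len(facts))
    sample = rng.sample(facts, n)      # sampling without replacement

    correct = sum(1 for fact in sample if is_correct(fact))
    p = correct / n

    # Wilson score interval, z = 1.96 for roughly 95% confidence
    z = 1.96
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, (center - half, center + half)
```

In practice `is_correct` would be replaced by manual labels collected for the sampled facts; the interval gives a rough sense of how far the sampled proportion may lie from the correctness of the full KB.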
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Ruan, T., Dong, X., Wang, H., Li, Y. (2016). Evaluating and Comparing Web-Scale Extracted Knowledge Bases in Chinese and English. In: Qi, G., Kozaki, K., Pan, J., Yu, S. (eds) Semantic Technology. JIST 2015. Lecture Notes in Computer Science, vol 9544. Springer, Cham. https://doi.org/10.1007/978-3-319-31676-5_12
DOI: https://doi.org/10.1007/978-3-319-31676-5_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31675-8
Online ISBN: 978-3-319-31676-5
eBook Packages: Computer Science (R0)