CASIA-KB: A Multi-source Chinese Semantic Knowledge Base Built from Structured and Unstructured Web Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8388)

Abstract

Knowledge bases play a crucial role in intelligent systems, especially in the Web age. Many domain-dependent and general-purpose knowledge bases have been developed to support various kinds of applications. In this paper, we propose CASIA-KB, a Chinese semantic knowledge base built from various Web resources. CASIA-KB uses Semantic Web and Natural Language Processing techniques and focuses mainly on declarative knowledge. Most of the knowledge is textual knowledge extracted from structured and unstructured sources, such as Web-based encyclopedias (the source of more formal and static knowledge) and microblog posts and news (the source of the most up-to-date factual knowledge). CASIA-KB also aims to bring in images and videos (which serve as non-textual knowledge) as relevant knowledge for specific instances and concepts, since they add interpretation and understanding to textual knowledge. For knowledge base organization, we briefly discuss the current ontology of CASIA-KB and the entity linking efforts for linking semantically equivalent entities together. In addition, we build a SPARQL endpoint with visualization functionality for query processing and result presentation, which can produce query output in different formats with result visualization support. Analysis of the entity degree distributions of each individual knowledge source and of the whole CASIA-KB shows that each branch knowledge base follows a power-law distribution, and that when entities from different resources are linked together to build a merged knowledge base, the whole knowledge base still keeps this structural property.
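
The degree analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy triples, the `baidu:`/`hudong:` prefixes, the link map, and all function names are hypothetical, and the degree of an entity is assumed here to be the number of triples in which it appears as subject or object. The exponent is estimated with the standard approximate maximum-likelihood formula for discrete power laws, which is one common choice and may differ from the method used in the paper.

```python
import math
from collections import Counter


def entity_degrees(triples):
    """Count how many triples each entity appears in as subject or object."""
    degrees = Counter()
    for subject, _predicate, obj in triples:
        degrees[subject] += 1
        degrees[obj] += 1
    return degrees


def merge_linked(triples, same_as):
    """Rewrite linked entities to one canonical name (hypothetical link map)."""
    return [(same_as.get(s, s), p, same_as.get(o, o)) for s, p, o in triples]


def powerlaw_exponent(degree_values, d_min=1):
    """Approximate discrete MLE: alpha = 1 + n / sum(ln(d / (d_min - 0.5)))."""
    tail = [d for d in degree_values if d >= d_min]
    log_sum = sum(math.log(d / (d_min - 0.5)) for d in tail)
    return 1.0 + len(tail) / log_sum


# Toy triples standing in for two branch knowledge bases (illustrative only).
baidu = [
    ("baidu:北京", "locatedIn", "baidu:中国"),
    ("baidu:故宫", "locatedIn", "baidu:北京"),
]
hudong = [
    ("hudong:北京", "capitalOf", "hudong:中国"),
    ("hudong:长城", "locatedIn", "hudong:中国"),
]
# Assumed output of entity linking: equivalent entities mapped to one name.
links = {"hudong:北京": "baidu:北京", "hudong:中国": "baidu:中国"}

merged = merge_linked(baidu + hudong, links)
degrees = entity_degrees(merged)
alpha = powerlaw_exponent(degrees.values())
```

On real data, the same functions would be applied to each branch knowledge base and to the merged one, and the estimated exponents compared. A toy sample of this size says nothing about whether a power law actually holds.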


Notes

  1. http://jena.apache.org/documentation/tdb/

  2. CASIA Tour Recommendation System: http://www.linked-neuron-data.org/CASIAPOIR.

  3. Sogou Lab News Data <http://www.sogou.com/labs/dl/ca.html>.

Acknowledgement

The set of news Web pages is from Sogou Lab (see Note 3). Peng Zhou was involved in crawling the Baidu and Hudong Encyclopedia Web pages. The set of Sina Weibo microblog posts was crawled by Bo Xu and Feng Wang from the Computational Brain Group at the Institute of Automation, Chinese Academy of Sciences. This study was supported by the Young Scientists Fund of the National Natural Science Foundation of China (61100128).

Corresponding author

Correspondence to Yi Zeng.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Zeng, Y., Wang, D., Zhang, T., Wang, H., Hao, H., Xu, B. (2014). CASIA-KB: A Multi-source Chinese Semantic Knowledge Base Built from Structured and Unstructured Web Data. In: Kim, W., Ding, Y., Kim, HG. (eds) Semantic Technology. JIST 2013. Lecture Notes in Computer Science, vol 8388. Springer, Cham. https://doi.org/10.1007/978-3-319-06826-8_7

  • DOI: https://doi.org/10.1007/978-3-319-06826-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06825-1

  • Online ISBN: 978-3-319-06826-8

  • eBook Packages: Computer Science, Computer Science (R0)
