Mining High-Quality Fine-Grained Type Information from Chinese Online Encyclopedias

  • Conference paper
Web Information Systems Engineering – WISE 2018 (WISE 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11234)

Abstract

Entity typing is a necessary step in building knowledge graphs. Considerable effort has gone into mining type information for entities from online encyclopedias, but usually only coarse-grained types can be obtained, which are not fine enough for knowledge graph construction or query answering. The situation is even worse for entities in Chinese. In this paper, we mine high-quality fine-grained type information for entities not only from the title-labels and info-boxes of an entity's encyclopedia page, but also from the abstracts and crowd-labels on that page, which provide many more candidate fine-grained types (with noise). To maintain the high quality of the mined type information, we initially take reliable type information only from the title-labels and info-boxes. We then put entities, attributes, values, and types into one graph, from which path information can be obtained for each candidate entity-type pair, and rely on a proposed Path-CNN binary classification model to identify more correct entity-type pairs from the graph. Compared with the previous approach and DBpedia, our work mines far more high-quality fine-grained type information for entities from the online encyclopedia. Applying our approach to the largest Chinese online encyclopedia, Baidu Baike, we generated 25,651,022 pieces of type information (with more than 80% accuracy) for the entities in this encyclopedia.
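To make the graph step in the abstract concrete, the following is a minimal sketch of how paths between a candidate entity-type pair might be enumerated. It is not the authors' implementation: the function names `build_graph` and `paths_between`, the choice to model attributes as edge labels, the `max_edges` bound, and the toy data are all assumptions made for illustration. The Path-CNN classifier itself is not reproduced here; the sketch only produces the kind of path features such a classifier could take as input.

```python
from collections import defaultdict


def build_graph(triples, seed_types):
    """Build an undirected labelled graph over entity, value, and type nodes.

    triples:    (entity, attribute, value) facts, e.g. from info-boxes.
    seed_types: (entity, type) pairs assumed to be reliable.
    Assumption: attributes are stored as edge labels, not as separate nodes.
    """
    graph = defaultdict(list)
    for entity, attribute, value in triples:
        graph[("entity", entity)].append((attribute, ("value", value)))
        graph[("value", value)].append((attribute, ("entity", entity)))
    for entity, type_name in seed_types:
        graph[("entity", entity)].append(("type", ("type", type_name)))
        graph[("type", type_name)].append(("type", ("entity", entity)))
    return graph


def paths_between(graph, entity, type_name, max_edges=3):
    """Return all simple paths from an entity to a type, as tuples of edge labels."""
    start, goal = ("entity", entity), ("type", type_name)
    found = []
    stack = [(start, [], {start})]
    while stack:
        node, labels, visited = stack.pop()
        if node == goal:
            found.append(tuple(labels))
            continue
        if len(labels) == max_edges:
            continue  # hypothetical length bound to keep the search small
        for label, neighbour in graph.get(node, []):
            if neighbour not in visited:
                stack.append((neighbour, labels + [label], visited | {neighbour}))
    return sorted(found)


# Toy data (illustrative only, not taken from the paper).
triples = [("Li Bai", "occupation", "poet"), ("Du Fu", "occupation", "poet")]
seed_types = [("Du Fu", "Tang dynasty poet")]
graph = build_graph(triples, seed_types)
print(paths_between(graph, "Li Bai", "Tang dynasty poet"))
```

In this toy graph the candidate pair (Li Bai, Tang dynasty poet) is connected through a shared attribute value and an already-typed entity. A binary classifier would then decide, from such paths, whether the candidate pair is correct.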

Acknowledgments

This research is partially supported by the National Natural Science Foundation of China (Grant Nos. 61632016, 61402313, 61472263) and the Natural Science Research Project of Jiangsu Higher Education Institution (No. 17KJA520003).

Corresponding author

Correspondence to Zhixu Li.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Hao, M., Li, Z., Zhao, Y., Zheng, K. (2018). Mining High-Quality Fine-Grained Type Information from Chinese Online Encyclopedias. In: Hacid, H., Cellary, W., Wang, H., Paik, HY., Zhou, R. (eds) Web Information Systems Engineering – WISE 2018. WISE 2018. Lecture Notes in Computer Science, vol 11234. Springer, Cham. https://doi.org/10.1007/978-3-030-02925-8_25

  • DOI: https://doi.org/10.1007/978-3-030-02925-8_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02924-1

  • Online ISBN: 978-3-030-02925-8

  • eBook Packages: Computer Science, Computer Science (R0)
