Abstract
Entity typing is a necessary step in building knowledge graphs. Considerable effort has gone into mining type information for entities from online encyclopedias, but usually only coarse-grained types can be obtained, which are not fine enough for knowledge graph construction or query answering. The situation is even worse for entities in Chinese. In this paper, we mine high-quality fine-grained type information for entities not only from the title-labels and info-boxes in an entity’s encyclopedia page, but also from the abstracts and crowd-labels in the page, which provide many more (but noisy) candidate fine-grained types. To maintain the quality of the mined type information, we initially take reliable type information only from the title-labels and info-boxes. Then, by putting entities, attributes, values and types into one graph, we obtain path information between each candidate entity-type pair and rely on a proposed Path-CNN binary classification model to identify more correct entity-type pairs from the graph. Compared with the previous approach and DBpedia, our work mines much more high-quality fine-grained type information for entities from the online encyclopedia. Applying our approach to the largest Chinese online encyclopedia, Baidu Baike, we generated 25,651,022 pieces of type information (with more than 80% accuracy) for the entities in this encyclopedia.
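The path-collection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_graph` and `find_paths`, the toy triples, and the `max_hops` limit are all hypothetical. As a simplifying assumption, attributes are modelled as edge labels rather than as separate nodes, and the Path-CNN classifier that would consume the resulting paths is not reproduced here.

```python
from collections import defaultdict


def build_graph(triples, seed_types):
    """Build an undirected graph over entity, value and type nodes.

    Assumption: attribute names label the edges instead of being nodes.
    """
    graph = defaultdict(set)
    for entity, attribute, value in triples:
        graph[("entity", entity)].add((attribute, ("value", value)))
        graph[("value", value)].add((attribute, ("entity", entity)))
    for entity, type_name in seed_types:
        graph[("entity", entity)].add(("type", ("type", type_name)))
        graph[("type", type_name)].add(("type", ("entity", entity)))
    return graph


def find_paths(graph, source, target, max_hops=4):
    """Return edge-label sequences of simple paths from source to target."""
    paths = []

    def dfs(node, visited, labels):
        if node == target:
            paths.append(tuple(labels))
            return
        if len(labels) == max_hops:
            return
        for label, neighbor in sorted(graph[node]):
            if neighbor not in visited:
                dfs(neighbor, visited | {neighbor}, labels + [label])

    dfs(source, {source}, [])
    return paths


# Toy data (hypothetical): info-box triples plus one reliable seed type.
triples = [
    ("Yao Ming", "occupation", "basketball player"),
    ("Yi Jianlian", "occupation", "basketball player"),
]
seed_types = [("Yi Jianlian", "athlete")]

graph = build_graph(triples, seed_types)
candidate_paths = find_paths(graph, ("entity", "Yao Ming"), ("type", "athlete"))
print(candidate_paths)
```

In this toy graph, the candidate pair (Yao Ming, athlete) is connected through a shared attribute value to an entity whose type is already known. Label sequences of this kind are the sort of path information a binary classifier could take as input to accept or reject the candidate pair.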
Acknowledgments
This research is partially supported by the National Natural Science Foundation of China (Grant Nos. 61632016, 61402313, 61472263) and the Natural Science Research Project of Jiangsu Higher Education Institution (No. 17KJA520003).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Hao, M., Li, Z., Zhao, Y., Zheng, K. (2018). Mining High-Quality Fine-Grained Type Information from Chinese Online Encyclopedias. In: Hacid, H., Cellary, W., Wang, H., Paik, HY., Zhou, R. (eds) Web Information Systems Engineering – WISE 2018. WISE 2018. Lecture Notes in Computer Science, vol 11234. Springer, Cham. https://doi.org/10.1007/978-3-030-02925-8_25
DOI: https://doi.org/10.1007/978-3-030-02925-8_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02924-1
Online ISBN: 978-3-030-02925-8
eBook Packages: Computer Science, Computer Science (R0)