MEFE: A Multi-fEature Knowledge Fusion and Evaluation Method Based on BERT

Ji, Yimu; Hu, Lin; Liu, Shangdong; Xu, Zhengyang; Liu, Yanlan; Liu, Kaihang; Tang, Shuning; Liu, Qiang; Xiao, Wan

doi:10.1007/978-3-030-60239-0_30


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12453)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing


Abstract

Knowledge fusion is an important part of constructing a knowledge graph. As major knowledge bases have grown in recent years, integrating multi-source knowledge bases has become both a focus and a difficulty in the field of knowledge fusion. Because knowledge bases differ widely in structure, fusion efficiency and accuracy remain low. To address this problem, this paper proposes MEFE (Multi-fEature Knowledge Fusion and Evaluation Method), which is based on BERT. MEFE jointly considers the attributes, descriptions, and category features of entities when fusing multi-source knowledge bases. First, MEFE uses entity category tags to build a category dictionary. Then it vectorizes the category tags based on the dictionary and clusters the entities according to those tags. Finally, it uses BERT (Bidirectional Encoder Representations from Transformers) to calculate the similarity of entity pairs within the same group. From the fusion result, we calculate the entity redundancy rate and the information loss rate of the knowledge base in order to evaluate its quality. Experiments show that clustering effectively improves the efficiency of knowledge fusion in MEFE, and that using BERT improves fusion accuracy.
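
The three steps named in the abstract (category dictionary, tag vectorization with grouping, and pairwise similarity inside each group) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Every name in it is hypothetical, the toy entities are made up for illustration, grouping by identical tag vectors stands in for the clustering step, and a token-overlap (Jaccard) score stands in for the BERT-based similarity so that the example runs without any model download.

```python
from collections import defaultdict
from itertools import combinations

def build_category_dictionary(entities):
    """Map each distinct category tag to a column index."""
    tags = sorted({tag for e in entities for tag in e["categories"]})
    return {tag: i for i, tag in enumerate(tags)}

def vectorize(entity, dictionary):
    """Multi-hot vector over the category dictionary."""
    vec = [0] * len(dictionary)
    for tag in entity["categories"]:
        vec[dictionary[tag]] = 1
    return tuple(vec)

def group_by_category(entities, dictionary):
    """Assumption: identical tag vectors form one group (a stand-in for clustering)."""
    groups = defaultdict(list)
    for e in entities:
        groups[vectorize(e, dictionary)].append(e)
    return list(groups.values())

def token_jaccard(a, b):
    """Placeholder similarity; the paper uses BERT here, not token overlap."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def match_within_groups(groups, similarity, threshold):
    """Compare only entity pairs that share a group, keeping pairs above threshold."""
    matches = []
    for group in groups:
        for a, b in combinations(group, 2):
            score = similarity(a["description"], b["description"])
            if score >= threshold:
                matches.append((a["id"], b["id"], score))
    return matches

# Toy entities from two hypothetical knowledge bases.
entities = [
    {"id": "kb1:Nanjing", "categories": ["city"],
     "description": "Nanjing is the capital of Jiangsu province"},
    {"id": "kb2:Nanjing_City", "categories": ["city"],
     "description": "Nanjing is the capital city of Jiangsu province"},
    {"id": "kb1:BERT", "categories": ["model"],
     "description": "BERT is a language representation model"},
]

dictionary = build_category_dictionary(entities)
groups = group_by_category(entities, dictionary)
matches = match_within_groups(groups, token_jaccard, threshold=0.8)
print(matches)
```

Because pairs are compared only inside a group, the two city entities are scored against each other while the model entity is never compared with either of them. This is the sense in which grouping reduces the number of similarity computations. Swapping `token_jaccard` for a function that compares BERT sentence representations would leave the rest of the sketch unchanged.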

Acknowledgments

This work was supported by the National Key R&D Program of China (2017YFB1401300, 2017YFB1401302), Outstanding Youth of Jiangsu Natural Science Foundation (BK20170100), Key R&D Program of Jiangsu (BE2017166), Natural Science Foundation of the Jiangsu Higher Education Institutions of China (No. 19KJB520046), Natural Science Foundation of Jiangsu Province (No. BK20170900), Innovative and Entrepreneurial talents projects of Jiangsu Province, Jiangsu Planned Projects for Postdoctoral Research Funds (No. 2019K024), Six talent peak projects in Jiangsu Province, the Ministry of Education Foundation of Humanities and Social Sciences (No. 20YJC880104), NUPT DingShan Scholar Project and NUPTSF (NY219132) and CCF-Tencent Open Fund WeBank Special Funding (No. CCF-WebankRAGR20190104).

Author information

Authors and Affiliations

School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
Yimu Ji, Lin Hu, Shangdong Liu, Zhengyang Xu, Yanlan Liu, Kaihang Liu, Shuning Tang & Qiang Liu
Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing, 210003, Jiangsu, China
Yimu Ji & Shangdong Liu
Institute of High Performance Computing and Bigdata, Nanjing University of Posts and Telecommunications, Nanjing, 210003, Jiangsu, China
Yimu Ji, Lin Hu, Shangdong Liu, Zhengyang Xu, Yanlan Liu, Kaihang Liu, Shuning Tang, Qiang Liu & Wan Xiao
Nanjing Center of HPC China, Nanjing, 210003, Jiangsu, China
Yimu Ji, Shangdong Liu & Wan Xiao
Jiangsu HPC and Intelligent Processing Engineer Research Center, Nanjing, 210003, Jiangsu, China
Yimu Ji & Shangdong Liu
College of Educational Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
Wan Xiao

Corresponding author

Correspondence to Wan Xiao.

Editor information

Editors and Affiliations

Columbia University, New York, NY, USA
Meikang Qiu

About this paper

Cite this paper

Ji, Y. et al. (2020). MEFE: A Multi-fEature Knowledge Fusion and Evaluation Method Based on BERT. In: Qiu, M. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science, vol 12453. Springer, Cham. https://doi.org/10.1007/978-3-030-60239-0_30

DOI: https://doi.org/10.1007/978-3-030-60239-0_30
Published: 29 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60238-3
Online ISBN: 978-3-030-60239-0
eBook Packages: Mathematics and Statistics (R0)
