Abstract
The success of open-access knowledge graphs, such as YAGO, and commercial products, such as Google Knowledge Graph, has attracted much attention from both academic and industrial communities to building common-sense and domain-specific knowledge graphs. A natural question is how to manage a large-scale knowledge graph effectively and efficiently. Though systems and technologies that use relational storage engines or native graph database management systems have been proposed, there is no widely accepted solution. Therefore, a benchmark for the management of knowledge graphs is required.
In this paper, we analyze the requirements of benchmarking knowledge graph management from a specific yet important point of view, i.e., the characteristics of knowledge graph data. Seventeen statistical features of four knowledge graphs as well as two social networks are studied. We show that though these graphs have similar structures, their small differences may lead to totally different storage and indexing strategies, and these differences should not be overlooked. Finally, based on the study, we put forward requirements for the seed datasets and synthetic data generators used in benchmarking knowledge graph management.
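As an illustration of what a statistical feature of a knowledge graph can look like, the following is a minimal sketch that computes a few simple structural statistics from (subject, predicate, object) triples. The abstract does not list the seventeen features studied, so the statistics chosen here (node and edge counts, average out-degree, density, maximum in-degree, out-degree distribution) are assumptions made for illustration only, and the function name `degree_statistics` and the toy triples are hypothetical.

```python
from collections import Counter


def degree_statistics(triples):
    """Compute simple structural statistics for a list of
    (subject, predicate, object) triples, treated as a directed graph.

    The selection of statistics is illustrative and is not taken
    from the paper.
    """
    out_degree = Counter()
    in_degree = Counter()
    nodes = set()
    for subject, _predicate, obj in triples:
        out_degree[subject] += 1
        in_degree[obj] += 1
        nodes.update((subject, obj))

    n = len(nodes)
    m = len(triples)
    # Map each out-degree value to the number of nodes that have it,
    # including nodes with no outgoing edges.
    out_distribution = Counter(out_degree.get(v, 0) for v in nodes)
    return {
        "nodes": n,
        "edges": m,
        "avg_out_degree": m / n if n else 0.0,
        # Density of a directed graph without self-loops: m / (n * (n - 1)).
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        "max_in_degree": max(in_degree.values(), default=0),
        "out_degree_distribution": dict(out_distribution),
    }


# Hypothetical toy data, not drawn from any of the graphs studied.
TOY_TRIPLES = [
    ("Alice", "knows", "Bob"),
    ("Alice", "livesIn", "Paris"),
    ("Bob", "livesIn", "Paris"),
    ("Paris", "locatedIn", "France"),
]

stats = degree_statistics(TOY_TRIPLES)
print(stats)
```

Statistics of this kind are cheap to compute in a single pass over the triples, which makes them practical to compare across several graphs before committing to a storage or indexing design.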
Acknowledgement
This work is partially supported by the National Hightech R&D Program (863 Program) under grant number 2015AA015307, and the National Science Foundation of China under grant numbers 61432006 and 61170086. The authors would also like to thank Ping An Technology (Shenzhen) Co., Ltd. for the support of this research.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Cheng, W., Wang, C., Xiao, B., Qian, W., Zhou, A. (2016). On Statistical Characteristics of Real-Life Knowledge Graphs. In: Zhan, J., Han, R., Zicari, R. (eds) Big Data Benchmarks, Performance Optimization, and Emerging Hardware. BPOE 2015. Lecture Notes in Computer Science, vol 9495. Springer, Cham. https://doi.org/10.1007/978-3-319-29006-5_4
DOI: https://doi.org/10.1007/978-3-319-29006-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29005-8
Online ISBN: 978-3-319-29006-5
eBook Packages: Computer Science, Computer Science (R0)