On Statistical Characteristics of Real-Life Knowledge Graphs

  • Conference paper
Big Data Benchmarks, Performance Optimization, and Emerging Hardware (BPOE 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9495)

Abstract

The success of open-access knowledge graphs, such as YAGO, and commercial products, such as Google Knowledge Graph, has attracted much attention from both academic and industrial communities to building common-sense and domain-specific knowledge graphs. A natural question is how to manage a large-scale knowledge graph effectively and efficiently. Though systems and technologies that use relational storage engines or native graph database management systems have been proposed, there is no widely accepted solution. Therefore, a benchmark for knowledge graph management is required.

In this paper, we analyze the requirements of benchmarking knowledge graph management from a specific yet important point of view, namely the characteristics of knowledge graph data. Seventeen statistical features of four knowledge graphs and two social networks are studied. We show that although these graphs have similar structures, their small differences may call for totally different storage and indexing strategies and should not be overlooked. Finally, based on the study, we put forward requirements for seed datasets and synthetic data generators for benchmarking knowledge graph management.

Notes

  1. http://www.geonames.org/

  2. http://finance.sina.com.cn/

Acknowledgement

This work is partially supported by the National High-tech R&D Program (863 Program) under grant number 2015AA015307, and the National Science Foundation of China under grant numbers 61432006 and 61170086. The authors would also like to thank Ping An Technology (Shenzhen) Co., Ltd. for supporting this research.

Author information

Corresponding author

Correspondence to Weining Qian.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Cheng, W., Wang, C., Xiao, B., Qian, W., Zhou, A. (2016). On Statistical Characteristics of Real-Life Knowledge Graphs. In: Zhan, J., Han, R., Zicari, R. (eds) Big Data Benchmarks, Performance Optimization, and Emerging Hardware. BPOE 2015. Lecture Notes in Computer Science, vol 9495. Springer, Cham. https://doi.org/10.1007/978-3-319-29006-5_4

  • DOI: https://doi.org/10.1007/978-3-319-29006-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-29005-8

  • Online ISBN: 978-3-319-29006-5

  • eBook Packages: Computer Science, Computer Science (R0)
