PreKar: A learned performance predictor for knowledge graph stores

World Wide Web

Abstract

Effective storage management is a basic premise for making full use of knowledge graphs. Because performance evaluations of knowledge graph stores are lacking, it is difficult for users to decide which store is best, and none of the existing studies of performance prediction focuses on storage structures. To fill this gap, we propose PreKar, a learned performance predictor that estimates the time cost of processing given workloads on candidate stores. Learning a well-trained model is challenging because historical workloads have low diversity and the embedding strategy must be lightweight. To address this problem, we first develop a novel candidate stores generator, which not only discovers all possible candidate stores for model training but also multiplies the number of training instances. Based on the generated stores, we derive an effective and lightweight encoder that embeds the main features of workloads and stores into the model while keeping PreKar efficient. Experimental results on real knowledge graphs demonstrate that PreKar achieves high accuracy in performance prediction and saves a large amount of time in obtaining performance figures for knowledge graph stores.

Notes

  1. https://github.com/database-ai4db-group/Dotil/tree/master/PreKar

Acknowledgements

This paper was supported by the National Natural Science Foundation of China under grant U1866602.

Author information

Corresponding author

Correspondence to Hongzhi Wang.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Financial interests

This study was funded by the National Natural Science Foundation of China (grant number U1866602).

Non-financial interests

None.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Decision Making in Heterogeneous Network Data Scenarios and Applications

Guest Editors: Jianxin Li, Chengfei Liu, Ziyu Guan, and Yinghui Wu.

About this article

Cite this article

Qi, Z., Wang, H., Shen, Z. et al. PreKar: A learned performance predictor for knowledge graph stores. World Wide Web 26, 321–341 (2023). https://doi.org/10.1007/s11280-022-01033-2

  • DOI: https://doi.org/10.1007/s11280-022-01033-2
