PreKar: A learned performance predictor for knowledge graph stores

Qi, Zhixin; Wang, Hongzhi; Shen, Ziming; Yang, Donghua

doi:10.1007/s11280-022-01033-2

Published: 23 March 2022

Volume 26, pages 321–341 (2023)

World Wide Web

Zhixin Qi¹, Hongzhi Wang (ORCID: orcid.org/0000-0002-7521-2871)¹, Ziming Shen¹ & Donghua Yang¹

Abstract

Effective knowledge graph storage management is a basic prerequisite for making full use of knowledge graphs. Because performance evaluations of knowledge graph stores are lacking, it is difficult for users to decide which store is best for them. Moreover, none of the existing studies on performance prediction focuses on storage structures. To fill this gap, we propose a learned performance predictor, PreKar, that estimates the time cost of processing a given workload on each candidate store. However, it is challenging to train a good model because historical workloads have low diversity and the embedding strategy must be lightweight. To address this problem, we first develop a novel candidate stores generator, which not only discovers all possible candidate stores for model training but also multiplies the number of training instances. Based on the generated stores, we derive an effective and lightweight encoder that not only embeds the main features of workloads and stores into the model but also guarantees the high efficiency of PreKar. Experimental results on real knowledge graphs demonstrate that PreKar achieves high accuracy in performance prediction and substantially reduces the time needed to obtain the performance of knowledge graph stores.
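The abstract names three moving parts: a generator of candidate stores, a lightweight encoding of (workload, store) pairs, and a learned model that maps an encoding to a time cost. The following is a minimal sketch of how such parts could fit together, written under loudly stated assumptions: the storage-structure names in `STRUCTURES`, the workload query types in `QUERY_TYPES`, the exhaustive subset enumeration in `candidate_stores`, and the k-nearest-neighbour `KNNRegressor` are all hypothetical stand-ins, not PreKar's actual generator, encoder, or model.

```python
"""Illustrative sketch of a learned performance predictor for KG stores.

Hypothetical throughout: structure names, query types, the exhaustive
enumeration, and the k-NN regressor are assumptions, not PreKar's design.
"""
from itertools import combinations
from math import dist

# Hypothetical storage structures that a candidate store may combine.
STRUCTURES = ("triple_table", "vertical_partition", "property_table", "graph")
# Hypothetical query types used to summarise a workload.
QUERY_TYPES = ("star", "chain", "snowflake", "complex")


def candidate_stores():
    """Enumerate every non-empty combination of structures (2^n - 1 stores)."""
    return [frozenset(c)
            for r in range(1, len(STRUCTURES) + 1)
            for c in combinations(STRUCTURES, r)]


def encode(workload, store):
    """Lightweight encoding: query-type frequencies plus a 0/1 store mask.

    `workload` maps a query type to the number of such queries.
    """
    total = sum(workload.values()) or 1  # avoid division by zero
    freqs = [workload.get(q, 0) / total for q in QUERY_TYPES]
    mask = [1.0 if s in store else 0.0 for s in STRUCTURES]
    return freqs + mask


class KNNRegressor:
    """Predict a time cost as the mean cost of the k nearest encodings."""

    def __init__(self, k=3):
        self.k = k
        self.data = []

    def fit(self, X, y):
        self.data = list(zip(X, y))
        return self

    def predict(self, x):
        # Euclidean distance between encodings; keep the k closest samples.
        nearest = sorted(self.data, key=lambda xy: dist(xy[0], x))[: self.k]
        return sum(cost for _, cost in nearest) / len(nearest)


def best_store(model, workload, stores):
    """Return the candidate store with the lowest predicted time cost."""
    return min(stores, key=lambda s: model.predict(encode(workload, s)))
```

Pairing one historical workload with every generated store yields one training instance per store (15 for the four hypothetical structures), which is one plausible way a generator can multiply training data. The labels passed to `fit` would be measured processing times; this sketch does not supply real measurements.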

Notes

https://github.com/database-ai4db-group/Dotil/tree/master/PreKar

Acknowledgements

This paper was supported by the National Natural Science Foundation of China under grant U1866602.

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Xidazhi Street, 150001, Harbin, Heilongjiang, China
Zhixin Qi, Hongzhi Wang, Ziming Shen & Donghua Yang

Corresponding author

Correspondence to Hongzhi Wang.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Financial interests

This study was funded by the National Natural Science Foundation of China (grant number U1866602).

Non-financial interests

None.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Decision Making in Heterogeneous Network Data Scenarios and Applications

Guest Editors: Jianxin Li, Chengfei Liu, Ziyu Guan, and Yinghui Wu.

About this article

Cite this article

Qi, Z., Wang, H., Shen, Z. et al. PreKar: A learned performance predictor for knowledge graph stores. World Wide Web 26, 321–341 (2023). https://doi.org/10.1007/s11280-022-01033-2

Received: 15 October 2021
Revised: 16 February 2022
Accepted: 28 February 2022
Published: 23 March 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s11280-022-01033-2
