GStar: an efficient framework for answering top-k star queries on billion-node knowledge graphs

Jin, Jiahui; Luo, Junzhou; Khemmarat, Samamon; Dong, Fang; Gao, Lixin

doi:10.1007/s11280-018-0611-0

Published: 19 July 2018

World Wide Web, Volume 22, pages 1611–1638 (2019)

Jiahui Jin ORCID: orcid.org/0000-0001-9570-1456¹,
Junzhou Luo¹,
Samamon Khemmarat²,
Fang Dong¹ &
…
Lixin Gao²


Abstract

Massive knowledge graphs, such as Linked Open Data or Freebase, contain billions of labeled entities and relationships. Star queries aim to identify an entity given a set of related entities, and they are common with massive knowledge graphs. It is important to find the best way to answer star queries, and we can do this by treating it as a graph pattern-matching problem. Because knowledge graphs are noisy and incomplete in nature, we must find answers that match the star pattern closely, and extract a precise match if possible. Thus, here we propose GStar, a framework to identify the top-k best answers for a star query. GStar effectively and efficiently answers top-k star queries on billion-node graphs through a novel query model, an index-free query algorithm, and a distributed query system. We evaluate GStar through experiments on real-world knowledge graphs. Experimental results show that our query model effectively answers real-life star-pattern queries; our query algorithm can answer top-k queries in a near-real-time manner without requiring expensive graph indices; and the distributed system scales well with both the graph size and number of machines used for computation.
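To make the notion of a top-k star query concrete, here is a minimal Python sketch. It is not GStar's query model or algorithm, which are not reproduced on this page. It assumes a toy in-memory triple list and a deliberately naive score: the number of known leaf entities directly adjacent to a candidate root. The function names `build_adjacency` and `top_k_star` are hypothetical, and the sample triples are illustrative only.

```python
import heapq
from collections import defaultdict

# Toy knowledge graph as (subject, predicate, object) triples (illustrative data).
TRIPLES = [
    ("Transformers", "executive_producer", "Steven Spielberg"),
    ("The Lovely Bones", "executive_producer", "Steven Spielberg"),
    ("Transformers", "director", "Michael Bay"),
    ("The Lovely Bones", "director", "Peter Jackson"),
]


def build_adjacency(triples):
    """Build an undirected adjacency map, ignoring edge labels."""
    adj = defaultdict(set)
    for subj, _pred, obj in triples:
        adj[subj].add(obj)
        adj[obj].add(subj)
    return adj


def top_k_star(adj, leaves, k):
    """Rank candidate root entities by how many known leaves they touch.

    Assumption: a one-hop match count stands in for a real relevance score.
    A candidate that misses some leaves still receives a (lower) score, so
    approximate matches are ranked rather than discarded.
    """
    scores = defaultdict(int)
    for leaf in leaves:
        for candidate in adj.get(leaf, ()):
            if candidate not in leaves:
                scores[candidate] += 1
    # Highest score first; ties broken alphabetically for determinism.
    return heapq.nsmallest(k, scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    adj = build_adjacency(TRIPLES)
    print(top_k_star(adj, {"Steven Spielberg", "Michael Bay"}, k=2))
```

In this toy graph, "Transformers" touches both known entities and ranks first, while "The Lovely Bones" touches only one and ranks second as a partial match. A noisy or incomplete graph is the reason to return ranked approximate answers instead of exact matches only.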


Notes

Star queries are common on many databases. For relational databases, a star query joins a number of small (dimension) tables to a large (fact) table using a primary key to foreign key join, while for RDF (Resource Description Framework) databases, the star query has the form of a number of triple patterns with different properties sharing the same subject. In this paper, a star query has a star shape where the root node represents a queried entity that is unknown and the leaf nodes represent related entities that are already known.
We recommend setting $\alpha$ to a value smaller than $\frac{1}{N|V^{S}_{Q}|}$. The setting of $\alpha$ is discussed in Section 6.5.
$\Omega_{+}$ is a node set containing the candidates that are visited by the propagations. We use $\Omega_{+}$ instead of $\Omega$ for the sake of the algorithm's efficiency.
http://www.informatik.uni-trier.de/
http://dblp.l3s.de/dblp++.php
In Figure 5, the edge between “Get Back” and “Hey Jude!” indicates the relationship of “released after”.
Steven Spielberg is the executive producer of Transformers and The Lovely Bones.
Here, we assume that the cap on the number of paths, N, is a fixed constant for all settings of $\alpha$.
http://glaros.dtc.umn.edu/gkhome/metis/metis/overview
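The recommendation in the notes above, that $\alpha$ be smaller than $\frac{1}{N|V^{S}_{Q}|}$, reduces to a one-line calculation. The sketch below assumes only that N is the cap on the number of paths and that $|V^{S}_{Q}|$ is the size of a node set defined in the full paper; the size is passed in as a plain integer. The function name `alpha_upper_bound` and the sample values are hypothetical.

```python
def alpha_upper_bound(path_cap, query_set_size):
    """Return 1 / (N * |V_Q^S|), the value that alpha should stay below.

    path_cap       -- N, the cap on the number of paths (assumed positive).
    query_set_size -- |V_Q^S|, supplied by the caller; its exact definition
                      comes from the paper and is not reconstructed here.
    """
    if path_cap <= 0 or query_set_size <= 0:
        raise ValueError("both arguments must be positive")
    return 1.0 / (path_cap * query_set_size)


if __name__ == "__main__":
    # Hypothetical values: N = 10 paths, |V_Q^S| = 4 nodes.
    print(alpha_upper_bound(10, 4))
```

With these sample values, any $\alpha$ below 1/40 satisfies the recommendation.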

References

Akiba, T., Sommer, C., Kawarabayashi, K.-i.: Shortest-path queries for complex networks: Exploiting low tree-width outside the core. In: Proceedings of the 15th International Conference on Extending Database Technology, EDBT ’12, pp 144–155. ACM, New York (2012)
Akiba, T., Iwata, Y., Yoshida, Y.: Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13, pp 349–360. ACM, New York (2013)
Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia - a crystallization point for the Web of data. Web Semant. 7(3), 154–165 (2009)
Brandes, U.: A faster algorithm for betweenness centrality. J. Math. Sociol. 25, 163–177 (2001)
Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: A recursive model for graph mining. In: Proceedings of the Fourth SIAM International Conference on Data Mining, SDM’04, pp. 442–446 (2004)
Checconi, F., Petrini, F.: Traversing trillions of edges in real time: Graph exploration on large-scale parallel machines. In: Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium, IPDPS ’14, pp. 425–434 (2014)
Cheng, J., Zeng, X., Yu, J.X.: Top-k graph pattern matching over large graphs. In: Proceedings of the 29th IEEE International Conference on Data Engineering, ICDE ’13, pp. 1033–1044 (2013)
Dong, X., Gabrilovich, E., Heitz, G., Horn, W., Ni, L., Murphy, K., Strohmann, T., Sun, S., Zhang, W.: Knowledge vault: A Web-scale approach to probabilistic knowledge fusion. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, pp. 601–610 (2014)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’01, pp 102–113. ACM, New York (2001)
Fan, W., Li, J., Ma, S., Tang, N., Wu, Y., Wu, Y.: Graph pattern matching: From intractable to polynomial time. PVLDB 3(1–2), 264–275 (2010)
Han, W.-S., Lee, J., Pham, M.-D., Yu, J.X.: iGraph: A framework for comparisons of disk-based graph indexing techniques. PVLDB 3(1), 449–459 (2010)
He, H., Wang, H., Yang, J., Yu, P.S.: Blinks: Ranked keyword searches on graphs. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, SIGMOD ’07, pp 305–316. ACM, New York (2007)
Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40(4), 11:1–11:58 (2008)
Jin, J., Khemmarat, S., Gao, L., Luo, J.: A distributed approach for top-k star queries on massive information networks. In: Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, ICPADS ’14, pp. 9–16 (2014)
Jin, J., Luo, J., Khemmarat, S., Dong, F., Gao, L.: Supplementary file of gstar: An efficient framework for answering top-k star queries on billion-node knowledge graphs. http://cse.seu.edu.cn/PersonalPage/jhjin/upload/supplementary-file-wwwj.pdf (2017)
Jin, J., Luo, J., Khemmarat, S., Gao, L.: Querying Web-scale knowledge graphs through effective pruning of search space. IEEE Trans Parallel Distrib Syst 28 (8), 2342–2356 (2017)
Khan, A., Li, N., Yan, X., Guan, Z., Chakraborty, S., Tao, S.: Neighborhood based fast graph search in large networks. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD ’11, pp 901–912. ACM, New York (2011)
Khan, A., Wu, Y., Aggarwal, C.C., Yan, X.: Nema: Fast graph search with label similarity. PVLDB 6(3), 181–192 (2013)
Khemmarat, S., Gao, L.: Fast top-k path-based relevance query on massive graphs. In: Proceedings of the 30th IEEE International Conference on Data Engineering, ICDE ’14, pp. 316–327 (2014)
Lee, J., Han, W.-S., Kasperovics, R., Lee, J.-H.: An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB 6(2), 133–144 (2012)
Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed graphlab: A framework for machine learning in the cloud. PVLDB 5(8), 716–727 (2012)
Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: A system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD ’10, pp 135–146. ACM, New York (2010)
Neumann, T., Weikum, G.: Rdf-3x: A risc-style engine for rdf. PVLDB 1(1), 647–659 (2008)
Neumann, T., Bender, M., Michel, S., Schenkel, R., Triantafillou, P., Weikum, G.: Distributed top-k aggregation queries at large. Distrib Parallel Datab 26(1), 3–27 (2009)
Power, R., Li, J.: Piccolo: Building fast, distributed programs with partitioned tables. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI’10, pp 1–14. USENIX Association, Berkeley (2010)
Qiu, T., Qiao, R., Han, M., Sangaiah, A.K., Lee, I.: A lifetime-enhanced data collecting scheme for internet of things. IEEE Commun. Mag. 55(11), 132–137 (2017)
Qiu, T., Zhao, A., Xia, F., Si, W., Wu, D.: ROSE: Robustness strategy for scale-free wireless sensor networks. IEEE/ACM Trans. Network. 25(5), 2944–2959 (2017)
Shang, H., Zhang, Y., Lin, X., Yu, J.X.: Taming verification hardness: An efficient algorithm for testing subgraph isomorphism. Proc. VLDB Endow. 1(1), 364–375 (2008)
Stanton, I., Kliot, G.: Streaming graph partitioning for large distributed graphs. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, pp 1222–1230. ACM, New York (2012)
Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: A core of semantic knowledge. In: Proceedings of the 16th International Conference on World Wide Web, WWW ’07, pp 697–706. ACM, New York (2007)
Sun, Z., Wang, H., Wang, H., Shao, B., Li, J.: Efficient subgraph matching on billion node graphs. PVLDB 5(9), 788–799 (2012)
Tian, Y., Patel, J.M.: Tale: A tool for approximate large graph matching. In: Proceedings of the 24th IEEE International Conference on Data Engineering, ICDE ’08, pp. 963–972 (2008)
Tong, H., Faloutsos, C., Gallagher, B., Eliassi-Rad, T.: Fast best-effort pattern matching in large attributed graphs. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’07, pp 737–746. ACM, New York (2007)
Vrandečić, D., Krötzsch, M.: Wikidata: A free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)
Yan, D., Cheng, J., Yang, F., Lu, Y., Lui, J.C.S., Zhang, Q., Ng, W.: A general-purpose query-centric framework for querying big graphs. PVLDB 9(7), 564–575 (2016)
Zeng, K., Yang, J., Wang, H., Shao, B., Wang, Z.: A distributed graph engine for Web scale rdf data. PVLDB, 265–276 (2013)
Zhang, Y., Gao, Q., Gao, L., Wang, C.: Maiter: An asynchronous graph processing framework for delta-based accumulative iterative computation. IEEE Trans Parallel Distrib Syst 25(8), 2091–2100 (2014)
Zou, L., Chen, L., Özsu, M.T.: Distance-join: Pattern match query in a large graph database. PVLDB 2(1), 886–897 (2009)
Zou, L., Mo, J., Chen, L., Özsu, M.T., Zhao, D.: gStore: Answering SPARQL queries via subgraph matching. PVLDB 4(8), 482–493 (2011)


Acknowledgements

This work is supported by the National Key R&D Program of China under Grant No. 2017YFB1003000; the National Natural Science Foundation of China under Grants No. 61702096, No. 61632008, No. 61320106007, No. 61572129, No. 61602112, No. 61502097, No. 61370207 and No. 61702097; the International S&T Cooperation Program of China under Grant No. 2015DFA10490; the Natural Science Foundation of Jiangsu Province under Grants BK20170689 and BK20160695; the Jiangsu Provincial Key Laboratory of Network and Information Security under Grant No. BM2003201; the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China under Grant No. 93K-9; and the Fundamental Research Funds for the Central Universities. It is partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization and the Collaborative Innovation Center of Wireless Communications Technology, and by U.S. NSF grants CNS-1217284 and CCF-1018114. Jiahui Jin was a visiting student at UMass Amherst, supported by the China Scholarship Council, when this work was performed. Any opinions, findings, conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the sponsors. A preliminary version [14] of this paper appeared in the Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS ’14).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Southeast University, Nanjing, China
Jiahui Jin, Junzhou Luo & Fang Dong
Department of Electrical and Computer Engineering, University of Massachusetts Amherst, Amherst, MA, USA
Samamon Khemmarat & Lixin Gao

Corresponding author

Correspondence to Jiahui Jin.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jin, J., Luo, J., Khemmarat, S. et al. GStar: an efficient framework for answering top-k star queries on billion-node knowledge graphs. World Wide Web 22, 1611–1638 (2019). https://doi.org/10.1007/s11280-018-0611-0

Received: 30 November 2017
Revised: 11 April 2018
Accepted: 04 June 2018
Published: 19 July 2018
Issue Date: 15 July 2019
DOI: https://doi.org/10.1007/s11280-018-0611-0
