StarMR: An Efficient Star-Decomposition Based Query Processor for SPARQL Basic Graph Patterns Using MapReduce

Xu, Qiang; Wang, Xin; Li, Jianxin; Gan, Ying; Chai, Lele; Wang, Junhu

doi:10.1007/978-3-319-96890-2_34


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10987)

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data


Abstract

With the proliferation of knowledge graphs, a large number of RDF graphs have been released, which raises the challenge of answering SPARQL queries in a distributed setting. In this paper, we propose an efficient distributed method, called StarMR, to answer SPARQL basic graph pattern (BGP) queries on big RDF graphs using MapReduce. In our method, query graphs are decomposed into a set of stars, using the semantic and structural information embedded in RDF graphs as heuristics. Two optimization techniques are proposed to further improve the efficiency of our algorithms: one filters out invalid input data, and the other postpones the Cartesian product operations. Extensive experiments on both synthetic and real-world datasets show that our method outperforms the state-of-the-art method S2X by an order of magnitude.
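To make the idea of star decomposition concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a BGP is given as a list of (subject, predicate, object) triple patterns, treats a star as a root vertex together with the patterns that touch it in either direction, and uses a simple greedy rule (pick the vertex covering the most remaining patterns) in place of the semantic and structural heuristics mentioned in the abstract. The function name `star_decompose` and the sample query are hypothetical.

```python
from collections import defaultdict


def star_decompose(triple_patterns):
    """Greedily split a BGP into stars.

    Each star is a pair (root, patterns), where every pattern has the root
    as its subject or object. Assumption: edge direction is ignored when
    deciding star membership.
    """
    remaining = list(triple_patterns)
    stars = []
    while remaining:
        # Group the still-uncovered patterns by the vertices they touch.
        incident = defaultdict(list)
        for s, p, o in remaining:
            incident[s].append((s, p, o))
            if o != s:
                incident[o].append((s, p, o))
        # Greedy choice: the vertex covering the most patterns;
        # ties are broken by vertex name so the result is deterministic.
        root = min(incident, key=lambda v: (-len(incident[v]), v))
        star = incident[root]
        stars.append((root, star))
        # Drop the covered patterns and repeat on what is left.
        remaining = [t for t in remaining if t not in star]
    return stars


# Hypothetical four-pattern query forming a chain ?x - ?y - ?z - ?c.
bgp = [
    ("?x", "rdf:type", "ex:Person"),
    ("?x", "ex:knows", "?y"),
    ("?y", "ex:worksAt", "?z"),
    ("?z", "ex:locatedIn", "?c"),
]

for root, patterns in star_decompose(bgp):
    print(root, patterns)
```

Each triple pattern ends up in exactly one star, so the stars can be matched independently and their partial results joined on shared variables afterwards, which is the step a MapReduce job would parallelize.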


References

Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Dyer, M., Greenhill, C.: The complexity of counting graph homomorphisms. Random Struct. Algorithms 17(3–4), 260–289 (2000)
Erling, O., Mikhailov, I.: Virtuoso: RDF support in a native RDBMS. In: de Virgilio, R., Giunchiglia, F., Tanca, L. (eds.) Semantic Web Information Management, pp. 501–519. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-04329-1_21
Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: graph processing in a distributed dataflow framework. In: OSDI, vol. 14, pp. 599–613 (2014)
Gurajada, S., Seufert, S., Miliaraki, I., Theobald, M.: TriAD: a distributed shared-nothing RDF engine based on asynchronous message passing. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 289–300. ACM (2014)
Hammoud, M., Rabbou, D.A., Nouri, R., Beheshti, S.M.R., Sakr, S.: DREAM: distributed RDF engine with adaptive query planner and minimal communication. Proc. VLDB Endow. 8(6), 654–665 (2015)
Husain, M., McGlothlin, J., Masud, M.M., Khan, L., Thuraisingham, B.M.: Heuristics-based query processing for large RDF graphs using cloud computing. IEEE Trans. Knowl. Data Eng. 23(9), 1312–1327 (2011)
Lai, L., Qin, L., Lin, X., Chang, L.: Scalable subgraph enumeration in MapReduce. Proc. VLDB Endow. 8(10), 974–985 (2015)
Peng, P., Zou, L., Özsu, M.T., Chen, L., Zhao, D.: Processing SPARQL queries over distributed RDF graphs. VLDB J. 25(2), 243–268 (2016)
Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. In: Cruz, I., et al. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 30–43. Springer, Heidelberg (2006). https://doi.org/10.1007/11926078_3
Rohloff, K., Schantz, R.E.: High-performance, massively scalable distributed systems using the MapReduce software framework: the SHARD triple-store. In: Programming Support Innovations for Emerging Distributed Applications, p. 4. ACM (2010)
Schätzle, A., Przyjaciel-Zablocki, M., Berberich, T., Lausen, G.: S2X: graph-parallel querying of RDF with GraphX. In: Wang, F., Luo, G., Weng, C., Khan, A., Mitra, P., Yu, C. (eds.) Big-O(Q)/DMAH-2015. LNCS, vol. 9579, pp. 155–168. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41576-5_12
Schätzle, A., Przyjaciel-Zablocki, M., Skilevic, S., Lausen, G.: S2RDF: RDF querying with SPARQL on Spark. Proc. VLDB Endow. 9(10), 804–815 (2016)
Sun, Z., Wang, H., Wang, H., Shao, B., Li, J.: Efficient subgraph matching on billion node graphs. Proc. VLDB Endow. 5(9), 788–799 (2012)
Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. HotCloud 10(10–10), 95 (2010)
Zeng, K., Yang, J., Wang, H., Shao, B., Wang, Z.: A distributed graph engine for web scale RDF data. Proc. VLDB Endow. 6, 265–276 (2013). VLDB Endowment
Zou, L., Özsu, M.T., Chen, L., Shen, X., Huang, R., Zhao, D.: gStore: a graph-based SPARQL query engine. VLDB J. 23(4), 565–590 (2014)

Acknowledgments

This work is supported by the National Natural Science Foundation of China (61572353), the National High-tech R&D Program of China (863 Program) (2013AA013204), and the Natural Science Foundation of Tianjin (17JCYBJC15400).

Author information

Authors and Affiliations

School of Computer Science and Technology, Tianjin University, Tianjin, China
Qiang Xu, Xin Wang, Ying Gan & Lele Chai
Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, China
Xin Wang
The Department of Computer Science and Software Engineering, The University of Western Australia, Perth, Australia
Jianxin Li
School of Information and Communication Technology, Griffith University, Brisbane, Australia
Junhu Wang

Corresponding author

Correspondence to Xin Wang.

Editor information

Editors and Affiliations

South China University of Technology, Guangzhou, China
Yi Cai
Nagoya University, Nagoya, Japan
Yoshiharu Ishikawa
Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
Jianliang Xu

About this paper

Cite this paper

Xu, Q., Wang, X., Li, J., Gan, Y., Chai, L., Wang, J. (2018). StarMR: An Efficient Star-Decomposition Based Query Processor for SPARQL Basic Graph Patterns Using MapReduce. In: Cai, Y., Ishikawa, Y., Xu, J. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 10987. Springer, Cham. https://doi.org/10.1007/978-3-319-96890-2_34

DOI: https://doi.org/10.1007/978-3-319-96890-2_34
Published: 19 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96889-6
Online ISBN: 978-3-319-96890-2
eBook Packages: Computer Science (R0)
