A data distribution model for RDF

Schroeder, Rebeca; Penteado, Raqueline R. M.; Hara, Carmem S.

doi:10.1007/s10619-020-07296-w

Published: 16 May 2020

Volume 39, pages 129–167, (2021)

Distributed and Parallel Databases

Rebeca Schroeder ORCID: orcid.org/0000-0001-8882-3375¹,
Raqueline R. M. Penteado² &
Carmem S. Hara³

Abstract

The ever-increasing amount of available RDF data requires that data be partitioned across multiple servers. Research on scaling RDF query processing through suitable data distribution methods has made some progress. In general, these methods work well for queries matching simple triple patterns, but they are inefficient for queries involving more complex patterns. In this paper, we present an RDF data distribution method that overcomes the shortcomings of current approaches in order to scale RDF storage with respect to both data volume and query processing. We apply a method that identifies frequent patterns accessed by queries in order to keep related data in the same partition. We apply our reasoning to a summarized view of the data in order to avoid exhaustive analysis of large datasets. As a result, partitioning templates are obtained from data items in an RDF structure. In addition, we provide an approach for dynamic data insertions, even when new data do not conform to the original RDF structure. Apart from repartitioning approaches, we use an overflow repository to store data that may not follow the original schema. Our study shows that our method scales well and is effective in improving overall performance by decreasing the amount of message passing among servers, compared to alternative data distribution approaches for RDF.
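The general idea in the abstract (co-locating data that queries access together, and routing non-conforming data to an overflow repository) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes, purely for illustration, that the summarized view is a set of predicates, that each workload query is represented by the set of predicates it touches, and that predicates co-accessed at least `min_support` times form one template. The names `build_templates`, `place`, `min_support`, and `OVERFLOW` are hypothetical.

```python
from collections import Counter
from itertools import combinations

OVERFLOW = "overflow"  # hypothetical name for the overflow repository


def build_templates(workload, min_support):
    """Group predicates that queries access together at least min_support times.

    workload: iterable of sets of predicates, one set per query (an assumed,
    simplified stand-in for the query patterns analysed in the paper).
    Returns a dict mapping each predicate to a template (partition) number.
    """
    pair_counts = Counter()
    predicates = set()
    for query_predicates in workload:
        predicates.update(query_predicates)
        for a, b in combinations(sorted(set(query_predicates)), 2):
            pair_counts[(a, b)] += 1

    # Union-find over predicates: frequent pairs end up in the same template.
    parent = {p: p for p in predicates}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for (a, b), count in pair_counts.items():
        if count >= min_support:
            parent[find(a)] = find(b)

    groups = {}
    for p in predicates:
        groups.setdefault(find(p), set()).add(p)
    # Sort the groups so that template numbering is deterministic.
    ordered = sorted(groups.values(), key=lambda g: sorted(g))
    return {p: i for i, group in enumerate(ordered) for p in group}


def place(triple, templates):
    """Return the partition for a triple, or OVERFLOW if its predicate is unknown."""
    _subject, predicate, _obj = triple
    return templates.get(predicate, OVERFLOW)


# Toy workload: two query shapes occur twice, one cross-shape query occurs once.
workload = [
    {"name", "email"},
    {"name", "email"},
    {"title", "author"},
    {"title", "author"},
    {"name", "title"},
]
templates = build_templates(workload, min_support=2)
```

With this toy workload, `name` and `email` share one template and `title` and `author` share another, while a newly inserted triple such as `("s1", "phone", "555")` is sent to the overflow repository by `place` because its predicate was not part of the analysed structure. A real system would reason over richer graph patterns than predicate pairs and would also bound partition sizes.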

Author information

Authors and Affiliations

Universidade do Estado de Santa Catarina, Joinville, SC, Brazil
Rebeca Schroeder
Universidade Estadual de Maringá, Maringá, PR, Brazil
Raqueline R. M. Penteado
Universidade Federal do Paraná, Curitiba, PR, Brazil
Carmem S. Hara

Corresponding author

Correspondence to Rebeca Schroeder.

About this article

Cite this article

Schroeder, R., Penteado, R.R.M. & Hara, C.S. A data distribution model for RDF. Distrib Parallel Databases 39, 129–167 (2021). https://doi.org/10.1007/s10619-020-07296-w

Issue Date: March 2021
