Towards Semantification of Big Data Technology

Mami, Mohamed Nadjib; Scerri, Simon; Auer, Sören; Vidal, Maria-Esther

doi:10.1007/978-3-319-43946-4_25

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9829)

Included in the following conference series:

International Conference on Big Data Analytics and Knowledge Discovery

Abstract

Much attention has been devoted to supporting the volume and velocity dimensions of Big Data. As a result, a plethora of technology components is now available, supporting various data structures (e.g., key-value, graph, relational), modalities (e.g., stream, log, real-time) and computing paradigms (e.g., in-memory, cluster/cloud). However, systematic support for managing the variety of data, the third dimension in the classical Big Data definition, is still missing. In this article, we present SeBiDA, an approach for managing hybrid Big Data. SeBiDA supports the Semantification of Big Data using the RDF data model, i.e., non-semantic Big Data is semantically enriched using RDF vocabularies. We empirically evaluate the performance of SeBiDA along two dimensions of Big Data, volume and variety, using the Berlin Benchmark. The results suggest that, even for large datasets, query processing time is not affected by data variety.
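To make the idea of semantic enrichment concrete, the following is a minimal sketch of how a non-semantic record might be turned into RDF-style triples by mapping its attributes to vocabulary terms. It is not SeBiDA's implementation: the function `semantify`, the `MB` namespace, the class `mb:Passenger` and the sample record are all hypothetical and chosen only for illustration. The FOAF namespace and the `rdf:type` IRI are the standard ones.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
MB = "http://example.org/mobility#"  # hypothetical namespace, for illustration only


def semantify(record: Dict[str, object], subject_iri: str,
              class_iri: str, mapping: Dict[str, str]) -> List[Triple]:
    """Turn a flat record into triples using an attribute-to-property mapping."""
    # The entity is first typed with the chosen RDF class.
    triples: List[Triple] = [(subject_iri, RDF_TYPE, class_iri)]
    for attribute, value in record.items():
        prop = mapping.get(attribute)
        if prop is None:
            continue  # attributes without a vocabulary term are skipped in this sketch
        triples.append((subject_iri, prop, str(value)))
    return triples


# Hypothetical input: one row of a non-semantic (e.g. CSV or JSON) source.
record = {"name": "Alice", "age": 30, "internal_id": 7}
mapping = {"name": FOAF + "name", "age": FOAF + "age"}
triples = semantify(record, MB + "passenger/1", MB + "Passenger", mapping)

for triple in triples:
    print(triple)
```

In this sketch unmapped attributes are simply dropped; an actual system would have to decide whether to keep them under a generated property instead.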

Notes

1. http://newvantage.com/wp-content/uploads/2014/12/Big-Data-Survey-2014-Summary-Report-110314.pdf, https://www.capgemini-consulting.com/resource-file-access/resource/pdf/cracking_the_data_conundrum-big_data_pov_13-1-15_v2.pdf.
2. https://developers.google.com/transit/gtfs/.
3. http://www.w3.org/TR/r2rml/.
4. http://www.w3.org/2013/csvw/wiki/Main_Page.
5. http://www.w3.org/TR/json-ld/.
6. We disregard blank nodes, which can be avoided or replaced by IRIs [4].
7. The set of properties \(P_C\) of an RDF class \(C\), where \(\forall p \in P_C\): (\(p\) rdfs:domain \(C\)).
8. mb and foaf are prefixes for the mobility and friend of a friend vocabularies, respectively.
9. When the object is occasionally not typed or is a URL.
10. https://spark.apache.org/.
11. https://www.mongodb.org.
12. http://lov.okfn.org/dataset/lov/terms.
13. http://developers.google.com/transit/gtfs/reference.
14. http://www.hpl.hp.com/techreports/2006/HPL-2006-140.html.
15. https://parquet.apache.org.
16. https://github.com/EIS-Bonn/SeBiDA.
17. Using a command line: ./generate -fc -pc [scaling factor] -s [file format] -fn [file name], where file format is nt for RDF data and xml for XML data. More details in: http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/BenchmarkRules/#datagenerator+.
18. https://github.com/gh-rdf3x/gh-rdf3x.
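
The definition of \(P_C\) in note 7 can be read as a simple filter over schema triples: a property belongs to \(P_C\) when it is declared with rdfs:domain \(C\). The sketch below illustrates this under stated assumptions. The helper `properties_of_class`, the `MB` namespace and the sample schema are hypothetical, and triples are represented as plain tuples rather than with any particular RDF library.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
MB = "http://example.org/mobility#"  # hypothetical namespace, for illustration only


def properties_of_class(triples: Iterable[Triple], class_iri: str) -> Set[str]:
    """Return P_C: every property p with a (p rdfs:domain C) triple."""
    return {s for (s, p, o) in triples if p == RDFS_DOMAIN and o == class_iri}


# Hypothetical schema triples.
schema = [
    (MB + "stopName", RDFS_DOMAIN, MB + "Stop"),
    (MB + "latitude", RDFS_DOMAIN, MB + "Stop"),
    (MB + "routeName", RDFS_DOMAIN, MB + "Route"),
]
stop_properties = properties_of_class(schema, MB + "Stop")
print(sorted(stop_properties))
```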

References

1. Du, J.H., Wang, H.F., Ni, Y., Yu, Y.: HadoopRDF: a scalable semantic data analytical engine. ICIC 2012. LNCS, vol. 7390, pp. 633–641. Springer, Heidelberg (2012)
2. Franke, C., Morin, S., Chebotko, A., Abraham, J., Brazier, P.: Distributed semantic web data management in HBase and MySQL cluster. In: Cloud Computing (CLOUD), pp. 105–112. IEEE (2011)
3. Gartner, D.L.: 3-D data management: controlling data volume, velocity and variety. 6 February 2001
4. Hogan, A.: Skolemising blank nodes while preserving isomorphism. In: 24th International Conference on World Wide Web, 2015. WWW (2015)
5. Kaoudi, Z., Manolescu, I.: RDF in the clouds: a survey. VLDB J. 24(1), 67–91 (2015)
6. Khadilkar, V., Kantarcioglu, M., Thuraisingham, B., Castagna, P.: Jena-HBase: a distributed, scalable and efficient RDF triple store. In: 11th International Semantic Web Conference Posters & Demos, ISWC-PD (2012)
7. Martínez-Prieto, M.A., Cuesta, C.E., Arias, M., Fernández, J.D.: The solid architecture for real-time management of big semantic data. Future Gener. Comput. Syst. 47, 62–79 (2015)
8. Nie, Z., Du, F., Chen, Y., Du, X., Xu, L.: Efficient SPARQL query processing in mapreduce through data partitioning and indexing. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds.) APWeb 2012. LNCS, vol. 7235, pp. 628–635. Springer, Heidelberg (2012)
9. Papailiou, N., Konstantinou, I., Tsoumakos, D., Karras, P., Koziris, N.: H2RDF+: High-performance distributed joins over large-scale RDF graphs. In: BigData Conference. IEEE (2013)
10. Punnoose, R., Crainiceanu, A., Rapp, D.: Rya: a scalable RDF triple store for the clouds. In: Proceedings of the 1st International Workshop on Cloud Intelligence, pp. 4. ACM (2012)
11. Schätzle, A., Przyjaciel-Zablocki, M., Dorner, C., Hornung, T., Lausen, G.: Cascading map-side joins over HBase for scalable join processing. In: SSWS+ HPCSW, pp. 59 (2012)
12. Schätzle, A., Przyjaciel-Zablocki, M., Hornung, T., Lausen, G.: PigSPARQL: a SPARQL query processing baseline for big data. In: International Semantic Web Conference (Posters & Demos), pp. 241–244 (2013)
13. Schätzle, A., Przyjaciel-Zablocki, M., Neu, A., Lausen, G.: Sempala: interactive SPARQL query processing on hadoop. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Knoblock, C., Vrandečić, D., Groth, P., Noy, N., Janowicz, K., Goble, C. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 164–179. Springer, Heidelberg (2014)
14. Sun, J., Jin, Q.: Scalable RDF store based on HBase and mapreduce. In: 3rd International Conference on Advanced Computer Theory and Engineering. IEEE (2010)

Author information

Authors and Affiliations

University of Bonn, Bonn, Germany
Mohamed Nadjib Mami, Simon Scerri, Sören Auer & Maria-Esther Vidal
Fraunhofer IAIS, Sankt Augustin, Germany
Mohamed Nadjib Mami, Simon Scerri, Sören Auer & Maria-Esther Vidal

Corresponding author

Correspondence to Mohamed Nadjib Mami.

Editor information

Editors and Affiliations

University of Science and Technology, Rolla, Missouri, USA
Sanjay Madria
Osaka University, Osaka, Japan
Takahiro Hara

About this paper

Cite this paper

Mami, M.N., Scerri, S., Auer, S., Vidal, ME. (2016). Towards Semantification of Big Data Technology. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2016. Lecture Notes in Computer Science, vol 9829. Springer, Cham. https://doi.org/10.1007/978-3-319-43946-4_25

DOI: https://doi.org/10.1007/978-3-319-43946-4_25
Published: 06 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43945-7
Online ISBN: 978-3-319-43946-4
eBook Packages: Computer Science, Computer Science (R0)
