Abstract
Much attention has been devoted to supporting the volume and velocity dimensions of Big Data. As a result, a plethora of technology components supporting various data structures (e.g., key-value, graph, relational), modalities (e.g., stream, log, real-time) and computing paradigms (e.g., in-memory, cluster/cloud) are now available. However, systematic support for managing the variety of data, the third dimension in the classical Big Data definition, is still missing. In this article, we present SeBiDA, an approach for managing hybrid Big Data. SeBiDA supports the Semantification of Big Data using the RDF data model, i.e., non-semantic Big Data is semantically enriched by using RDF vocabularies. We empirically evaluate the performance of SeBiDA for two dimensions of Big Data, i.e., volume and variety; the Berlin Benchmark is used in the study. The results suggest that even in large datasets, query processing time is not affected by data variety.
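To make the idea of semantification concrete, the following minimal sketch shows how a flat, non-semantic record could be enriched with RDF vocabulary terms and serialized as N-Triples. It is an illustration only and not SeBiDA's implementation: the `example.org` namespace, the field-to-property mapping, and the helper names `semantify` and `escape_literal` are all assumptions made for this example.

```python
# Illustrative sketch only: the mapping, the example.org namespace and the
# helper names are hypothetical and are not taken from SeBiDA.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # hypothetical namespace for subject IRIs

# Hypothetical mapping from non-semantic field names to vocabulary terms.
FIELD_TO_PROPERTY = {
    "name": FOAF + "name",
    "age": FOAF + "age",
}


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def semantify(record, id_field, class_iri, mapping):
    """Turn one flat record into a list of N-Triples lines.

    Assumes the identifier value is already safe to embed in an IRI.
    """
    subject = f"<{EX}{record[id_field]}>"
    triples = [f"{subject} <{RDF_TYPE}> <{class_iri}> ."]
    for field, value in record.items():
        if field == id_field or field not in mapping:
            continue  # unmapped fields are simply skipped in this sketch
        triples.append(
            f'{subject} <{mapping[field]}> "{escape_literal(value)}" .'
        )
    return triples


record = {"id": "person1", "name": "Alice", "age": 30}
for line in semantify(record, "id", FOAF + "Person", FIELD_TO_PROPERTY):
    print(line)
```

All values are emitted as plain string literals here; a fuller treatment would attach datatypes and decide what to do with fields that have no vocabulary mapping.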
Notes
- 6.
We disregard blank nodes, which can be avoided or replaced by IRIs [4].
- 7.
A set of properties \(P_C\) of an RDF class \(C\), where \(\forall p \in P_C\) (p rdfs:domain C).
- 8.
mb and foaf are prefixes for mobility and friend of friend vocabularies, respectively.
- 9.
When the object is occasionally not typed or is a URL.
- 17.
Using the command line: ./generate -fc -pc [scaling factor] -s [file format] -fn [file name], where the file format is nt for RDF data and xml for XML data. More details at: http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/BenchmarkRules/#datagenerator+.
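The property set \(P_C\) defined in the notes can be read as a filter over schema triples: a property p belongs to \(P_C\) exactly when a triple (p rdfs:domain C) exists. The sketch below illustrates that reading with triples held as plain Python tuples. The `ex:` names and the sample schema are hypothetical, and the function name `properties_of` is an assumption for this example.

```python
# Sketch under stated assumptions: the schema triples and ex: names are
# hypothetical; only the rdfs:domain condition comes from the definition.
RDFS_DOMAIN = "rdfs:domain"

schema_triples = [
    ("ex:name", RDFS_DOMAIN, "ex:Person"),
    ("ex:knows", RDFS_DOMAIN, "ex:Person"),
    ("ex:speed", RDFS_DOMAIN, "ex:Vehicle"),
    ("ex:Person", "rdf:type", "rdfs:Class"),  # ignored: not a domain triple
]


def properties_of(cls, triples):
    """Return P_C: every property p with a triple (p rdfs:domain C)."""
    return {s for s, p, o in triples if p == RDFS_DOMAIN and o == cls}


print(sorted(properties_of("ex:Person", schema_triples)))
```

This sketch only considers explicitly stated rdfs:domain triples; it performs no inference over subclasses or subproperties.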
References
Du, J.H., Wang, H.F., Ni, Y., Yu, Y.: HadoopRDF: a scalable semantic data analytical engine. ICIC 2012. LNCS, vol. 7390, pp. 633–641. Springer, Heidelberg (2012)
Franke, C., Morin, S., Chebotko, A., Abraham, J., Brazier, P.: Distributed semantic web data management in HBase and MySQL cluster. In: Cloud Computing (CLOUD), pp. 105–112. IEEE (2011)
Laney, D.: 3-D data management: controlling data volume, velocity and variety. META Group research note, 6 February 2001
Hogan, A.: Skolemising blank nodes while preserving isomorphism. In: 24th International Conference on World Wide Web, WWW (2015)
Kaoudi, Z., Manolescu, I.: RDF in the clouds: a survey. VLDB J. 24(1), 67–91 (2015)
Khadilkar, V., Kantarcioglu, M., Thuraisingham, B., Castagna, P.: Jena-HBase: a distributed, scalable and efficient RDF triple store. In: 11th International Semantic Web Conference Posters & Demos, ISWC-PD (2012)
Martínez-Prieto, M.A., Cuesta, C.E., Arias, M., Fernández, J.D.: The solid architecture for real-time management of big semantic data. Future Gener. Comput. Syst. 47, 62–79 (2015)
Nie, Z., Du, F., Chen, Y., Du, X., Xu, L.: Efficient SPARQL query processing in mapreduce through data partitioning and indexing. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds.) APWeb 2012. LNCS, vol. 7235, pp. 628–635. Springer, Heidelberg (2012)
Papailiou, N., Konstantinou, I., Tsoumakos, D., Karras, P., Koziris, N.: H2RDF+: High-performance distributed joins over large-scale RDF graphs. In: BigData Conference. IEEE (2013)
Punnoose, R., Crainiceanu, A., Rapp, D.: Rya: a scalable RDF triple store for the clouds. In: Proceedings of the 1st International Workshop on Cloud Intelligence, pp. 4. ACM (2012)
Schätzle, A., Przyjaciel-Zablocki, M., Dorner, C., Hornung, T., Lausen, G.: Cascading map-side joins over HBase for scalable join processing. In: SSWS+ HPCSW, pp. 59 (2012)
Schätzle, A., Przyjaciel-Zablocki, M., Hornung, T., Lausen, G.: PigSPARQL: a SPARQL query processing baseline for big data. In: International Semantic Web Conference (Posters & Demos), pp. 241–244 (2013)
Schätzle, A., Przyjaciel-Zablocki, M., Neu, A., Lausen, G.: Sempala: interactive SPARQL query processing on hadoop. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Knoblock, C., Vrandečić, D., Groth, P., Noy, N., Janowicz, K., Goble, C. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 164–179. Springer, Heidelberg (2014)
Sun, J., Jin, Q.: Scalable RDF store based on HBase and mapreduce. In: 3rd International Conference on Advanced Computer Theory and Engineering. IEEE (2010)
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Mami, M.N., Scerri, S., Auer, S., Vidal, M.-E. (2016). Towards Semantification of Big Data Technology. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2016. Lecture Notes in Computer Science, vol 9829. Springer, Cham. https://doi.org/10.1007/978-3-319-43946-4_25
DOI: https://doi.org/10.1007/978-3-319-43946-4_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43945-7
Online ISBN: 978-3-319-43946-4
eBook Packages: Computer Science, Computer Science (R0)