
Towards Semantification of Big Data Technology

  • Conference paper
Big Data Analytics and Knowledge Discovery (DaWaK 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9829)

Abstract

Much attention has been devoted to supporting the volume and velocity dimensions of Big Data. As a result, a plethora of technology components supporting various data structures (e.g., key-value, graph, relational), modalities (e.g., stream, log, real-time) and computing paradigms (e.g., in-memory, cluster/cloud) are now available. However, systematic support for managing the variety of data, the third dimension in the classical Big Data definition, is still missing. In this article, we present SeBiDA, an approach for managing hybrid Big Data. SeBiDA supports the semantification of Big Data using the RDF data model: non-semantic Big Data is semantically enriched using RDF vocabularies. We empirically evaluate the performance of SeBiDA along two dimensions of Big Data, volume and variety, using the Berlin SPARQL Benchmark (BSBM). The results suggest that query processing time is not affected by data variety, even for large datasets.
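The abstract describes semantification only at a high level. The following is a minimal sketch, not SeBiDA's actual mapping mechanism: it assumes a tabular record such as a row of a GTFS `stops.txt` file and a hand-written, hypothetical mapping from its columns to vocabulary terms, and emits the record as N-Triples. The `mb:` namespace IRI, the base IRI and `COLUMN_MAP` are placeholders; the FOAF and W3C WGS84 geo terms are used purely for illustration.

```python
"""Minimal sketch of semantifying a tabular record into RDF (N-Triples).

ASSUMPTION: the column-to-property mapping, the base IRI and the mb:
namespace IRI below are hypothetical placeholders, not SeBiDA's own.
ASSUMPTION: key values are already IRI-safe (no spaces, no '<' or '>').
"""
from typing import Dict, List, Optional, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
FOAF = "http://xmlns.com/foaf/0.1/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
MB = "http://example.org/mb#"  # hypothetical IRI for the mobility vocabulary

# Hypothetical mapping: column name -> (property IRI, XSD datatype or None).
COLUMN_MAP: Dict[str, Tuple[str, Optional[str]]] = {
    "stop_name": (FOAF + "name", None),
    "stop_lat": (GEO + "lat", "decimal"),
    "stop_lon": (GEO + "long", "decimal"),
}


def literal(value: str, datatype: Optional[str]) -> str:
    """Render an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    if datatype is None:
        return f'"{escaped}"'
    return f'"{escaped}"^^<{XSD}{datatype}>'


def semantify(row: Dict[str, str], key: str, cls: str, base: str) -> List[str]:
    """Turn one record into N-Triples lines: a type triple plus one per mapped column."""
    subject = f"<{base}{row[key]}>"
    triples = [f"{subject} <{RDF_TYPE}> <{cls}> ."]
    for column, (prop, datatype) in COLUMN_MAP.items():
        value = row.get(column)
        if value:  # skip absent or empty cells instead of emitting empty literals
            triples.append(f"{subject} <{prop}> {literal(value, datatype)} .")
    return triples


if __name__ == "__main__":
    row = {"stop_id": "S1", "stop_name": "Main St",
           "stop_lat": "50.73", "stop_lon": "7.10"}
    for line in semantify(row, key="stop_id", cls=MB + "Stop",
                          base="http://example.org/stop/"):
        print(line)
```

Run as a script, this prints four triples for the sample stop: one `rdf:type` triple and one per mapped column. Declarative mapping languages such as R2RML (note 3) express the same kind of column-to-property mapping without hand-written code.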

Notes

  1. http://newvantage.com/wp-content/uploads/2014/12/Big-Data-Survey-2014-Summary-Report-110314.pdf, https://www.capgemini-consulting.com/resource-file-access/resource/pdf/cracking_the_data_conundrum-big_data_pov_13-1-15_v2.pdf.

  2. https://developers.google.com/transit/gtfs/.

  3. http://www.w3.org/TR/r2rml/.

  4. http://www.w3.org/2013/csvw/wiki/Main_Page.

  5. http://www.w3.org/TR/json-ld/.

  6. We disregard blank nodes, which can be avoided or replaced by IRIs [4].

  7. A set of properties \(P_C\) of an RDF class \(C\) such that \(\forall p \in P_C\): (p rdfs:domain C).

  8. mb and foaf are the prefixes of the mobility and Friend of a Friend (FOAF) vocabularies, respectively.

  9. When the object is occasionally not typed or is a URL.

  10. https://spark.apache.org/.

  11. https://www.mongodb.org.

  12. http://lov.okfn.org/dataset/lov/terms.

  13. http://developers.google.com/transit/gtfs/reference.

  14. http://www.hpl.hp.com/techreports/2006/HPL-2006-140.html.

  15. https://parquet.apache.org.

  16. https://github.com/EIS-Bonn/SeBiDA.

  17. Using the command line: ./generate -fc -pc [scaling factor] -s [file format] -fn [file name], where the file format is nt for RDF data and xml for XML data. More details at: http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/BenchmarkRules/#datagenerator+.

  18. https://github.com/gh-rdf3x/gh-rdf3x.
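Note 7 defines the property set \(P_C\) of a class \(C\) as the properties whose `rdfs:domain` is \(C\). The following is a minimal sketch of that definition alone. It assumes schema triples are given as plain (subject, predicate, object) tuples of prefixed names. The `mb:` terms in the example are hypothetical, since note 8 only names the prefix.

```python
"""Sketch of note 7: P_C = { p | (p rdfs:domain C) } for every class C.

ASSUMPTION: triples are (subject, predicate, object) tuples of prefixed
names; the mb: terms in the example are hypothetical.
"""
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def class_property_sets(schema: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Group properties by the class named in their rdfs:domain triples."""
    sets: Dict[str, Set[str]] = defaultdict(set)
    for subj, pred, obj in schema:
        if pred == "rdfs:domain":
            # (p rdfs:domain C) puts property p into P_C
            sets[obj].add(subj)
    return dict(sets)


if __name__ == "__main__":
    schema = [
        ("mb:stopName", "rdfs:domain", "mb:Stop"),
        ("mb:lat", "rdfs:domain", "mb:Stop"),
        ("mb:routeName", "rdfs:domain", "mb:Route"),
        ("mb:Stop", "rdf:type", "rdfs:Class"),  # not a domain triple: ignored
    ]
    for cls, props in sorted(class_property_sets(schema).items()):
        print(cls, sorted(props))
```

A property with several `rdfs:domain` triples lands in several \(P_C\) sets, which follows the definition in note 7 literally.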

References

  1. Du, J.H., Wang, H.F., Ni, Y., Yu, Y.: HadoopRDF: a scalable semantic data analytical engine. In: ICIC 2012. LNCS, vol. 7390, pp. 633–641. Springer, Heidelberg (2012)

  2. Franke, C., Morin, S., Chebotko, A., Abraham, J., Brazier, P.: Distributed semantic web data management in HBase and MySQL cluster. In: Cloud Computing (CLOUD), pp. 105–112. IEEE (2011)

  3. Laney, D.: 3-D data management: controlling data volume, velocity and variety. META Group, 6 February 2001

  4. Hogan, A.: Skolemising blank nodes while preserving isomorphism. In: 24th International Conference on World Wide Web, WWW 2015 (2015)

  5. Kaoudi, Z., Manolescu, I.: RDF in the clouds: a survey. VLDB J. 24(1), 67–91 (2015)

  6. Khadilkar, V., Kantarcioglu, M., Thuraisingham, B., Castagna, P.: Jena-HBase: a distributed, scalable and efficient RDF triple store. In: 11th International Semantic Web Conference Posters & Demos, ISWC-PD (2012)

  7. Martínez-Prieto, M.A., Cuesta, C.E., Arias, M., Fernández, J.D.: The SOLID architecture for real-time management of big semantic data. Future Gener. Comput. Syst. 47, 62–79 (2015)

  8. Nie, Z., Du, F., Chen, Y., Du, X., Xu, L.: Efficient SPARQL query processing in MapReduce through data partitioning and indexing. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds.) APWeb 2012. LNCS, vol. 7235, pp. 628–635. Springer, Heidelberg (2012)

  9. Papailiou, N., Konstantinou, I., Tsoumakos, D., Karras, P., Koziris, N.: H2RDF+: high-performance distributed joins over large-scale RDF graphs. In: BigData Conference. IEEE (2013)

  10. Punnoose, R., Crainiceanu, A., Rapp, D.: Rya: a scalable RDF triple store for the clouds. In: Proceedings of the 1st International Workshop on Cloud Intelligence, p. 4. ACM (2012)

  11. Schätzle, A., Przyjaciel-Zablocki, M., Dorner, C., Hornung, T., Lausen, G.: Cascading map-side joins over HBase for scalable join processing. In: SSWS+HPCSW, p. 59 (2012)

  12. Schätzle, A., Przyjaciel-Zablocki, M., Hornung, T., Lausen, G.: PigSPARQL: a SPARQL query processing baseline for big data. In: International Semantic Web Conference (Posters & Demos), pp. 241–244 (2013)

  13. Schätzle, A., Przyjaciel-Zablocki, M., Neu, A., Lausen, G.: Sempala: interactive SPARQL query processing on Hadoop. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Knoblock, C., Vrandečić, D., Groth, P., Noy, N., Janowicz, K., Goble, C. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 164–179. Springer, Heidelberg (2014)

  14. Sun, J., Jin, Q.: Scalable RDF store based on HBase and MapReduce. In: 3rd International Conference on Advanced Computer Theory and Engineering. IEEE (2010)

Author information

Correspondence to Mohamed Nadjib Mami.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Mami, M.N., Scerri, S., Auer, S., Vidal, ME. (2016). Towards Semantification of Big Data Technology. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2016. Lecture Notes in Computer Science, vol 9829. Springer, Cham. https://doi.org/10.1007/978-3-319-43946-4_25

  • DOI: https://doi.org/10.1007/978-3-319-43946-4_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43945-7

  • Online ISBN: 978-3-319-43946-4

  • eBook Packages: Computer Science (R0)
