DOI: 10.1145/2811222.2811229
research-article

Towards a Programmable Semantic Extract-Transform-Load Framework for Semantic Data Warehouses

Published: 22 October 2015

Abstract

To support better decision-making in business analytics, organizations increasingly use external data (structured, semi-structured, and unstructured) in addition to their mostly structured internal data. Current Extract-Transform-Load (ETL) tools are not suitable for this "open world scenario" because they do not consider semantic issues in the integration process. Nor do they support processing semantic-aware data or creating a Semantic Data Warehouse (DW) as a semantic repository of semantically integrated data. This paper describes SETL, a Python-based programmable Semantic ETL framework. SETL builds on Semantic Web (SW) standards and tools and supports developers by offering a number of powerful modules, classes, and methods for (dimensional and semantic) DW constructs and tasks. It thus supports semantic-aware data sources, semantic integration, and the creation of a semantic DW composed of an ontology and its instances. A comprehensive experimental evaluation on a concrete use case, comparing SETL to a solution built with traditional tools (which requires much more hand-coding), shows that SETL provides better performance, knowledge base quality, and programmer productivity.
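To make the idea of a semantic ETL flow concrete, the following is a minimal sketch of extracting tabular rows, transforming them into RDF triples, and serialising them as N-Triples. It is not SETL's API: the `BASE` namespace, the function names, the `Company` class, and the sample data are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace, used only for this illustration.
BASE = "http://example.org/dw/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape a string so it can be written as an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def extract(csv_text):
    """Extract: read CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows, key, rdf_class):
    """Transform: map each row to N-Triples lines.

    The key column becomes the subject IRI, every other non-empty
    column becomes a property with a plain string literal as object.
    """
    triples = []
    for row in rows:
        subject = f"<{BASE}{rdf_class.lower()}/{quote(row[key], safe='')}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}{rdf_class}> .")
        for column, value in row.items():
            if column == key or value == "":
                continue  # skip the key itself and missing values
            predicate = f"<{BASE}{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


def load(triples):
    """Load: serialise the triples as one N-Triples document."""
    return "\n".join(triples) + "\n"


# Made-up sample input; the second row has no city value.
SAMPLE = "id,name,city\n1,Acme Farm,Aalborg\n2,Beta Dairy,\n"

if __name__ == "__main__":
    print(load(transform(extract(SAMPLE), key="id", rdf_class="Company")), end="")
```

A real semantic DW would also need an ontology describing classes and properties, plus a triple store to hold the instances; this sketch covers only the row-to-triple mapping step.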



      Published In

      DOLAP '15: Proceedings of the ACM Eighteenth International Workshop on Data Warehousing and OLAP
      October 2015
      108 pages
      ISBN: 9781450337854
      DOI: 10.1145/2811222
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. knowledge base
      2. rdf
      3. semantic data warehouse
      4. semantic etl framework
      5. semantic integration

      Qualifiers

      • Research-article

      Funding Sources

      • European Commission

      Conference

      CIKM'15

      Acceptance Rates

      DOLAP '15 Paper Acceptance Rate 8 of 31 submissions, 26%;
      Overall Acceptance Rate 29 of 79 submissions, 37%

      Article Metrics

      • Downloads (last 12 months): 26
      • Downloads (last 6 weeks): 4
      Reflects downloads up to 10 Feb 2025

      Cited By

      • (2024) $SETL_{onDEMAND}$: Towards an on Demand ETL Approach for Semantic Data Warehouses. 2024 6th International Conference on Electrical Engineering and Information & Communication Technology (ICEEICT), pp. 1315-1320. DOI: 10.1109/ICEEICT62016.2024.10534564. Online publication date: 2-May-2024
      • (2024) Knowledge Graph Generation and Enabling Multidimensional Analytics on Bangladesh Agricultural Data. IEEE Access 12, pp. 87512-87531. DOI: 10.1109/ACCESS.2024.3416388. Online publication date: 2024
      • (2023) Covid-19 Knowledge Graph Generation and Enabling Analysis across Healthcare, Socioeconomic, and Environmental Dimensions. 2023 26th International Conference on Computer and Information Technology (ICCIT), pp. 1-6. DOI: 10.1109/ICCIT60459.2023.10441327. Online publication date: 13-Dec-2023
      • (2023) Augmented Data Warehouses for Value Capture. New Technologies, Artificial Intelligence and Smart Data, pp. 168-182. DOI: 10.1007/978-3-031-47366-1_13. Online publication date: 21-Nov-2023
      • (2022) High-level ETL for semantic data warehouses. Semantic Web 13:1, pp. 85-132. DOI: 10.3233/SW-210429. Online publication date: 1-Jan-2022
      • (2022) Multidimensional enrichment of spatial RDF data for SOLAP. Semantic Web 13:1, pp. 5-39. DOI: 10.3233/SW-210423. Online publication date: 1-Jan-2022
      • (2021) A Systematic Literature Review on Big Data Extraction, Transformation and Loading (ETL). Intelligent Computing, pp. 308-324. DOI: 10.1007/978-3-030-80126-7_24. Online publication date: 7-Jul-2021
      • (2020) Towards Synaptic Behavior of Nanoscale ReRAM Devices for Neuromorphic Computing Applications. ACM Journal on Emerging Technologies in Computing Systems 16:2, pp. 1-18. DOI: 10.1145/3381859. Online publication date: 29-Apr-2020
      • (2020) QCOR. ACM Journal on Emerging Technologies in Computing Systems 16:2, pp. 1-17. DOI: 10.1145/3380964. Online publication date: 18-Mar-2020
      • (2020) SETLBI: An Integrated Platform for Semantic Business Intelligence. Companion Proceedings of the Web Conference 2020, pp. 167-171. DOI: 10.1145/3366424.3383533. Online publication date: 20-Apr-2020
