The contribution of linked open data to augment a traditional data warehouse

Journal of Intelligent Information Systems

Abstract

The arrival of Big Data has contributed positively to the evolution of data warehouse (DW) technology. It has given birth to augmented DWs, which aim to maximize the effectiveness of existing ones. Various augmentation scenarios have been proposed and adopted by industry, covering aspects such as new data sources (e.g., Linked Open Data (LOD), social, stream, and IoT data), data ingestion, advanced deployment infrastructures, programming paradigms, and data visualization. These scenarios help companies reach valuable decisions. By examining traditional DWs, we realized that they do not fulfill all decision-maker requirements, since the data sources feeding a target DW are not rich enough to capture Big Data. The arrival of the LOD era is an excellent opportunity to enrich traditional DWs with a new V dimension: Value. In this paper, we first conceptualize the variety of internal and external sources and study its effect on the ETL phase to ease value capture. Second, we discuss a Value-driven approach to DW design. Third, we present three realistic scenarios for integrating LOD into the DW landscape. Finally, we report experiments showing the value added by augmenting an existing DW environment with LOD.
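
To make the idea of augmenting a DW with LOD concrete, the following is a minimal sketch and not the authors' method. It assumes a hypothetical internal dimension (`CityRow`) and a small hand-written set of RDF-like triples standing in for an external LOD source. The prefixes `ex:` and `rdfs:`, the predicate names, and the population figures are all illustrative assumptions. The sketch enriches each dimension row with an attribute found only in the external source, matching on the label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CityRow:
    """Hypothetical row of an internal DW dimension (illustrative only)."""
    city_id: int
    name: str


# Hypothetical LOD fragment as (subject, predicate, object) triples.
# The values are made up for the example and are not taken from any dataset.
LOD_TRIPLES = [
    ("ex:Paris", "rdfs:label", "Paris"),
    ("ex:Paris", "ex:population", 2100000),
    ("ex:Lyon", "rdfs:label", "Lyon"),
    ("ex:Lyon", "ex:population", 520000),
]


def index_by_label(triples):
    """Group triples per subject, then key each property map by its label."""
    by_subject = {}
    for subject, predicate, obj in triples:
        by_subject.setdefault(subject, {})[predicate] = obj
    return {
        props["rdfs:label"]: props
        for props in by_subject.values()
        if "rdfs:label" in props
    }


def augment(rows, triples, predicate, default=None):
    """Return dimension rows extended with one attribute from the LOD triples.

    Rows with no matching label in the external source get `default`.
    """
    lod = index_by_label(triples)
    return [
        {
            "city_id": row.city_id,
            "name": row.name,
            predicate: lod.get(row.name, {}).get(predicate, default),
        }
        for row in rows
    ]


if __name__ == "__main__":
    dimension = [CityRow(1, "Paris"), CityRow(2, "Nice")]
    for enriched in augment(dimension, LOD_TRIPLES, "ex:population"):
        print(enriched)
```

Matching on a plain label is the simplest possible linkage. A real integration would need entity resolution and would likely query a SPARQL endpoint or load a dump rather than use an in-memory list.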

Corresponding author

Correspondence to Nabila Berkani.

About this article

Cite this article

Berkani, N., Bellatreche, L., Khouri, S. et al. The contribution of linked open data to augment a traditional data warehouse. J Intell Inf Syst 55, 397–421 (2020). https://doi.org/10.1007/s10844-020-00594-w

  • DOI: https://doi.org/10.1007/s10844-020-00594-w
