Abstract
The arrival of Big Data has contributed positively to the evolution of data warehouse (DW) technology. It has given birth to augmented DWs, which aim to maximize the effectiveness of existing ones. Various augmentation scenarios have been proposed and adopted by firms and industry, covering aspects such as new data sources (e.g., Linked Open Data (LOD), social, stream, and IoT data), data ingestion, advanced deployment infrastructures, programming paradigms, and data visualization. These scenarios allow companies to reach valuable decisions. By examining traditional DWs, we realized that they do not fulfill all decision-maker requirements, since the data sources feeding a target DW are not rich enough to capture Big Data. The arrival of the LOD era is an excellent opportunity to enrich traditional DWs with a new V dimension: Value. In this paper, we first conceptualize the variety of internal and external sources and study its effect on the ETL phase to ease value capture. Second, a value-driven approach to DW design is discussed. Third, three realistic scenarios for integrating LOD into the DW landscape are given. Finally, experiments are conducted that show the value added by augmenting the existing DW environment with LOD.
Notes
e.g., the DBpedia SPARQL endpoint: https://dbpedia.org/sparql
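To make the note concrete, the following is a minimal sketch of how an external LOD source might be reached through such an endpoint and flattened into rows that an ETL step could consume. It is not the authors' implementation. The query, the function names, and the sample response are hypothetical and chosen only for illustration; the sketch relies solely on the standard `query` parameter of the SPARQL Protocol and on the standard SPARQL JSON results layout, and it sends no network request.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the note above.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Hypothetical query, for illustration only: cities and their populations.
EXAMPLE_QUERY = """
SELECT ?city ?population WHERE {
  ?city a <http://dbpedia.org/ontology/City> ;
        <http://dbpedia.org/ontology/populationTotal> ?population .
} LIMIT 5
"""


def build_query_url(endpoint: str, sparql: str) -> str:
    """Encode a SPARQL query as an HTTP GET URL using the protocol's 'query' parameter."""
    return endpoint + "?" + urlencode({"query": sparql})


def rows_from_results(payload: str) -> list:
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable may be unbound in a given solution, so skip missing keys.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Hand-written sample in the SPARQL JSON results layout (not real DBpedia data).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["city", "population"]},
    "results": {"bindings": [
        {
            "city": {"type": "uri", "value": "http://dbpedia.org/resource/Example_City"},
            "population": {"type": "literal", "value": "12345"},
        }
    ]},
})

if __name__ == "__main__":
    print(build_query_url(DBPEDIA_ENDPOINT, EXAMPLE_QUERY))
    print(rows_from_results(SAMPLE_RESPONSE))
```

In practice the URL would be fetched with an `Accept: application/sparql-results+json` header, and the resulting rows would still need to be matched against the DW schema before loading.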
Cite this article
Berkani, N., Bellatreche, L., Khouri, S. et al. The contribution of linked open data to augment a traditional data warehouse. J Intell Inf Syst 55, 397–421 (2020). https://doi.org/10.1007/s10844-020-00594-w
DOI: https://doi.org/10.1007/s10844-020-00594-w