Abstract
In this paper, we address the problem of evolution of heterogeneous data sources of a data warehouse. Evolution may be caused by changes in the structure of data sources, which are often independent of the data warehouse, as well as by changes in information requirements. The solution we introduce is based on the architecture of a data analysis system that, apart from a data highway that collects and transforms data, also employs a metadata repository and various tools that provide different kinds of analysis of the stored data. The unique feature of our solution is an adaptation component that incorporates mechanisms for automatic discovery of changes in the structure of integrated data sets and for propagation of these changes to the data warehouse and other components of the data analysis system. In addition to presenting our approach, we give details of the evaluation of our software prototype in a case study system.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Solodovnikova, D., Niedrite, L., Svilpe, L. (2022). An Approach to Evolution Management in Integrated Heterogeneous Data Sources. In: Filipe, J., Śmiałek, M., Brodsky, A., Hammoudi, S. (eds) Enterprise Information Systems. ICEIS 2021. Lecture Notes in Business Information Processing, vol 455. Springer, Cham. https://doi.org/10.1007/978-3-031-08965-7_3
DOI: https://doi.org/10.1007/978-3-031-08965-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08964-0
Online ISBN: 978-3-031-08965-7
eBook Packages: Computer Science, Computer Science (R0)