Abstract
Data warehouses are an established approach for analyzing data. With the advent of big data, however, the approach reaches its limits due to a lack of agility and flexibility and due to system complexity. To overcome these limits, the idea of data lakes has been proposed. The data lake is not a replacement for the data warehouse; rather, both solutions have their own application areas. It is therefore necessary to integrate both approaches into a common architecture. This paper describes and compares both approaches and shows different ways of integrating data lakes into data warehouse architectures.
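The following Python sketch illustrates one common way a data lake can feed a data warehouse. The lake keeps raw records in whatever shape they arrive (schema-on-read). An ETL step then applies a fixed warehouse schema (schema-on-write) before loading. The names `RAW_ZONE`, `OrderFact` and `load_orders` are hypothetical, and the sample records are invented for illustration. This is an assumed example, not one of the architectural patterns presented in the paper.

```python
import json
from dataclasses import dataclass

# Hypothetical raw zone of a data lake: records are kept as-is, in whatever
# shape the source produced (schema-on-read). Sample data is invented.
RAW_ZONE = [
    '{"order_id": 1, "customer": "A", "amount": "19.90"}',
    '{"order_id": 2, "customer": "B", "amount": 5, "channel": "web"}',
    '{"event": "click", "page": "/home"}',  # not an order; stays only in the lake
]


@dataclass(frozen=True)
class OrderFact:
    """Hypothetical fixed warehouse schema (schema-on-write)."""
    order_id: int
    customer: str
    amount: float


def load_orders(raw_records):
    """Hypothetical ETL step: parse raw lake records, coerce them to the
    warehouse schema, and separate rows that fit from rows that do not."""
    facts, rejected = [], []
    for line in raw_records:
        record = json.loads(line)
        try:
            facts.append(OrderFact(
                order_id=int(record["order_id"]),
                customer=str(record["customer"]),
                amount=float(record["amount"]),
            ))
        except (KeyError, TypeError, ValueError):
            # Records without the required fields cannot be loaded into the
            # warehouse, but they remain available in the lake for other uses.
            rejected.append(record)
    return facts, rejected


facts, rejected = load_orders(RAW_ZONE)
print(len(facts), len(rejected))
```

In this sketch the warehouse keeps only the fields its schema defines, so the extra `channel` attribute is dropped on load. The click event does not fit the schema at all. Both pieces of information remain in the lake, which reflects the flexibility argument made above for combining the two approaches rather than replacing one with the other.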
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Herden, O. (2020). Architectural Patterns for Integrating Data Lakes into Data Warehouse Architectures. In: Bellatreche, L., Goyal, V., Fujita, H., Mondal, A., Reddy, P.K. (eds) Big Data Analytics. BDA 2020. Lecture Notes in Computer Science, vol 12581. Springer, Cham. https://doi.org/10.1007/978-3-030-66665-1_2
DOI: https://doi.org/10.1007/978-3-030-66665-1_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66664-4
Online ISBN: 978-3-030-66665-1
eBook Packages: Computer Science, Computer Science (R0)