Abstract
By moving data into a centralized, scalable storage location inside an organization (the data lake), companies and other institutions aim to discover new information and to generate value from the data. The data lake can help to overcome organizational boundaries and system complexity. However, generating value from the data requires additional techniques, tools, and processes that address data integration and the other challenges this approach raises. Although there is broad agreement on the central idea, there is no accepted definition of what components or functionality a data lake has or what its architecture looks like. In this article, we start with the central idea and then discuss various related aspects and technologies.
Acknowledgements
I would like to thank Christian Sengstock and Martin Hartig for feedback and discussions while writing this article.
About this article
Cite this article
Mathis, C. Data Lakes. Datenbank Spektrum 17, 289–293 (2017). https://doi.org/10.1007/s13222-017-0272-7
DOI: https://doi.org/10.1007/s13222-017-0272-7