Abstract
Data Lake is a term denoting a repository that stores heterogeneous data, both structured and unstructured, in a flexible organization. This organization allows Data Lake users to dynamically reorganize and integrate the information they need, according to the query or analysis at hand. The success of a Data Lake implementation depends on many factors, notably the distributed storage, the kind of media deployed, the data access protocols and the network used. However, design flaws may become evident only in a later phase of system development, causing significant delays in complex projects. This article presents an application of the queueing network modeling technique to detect significant issues, such as bottlenecks and performance degradation, under different workload scenarios.
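The abstract does not describe the model itself. The following is therefore only a minimal sketch of how a queueing network can expose a bottleneck. It uses exact Mean Value Analysis (MVA), a standard solution method for closed, single-class, product-form networks, and it should not be read as the authors' model. The station names (`network`, `metadata_db`, `hdd_tier`, `ssd_tier`), their service demands and the think time are hypothetical values chosen for illustration. Under these assumptions, the bottleneck is the station with the largest service demand D_max, and system throughput cannot exceed 1/D_max however many users are added. Varying the population N and the think time Z is one simple way to represent "different workload scenarios".

```python
# Minimal sketch: exact Mean Value Analysis (MVA) for a closed,
# single-class queueing network of load-independent queueing stations.
# Station names and service demands below are HYPOTHETICAL.

def mva(demands, n_users, think_time=0.0):
    """Return per-population (N, throughput, response time) and utilizations at N = n_users.

    demands: dict station -> service demand D_k in seconds per request.
    think_time: delay Z each user spends outside the stations between requests.
    """
    queue = {k: 0.0 for k in demands}  # Q_k(0) = 0: empty network
    history = []
    x = 0.0
    for n in range(1, n_users + 1):
        # Arrival theorem: an arriving request sees the queue lengths
        # of the same network with n - 1 requests.
        resid = {k: d * (1.0 + queue[k]) for k, d in demands.items()}
        r_total = sum(resid.values())
        x = n / (think_time + r_total)  # Little's law over the whole cycle
        queue = {k: x * r for k, r in resid.items()}  # Little's law per station
        history.append((n, x, r_total))
    util = {k: x * d for k, d in demands.items()}  # utilization law U_k = X * D_k
    return history, util


if __name__ == "__main__":
    demands = {  # hypothetical service demands per request (seconds)
        "network": 0.010,
        "metadata_db": 0.004,
        "hdd_tier": 0.025,
        "ssd_tier": 0.005,
    }
    bottleneck = max(demands, key=demands.get)
    print(f"bottleneck: {bottleneck}, X_max = {1 / demands[bottleneck]:.1f} req/s")
    history, util = mva(demands, n_users=50, think_time=1.0)
    for n, x, r in history[::10]:
        print(f"N={n:3d}  X={x:6.2f} req/s  R={r:.3f} s")
    print({k: round(u, 3) for k, u in util.items()})
```

As N grows, throughput flattens toward 1/D_max while response time rises roughly linearly. Utilization of the bottleneck station approaches 1 and the other stations stay well below it. This is the kind of degradation pattern a model can reveal before the system is built.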
Notes
- 6. See, for example, https://www.healthcatalyst.com/four-essential-zones-healthcare-data-lake.
- 9. The term "filter" denotes a type of data processing that produces a result.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Barbierato, E., Gribaudo, M., Serazzi, G., Tanca, L. (2021). Performance Evaluation of a Data Lake Architecture via Modeling Techniques. In: Ballarini, P., Castel, H., Dimitriou, I., Iacono, M., Phung-Duc, T., Walraevens, J. (eds) Performance Engineering and Stochastic Modeling. EPEW ASMTA 2021. Lecture Notes in Computer Science, vol 13104. Springer, Cham. https://doi.org/10.1007/978-3-030-91825-5_7
DOI: https://doi.org/10.1007/978-3-030-91825-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91824-8
Online ISBN: 978-3-030-91825-5
eBook Packages: Computer Science, Computer Science (R0)