Performance Evaluation of a Data Lake Architecture via Modeling Techniques

  • Conference paper
Performance Engineering and Stochastic Modeling (EPEW 2021, ASMTA 2021)

Abstract

A Data Lake is a repository that stores heterogeneous data, both structured and unstructured, in a flexible organization that lets users dynamically reorganize and integrate the information they need for a given query or analysis. The success of an implementation depends on many factors, notably the distributed storage, the kind of media deployed, the data access protocols and the network used. However, design flaws may become evident only in a later phase of system development, causing significant delays in complex projects. This article applies queueing network modeling techniques to detect significant issues, such as bottlenecks and performance degradation, under different workload scenarios.
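To make the idea of bottleneck detection with a queueing network concrete, the following is a minimal sketch and not the model used in the paper. It assumes a closed, single-class network solved with exact Mean Value Analysis; the station names and service demands in `DEMANDS` are hypothetical values chosen only for illustration.

```python
# Minimal sketch: exact Mean Value Analysis (MVA) for a closed,
# single-class queueing network made of queueing stations.
# Station names and service demands are hypothetical, not from the paper.

DEMANDS = {          # service demand per request, in seconds (assumed)
    "network": 0.02,
    "metadata": 0.01,
    "storage": 0.05,
}


def mva(demands, n_jobs, think_time=0.0):
    """Return (throughput, residence times, utilizations) for n_jobs customers."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    queue = {k: 0.0 for k in demands}  # mean queue lengths with 0 customers
    for n in range(1, n_jobs + 1):
        # Arrival theorem: an arriving job sees the queue of the (n-1)-job network.
        resid = {k: d * (1.0 + queue[k]) for k, d in demands.items()}
        throughput = n / (think_time + sum(resid.values()))
        # Little's law applied to each station.
        queue = {k: throughput * r for k, r in resid.items()}
    # Utilization law: U_k = X * D_k.
    util = {k: throughput * d for k, d in demands.items()}
    return throughput, resid, util


def bottleneck(util):
    """The station with the highest utilization limits system throughput."""
    return max(util, key=util.get)


if __name__ == "__main__":
    for n in (1, 10, 50):
        x, _, u = mva(DEMANDS, n)
        print(n, round(x, 3), bottleneck(u))
```

As the number of concurrent jobs grows, throughput approaches the reciprocal of the largest service demand, so the station with the largest demand (here the hypothetical `storage` station) saturates first. Re-running the sketch with different demands or populations is one simple way to compare workload scenarios.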

Notes

  1. https://infocus.delltechnologies.com/william_schmarzo/why-do-i-need-a-data-lake-for-big-data/.

  2. https://jamesdixon.wordpress.com/.

  3. https://searchcio.techtarget.com/feature/Data-lake-governance-A-big-data-do-or-die.

  4. https://venturebeat.com/2014/06/25/the-next-big-disruption-in-big-data/.

  5. https://www.gartner.com/en/documents/3980938-data-hubs-data-lakes-and-data-warehouses-how-they-are-di.

  6. See, for example, https://www.healthcatalyst.com/four-essential-zones-healthcare-data-lake.

  7. https://cloud.google.com/architecture/build-a-data-lake-on-gcp.

  8. https://docs.aws.amazon.com/AmazonS3/latest/userguide/managing-storage.html.

  9. The term filter denotes a type of data processing, producing a result.

Author information

Corresponding author

Correspondence to Enrico Barbierato.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Barbierato, E., Gribaudo, M., Serazzi, G., Tanca, L. (2021). Performance Evaluation of a Data Lake Architecture via Modeling Techniques. In: Ballarini, P., Castel, H., Dimitriou, I., Iacono, M., Phung-Duc, T., Walraevens, J. (eds) Performance Engineering and Stochastic Modeling. EPEW 2021, ASMTA 2021. Lecture Notes in Computer Science, vol 13104. Springer, Cham. https://doi.org/10.1007/978-3-030-91825-5_7

  • DOI: https://doi.org/10.1007/978-3-030-91825-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91824-8

  • Online ISBN: 978-3-030-91825-5

  • eBook Packages: Computer Science, Computer Science (R0)
