A Metadata Framework for Data Lagoons

  • Conference paper
New Trends in Databases and Information Systems (ADBIS 2019)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1064))

Abstract

In this work, we present a Metadata Framework aimed at extending intelligence mechanisms from the Cloud to the Edge. To this end, we build on our previously introduced notion of Data Lagoons, the analogue of Data Lakes at the network edge, and we introduce a novel architecture and Metadata model for efficient interaction between Data Lagoons and Data Lakes. We identify the service and data planes of our architecture and illustrate the application of our framework on a use case from the TPCx-IoT benchmark. To our knowledge, our approach is the first to examine the integration of Data Lakes with Edge components while taking into consideration the data and infrastructure resources of Edge Nodes.

Author information

Corresponding author

Correspondence to Vasileios Theodorou.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Theodorou, V., Hai, R., Quix, C. (2019). A Metadata Framework for Data Lagoons. In: Welzer, T., et al. New Trends in Databases and Information Systems. ADBIS 2019. Communications in Computer and Information Science, vol 1064. Springer, Cham. https://doi.org/10.1007/978-3-030-30278-8_44

  • DOI: https://doi.org/10.1007/978-3-030-30278-8_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30277-1

  • Online ISBN: 978-3-030-30278-8

  • eBook Packages: Computer Science, Computer Science (R0)
