Abstract
In today’s digital environment, businesses must access, store, and analyze in real time vast amounts of data produced by streaming graph-structured data sources. To meet these requirements, companies that rely on data warehouse (\(\mathcal {DW}\)) technology have to combine hardware and software solutions to reduce the latency between a \(\mathcal {DW}\) and its data sources. The proliferation of advanced hardware deployment platforms such as polystores represents an opportunity, as pointed out in recent studies. However, deploying a graph-structured \(\mathcal {DW}\) over a polystore is not a simple task, since it requires two important phases: data partitioning and data allocation. We claim that these phases have to be connected to the ETL (Extract, Transform, Load) phase, especially its loading process. This connection calls into question the conventional scheduling of the ETL and deployment processes. In this paper, we present a new approach that connects the ETL and deployment processes and challenges their traditional scheduling to meet real-time analysis requirements.
Notes
- 1. In this paper, we use fragmentation and partitioning interchangeably.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Berkani, N., Bellatreche, L. (2018). Streaming ETL in Polystore Era. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11336. Springer, Cham. https://doi.org/10.1007/978-3-030-05057-3_42
DOI: https://doi.org/10.1007/978-3-030-05057-3_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05056-6
Online ISBN: 978-3-030-05057-3
eBook Packages: Computer Science (R0)