A Real-time Materialized View Approach for Analytic Flows in Hybrid Cloud Environments

  • Focus article
Datenbank-Spektrum

Abstract

Next-generation business intelligence (BI) enables enterprises to react quickly to changing business environments. Increasingly, data integration pipelines need to be merged with query pipelines for real-time analytics on operational data. Hybrid analytic flows, which consist of a set of extract-transform-load (ETL) jobs together with analytic jobs running over multiple platforms with different functionality, are emerging and becoming increasingly attractive.

In traditional databases, materialized views are used to optimize query performance. In cross-platform, large-scale data transformation environments, using materialized views raises similar challenges (e.g., view selection). In this work, we propose an approach that generates materialized views in hybrid flows and maintains these views in a query-driven, incremental manner. To accelerate data integration processes, the location of a materialization point in a transformation flow is varied dynamically based on metrics such as source update rates and maintenance cost in terms of flow operations. In addition, choosing the most suitable platform to host the views (for example, materializing and maintaining intermediate results of Hadoop jobs in relational databases) is shown to yield better performance.
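The abstract leaves the cost model and the maintenance algorithm to the full paper, so the following is only a minimal sketch under our own assumptions, not the authors' method. It assumes a linear flow whose operators each have a hypothetical per-delta maintenance cost and a per-query execution cost, and places the materialization point where update-time plus query-time cost is lowest. It pairs this with a query-driven SUM ... GROUP BY view that buffers source deltas and applies them incrementally only when the view is queried. All names (`FlowOp`, `maint_cost`, `query_cost`, `choose_materialization_point`, `QueryDrivenSumView`) and all cost figures are illustrative inventions; platform selection is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Tuple


# Part 1: hypothetical cost model for placing a materialization point.
@dataclass
class FlowOp:
    name: str
    maint_cost: float  # assumed cost of pushing one source delta through this op
    query_cost: float  # assumed cost of running this op once at query time


def choose_materialization_point(
    ops: List[FlowOp], update_rate: float, query_rate: float
) -> Tuple[int, float]:
    """Return (k, cost): materialize the output of the first k ops (k = 0: none).

    Ops up to the point are maintained once per source update; ops after it
    are re-executed once per query. Rates are events per unit of time.
    """
    best_k, best_cost = 0, float("inf")
    for k in range(len(ops) + 1):
        maintain = update_rate * sum(op.maint_cost for op in ops[:k])
        query = query_rate * sum(op.query_cost for op in ops[k:])
        if maintain + query < best_cost:  # strict '<' keeps the earliest point on ties
            best_k, best_cost = k, maintain + query
    return best_k, best_cost


# Part 2: query-driven (deferred) incremental maintenance of SUM(value) GROUP BY key.
class QueryDrivenSumView:
    def __init__(self) -> None:
        self.totals: DefaultDict[str, float] = defaultdict(float)
        # Buffered source deltas: (group key, value, +1 for insert / -1 for delete).
        self.pending: List[Tuple[str, float, int]] = []

    def on_source_change(self, key: str, value: float, sign: int) -> None:
        # Cheap at update time: only record the delta, do not touch the view.
        self.pending.append((key, value, sign))

    def query(self, key: str) -> float:
        # Before answering, fold all buffered deltas into the view incrementally
        # instead of recomputing the aggregate from the full source.
        for k, value, sign in self.pending:
            self.totals[k] += sign * value
        self.pending.clear()
        return self.totals.get(key, 0.0)


if __name__ == "__main__":
    flow = [FlowOp("extract", 1, 1), FlowOp("cleanse", 2, 4),
            FlowOp("join", 5, 20), FlowOp("aggregate", 1, 8)]
    # Frequent updates, rare queries: not materializing anything is cheapest.
    print(choose_materialization_point(flow, update_rate=10, query_rate=1))  # (0, 33)
    # Rare updates, frequent queries: materializing the final output is cheapest.
    print(choose_materialization_point(flow, update_rate=1, query_rate=10))  # (4, 9)

    view = QueryDrivenSumView()
    view.on_source_change("eu", 5.0, +1)
    view.on_source_change("eu", 3.0, +1)
    view.on_source_change("eu", 5.0, -1)
    print(view.query("eu"))  # 3.0
```

With these made-up costs, a high source update rate makes it cheapest not to materialize at all, while a high query rate moves the point to the end of the flow; re-running the choice as observed rates change is one simple way to let the point vary dynamically. Deferring delta application to query time keeps updates cheap at the price of extra work on the first query after a burst of changes.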

Author information

Correspondence to Weiping Qu.

About this article

Cite this article

Qu, W., Dessloch, S. A Real-time Materialized View Approach for Analytic Flows in Hybrid Cloud Environments. Datenbank Spektrum 14, 97–106 (2014). https://doi.org/10.1007/s13222-014-0155-0

  • DOI: https://doi.org/10.1007/s13222-014-0155-0
