Data Warehousing in Cloud Environments

Thomsen, Christian; Pedersen, Torben Bach

doi:10.1007/978-1-4614-8265-9_80623

Reference work entry
First Online: 01 January 2018

Synonyms

Cloud Data Warehousing; Cloud Warehousing; Data Warehousing as a Service

Definition

Data warehousing was born in business information systems environments dominated by relational databases running on traditional servers. Later, the types of source data and source systems widened, and deployment environments increasingly included high-end massively parallel processing (MPP) systems. Today, data warehousing has joined the cloud computing wave: data warehouse (DW) systems run on private, public, and hybrid clouds, mainly on clusters of commodity machines. Cloud-based data warehouses employ components for cloud-based data storage, querying, and processing, often using file-based storage of complex, non-relational types of data. A widely used platform is Hadoop, the open-source implementation of Google's MapReduce platform for scalable dataflow processing on commodity clusters, and one of the earliest systems used for cloud data warehousing. While Hadoop is scalable, fault tolerant, and versatile, it is not...
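To make the MapReduce dataflow model mentioned above more concrete, the following is a minimal single-process sketch in Python. It is not Hadoop's API; it only imitates the map, shuffle, and reduce phases in memory. It assumes a hypothetical sales fact table with (sale_date, product, amount) rows and computes the equivalent of the typical warehouse query `SELECT product, SUM(amount) FROM sales GROUP BY product`. The names `SALES`, `map_sale`, `reduce_sum`, and `run_mapreduce` are illustrative only.

```python
from collections import defaultdict

# Hypothetical fact rows: (sale_date, product, amount). Illustration only.
SALES = [
    ("2018-01-01", "laptop", 1200.0),
    ("2018-01-01", "phone", 600.0),
    ("2018-01-02", "laptop", 1100.0),
    ("2018-01-02", "tablet", 400.0),
    ("2018-01-03", "phone", 650.0),
]


def map_sale(row):
    """Map: emit a (group-by key, measure) pair for one fact row."""
    _sale_date, product, amount = row
    yield product, amount


def reduce_sum(key, values):
    """Reduce: aggregate all measures that share the same key."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Simulate map -> shuffle -> reduce in a single process (hypothetical helper)."""
    intermediate = defaultdict(list)
    # Map phase: on a real cluster, each node would map its own input split.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: group intermediate values by key.
            intermediate[key].append(value)
    # Reduce phase: one reducer call per distinct key, in sorted key order.
    return dict(reducer(key, values) for key, values in sorted(intermediate.items()))


if __name__ == "__main__":
    print(run_mapreduce(SALES, map_sale, reduce_sum))
    # prints {'laptop': 2300.0, 'phone': 1250.0, 'tablet': 400.0}
```

On an actual cluster, the map calls run in parallel on input splits stored across machines and the framework performs the shuffle over the network. Grouping by key is what lets each reducer aggregate its keys independently, which is the property that makes such aggregations scale out on commodity hardware.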

Author information

Authors and Affiliations

Department of Computer Science, Aalborg University, Aalborg, Denmark
Christian Thomsen & Torben Bach Pedersen

Corresponding author

Correspondence to Christian Thomsen.

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, GA, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, ON, Canada
M. Tamer Özsu

About this entry

Cite this entry

Thomsen, C., Pedersen, T.B. (2018). Data Warehousing in Cloud Environments. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80623

DOI: https://doi.org/10.1007/978-1-4614-8265-9_80623
Published: 07 December 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering