Abstract
Big data brings many novel challenges and opportunities that require rethinking many aspects of the traditional data warehouse architecture. Big data are collections of data sets so large and complex that they are difficult to process using classical data warehousing. The data come from many different sources, such as social media, and are stored in different formats. Because they are primarily unstructured, they need a high-performance information technology infrastructure that provides superior computational efficiency and storage capacity. This infrastructure should be flexible and scalable so that it can be managed at large scale. In recent years, cloud computing has been gaining momentum, with more and more successful adoptions. This paper proposes a new data warehouse infrastructure as a service (DWIaaS) that effectively supports the distribution of big data storage, computing and parallelized programming.
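The abstract lists parallelized programming over distributed storage as one of the services the infrastructure must support. It does not describe the internals of the DWIaaS design, so the following is only a generic illustration and not the authors' method. It assumes a MapReduce-style model: a fact table is split into partitions, each partition is pre-aggregated in parallel (map with a local combiner), partial results are routed to reducers by hashing the key (shuffle), and each reducer sums its keys. The `(region, amount)` schema and all function names (`map_partition`, `shuffle`, `reduce_bucket`, `parallel_sum_by_key`) are hypothetical, and threads stand in for cluster nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

# Hypothetical fact-table row: (region, sales_amount). Illustrative only.
Row = Tuple[str, float]


def map_partition(rows: Iterable[Row]) -> Dict[str, float]:
    """Map phase with a local combiner: pre-aggregate sales per region on one partition."""
    partial: Dict[str, float] = defaultdict(float)
    for region, amount in rows:
        partial[region] += amount
    return dict(partial)


def shuffle(partials: List[Dict[str, float]], n_reducers: int) -> List[Dict[str, List[float]]]:
    """Route every key to one reducer by hashing, collecting its partial sums."""
    buckets: List[Dict[str, List[float]]] = [defaultdict(list) for _ in range(n_reducers)]
    for partial in partials:
        for key, value in partial.items():
            buckets[hash(key) % n_reducers][key].append(value)
    return buckets


def reduce_bucket(bucket: Dict[str, List[float]]) -> Dict[str, float]:
    """Reduce phase: combine the partial sums of each key owned by this reducer."""
    return {key: sum(values) for key, values in bucket.items()}


def parallel_sum_by_key(partitions: List[List[Row]], n_reducers: int = 2) -> Dict[str, float]:
    """Run map and reduce in parallel (threads stand in for cluster nodes)."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
        reduced = list(pool.map(reduce_bucket, shuffle(partials, n_reducers)))
    result: Dict[str, float] = {}
    for part in reduced:
        result.update(part)  # each key lives in exactly one reducer, so no collisions
    return result


if __name__ == "__main__":
    partitions = [
        [("north", 10.0), ("south", 5.0)],
        [("north", 2.5), ("east", 7.0)],
        [("south", 1.0)],
    ]
    print(sorted(parallel_sum_by_key(partitions).items()))
    # [('east', 7.0), ('north', 12.5), ('south', 6.0)]
```

The local combiner in `map_partition` is a deliberate choice. It shrinks what each partition sends across the shuffle to one value per key, which matters most when data volume, not the number of distinct keys, dominates.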
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Dabbèchi, H., Nabli, A., Bouzguenda, L., Haddar, K. (2018). DWIaaS: Data Warehouse Infrastructure as a Service for Big Data Analytics. In: Thanh Nguyen, N., Kowalczyk, R. (eds) Transactions on Computational Collective Intelligence XXX. Lecture Notes in Computer Science, vol. 11120. Springer, Cham. https://doi.org/10.1007/978-3-319-99810-7_7
DOI: https://doi.org/10.1007/978-3-319-99810-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99809-1
Online ISBN: 978-3-319-99810-7
eBook Packages: Computer Science, Computer Science (R0)