Abstract
Data warehouses and OLAP systems are business intelligence technologies. They allow decision-makers to analyze on the fly huge volumes of data represented according to the multidimensional model. Driven by major ICT companies such as Google, Microsoft and Amazon, cloud computing has recently attracted considerable attention. OLAP querying and data warehousing in such a context constitute a major issue. Indeed, the problems to be tackled are the basic ones of large-scale distributed OLAP systems (querying large amounts of data, semantic and structural heterogeneity), but from a new point of view that takes into account the specificities of these architectures (the pay-as-you-go rule, elasticity, and user-friendliness). In this paper, we address the pay-as-you-go rule for warehouse data storage. We propose to use multidimensional array storage techniques in clouds. Initial experiments validate our proposal.
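To make the idea of multidimensional array storage concrete, here is a minimal sketch of the general MOLAP-style technique. It is not the authors' storage scheme. The toy dimensions, the `DenseCube`, `storage_bytes` and `break_even_density` names, and the byte sizes (4-byte dimension keys, 8-byte measures) are all hypothetical assumptions chosen for illustration. The point it shows is that in a dense array, cell coordinates are implicit in the array position, so only measures are stored. A relational fact table instead repeats one key per dimension in every row. Under pay-as-you-go storage billed by volume, that difference translates directly into cost.

```python
from itertools import product

# Hypothetical toy cube (illustrative only): dimension name -> ordered members.
DIMENSIONS = {
    "product": ["p1", "p2", "p3"],
    "store": ["s1", "s2"],
    "month": ["jan", "feb", "mar", "apr"],
}


class DenseCube:
    """Dense multidimensional array stored as one flat list in row-major order.

    Cell coordinates are never stored: a cell's position is computed from the
    indices of its dimension members, so only measure values occupy space.
    """

    def __init__(self, dimensions):
        self.names = list(dimensions)
        self.members = [list(dimensions[n]) for n in self.names]
        # Per-dimension lookup from member name to its index along that axis.
        self.index = [{m: i for i, m in enumerate(ms)} for ms in self.members]
        self.shape = [len(ms) for ms in self.members]
        # Row-major strides: the last dimension varies fastest.
        self.strides = []
        size = 1
        for extent in reversed(self.shape):
            self.strides.insert(0, size)
            size *= extent
        # After the loop, `size` is the total number of cells.
        self.cells = [0.0] * size

    def offset(self, coords):
        """Linearize a tuple of member names into a flat array offset."""
        return sum(self.index[d][m] * self.strides[d] for d, m in enumerate(coords))

    def set(self, coords, value):
        self.cells[self.offset(coords)] = value

    def get(self, coords):
        return self.cells[self.offset(coords)]

    def rollup(self, keep):
        """Sum out every dimension not listed in `keep` (a GROUP BY on `keep`)."""
        kept = [self.names.index(n) for n in keep]
        result = {}
        for idx in product(*(range(s) for s in self.shape)):
            key = tuple(self.members[d][idx[d]] for d in kept)
            flat = sum(i * st for i, st in zip(idx, self.strides))
            result[key] = result.get(key, 0.0) + self.cells[flat]
        return result


def storage_bytes(n_cells, n_facts, n_dims, key_bytes=4, measure_bytes=8):
    """Compare a dense array (one measure per cell, empty cells included) with a
    relational fact table (one key per dimension plus the measure, per fact).
    Byte sizes are assumptions, not figures from the paper."""
    array = n_cells * measure_bytes
    table = n_facts * (n_dims * key_bytes + measure_bytes)
    return array, table


def break_even_density(n_dims, key_bytes=4, measure_bytes=8):
    """Fill ratio (n_facts / n_cells) above which the array is the smaller layout."""
    return measure_bytes / (n_dims * key_bytes + measure_bytes)


if __name__ == "__main__":
    cube = DenseCube(DIMENSIONS)
    cube.set(("p1", "s1", "jan"), 10.0)
    cube.set(("p1", "s2", "jan"), 5.0)
    cube.set(("p2", "s1", "feb"), 7.5)
    print(cube.rollup(["product"]))
    print(storage_bytes(len(cube.cells), n_facts=3, n_dims=3))
    print(break_even_density(3))
```

Consider the usage above, with 3 facts in a 24-cell cube. The fact table is smaller (60 bytes versus 192). Under these assumed sizes, the array layout only pays off once the cube is more than 40% full. This is why sparse cubes are typically chunked or compressed when stored as arrays.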
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
d’Orazio, L., Bimonte, S. (2010). Multidimensional Arrays for Warehousing Data on Clouds. In: Hameurlain, A., Morvan, F., Tjoa, A.M. (eds) Data Management in Grid and Peer-to-Peer Systems. Globe 2010. Lecture Notes in Computer Science, vol 6265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15108-8_3
DOI: https://doi.org/10.1007/978-3-642-15108-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15107-1
Online ISBN: 978-3-642-15108-8
eBook Packages: Computer Science, Computer Science (R0)