Multidimensional Arrays for Warehousing Data on Clouds

d’Orazio, Laurent; Bimonte, Sandro

doi:10.1007/978-3-642-15108-8_3


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6265)

Included in the following conference series:

International Conference on Data Management in Grid and P2P Systems


Abstract

Data warehouses and OLAP systems are business intelligence technologies. They allow decision-makers to analyze, on the fly, huge volumes of data represented according to the multidimensional model. Cloud computing, driven by major ICT companies such as Google, Microsoft and Amazon, has recently attracted considerable attention. OLAP querying and data warehousing in such a context is a major issue. Indeed, the problems to be tackled are the basic ones of large-scale distributed OLAP systems (querying large amounts of data, semantic and structural heterogeneity), seen from a new point of view that takes into account the specificities of these architectures (pay-as-you-go rule, elasticity, and user-friendliness). In this paper we address the pay-as-you-go rule for warehouse data storage. We propose to use multidimensional array storage techniques for clouds. First experiments validate our proposal.
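The abstract does not describe the storage layout itself, so the following is only a minimal sketch of the general idea behind multidimensional array storage, not the authors' implementation. It assumes a small, fully dense cube with three hypothetical dimensions (product, store, month); the names `DIMS`, `offset_of` and `to_array` are illustrative. Each measure is placed at a row-major offset computed from its dimension members, so the members themselves need not be stored with every fact, which matters when storage is billed by volume.

```python
from itertools import product
from math import prod

# Hypothetical dimensions (illustrative only, not taken from the paper).
DIMS = [["p1", "p2"], ["s1", "s2"], ["jan", "feb", "mar"]]
SIZES = [len(d) for d in DIMS]
POSITIONS = [{member: i for i, member in enumerate(d)} for d in DIMS]


def offset_of(members):
    """Row-major offset of the cell identified by one member per dimension."""
    offset = 0
    for positions, size, member in zip(POSITIONS, SIZES, members):
        offset = offset * size + positions[member]
    return offset


def to_array(fact_rows):
    """Store only the measures; a cell's position encodes its dimension members."""
    cells = [None] * prod(SIZES)
    for *members, measure in fact_rows:
        cells[offset_of(members)] = measure
    return cells


# A dense relational fact table: one (product, store, month, measure) row per cell.
fact_rows = [(p, s, m, 10 * i) for i, (p, s, m) in enumerate(product(*DIMS))]
cells = to_array(fact_rows)

# Values stored by each layout (a crude proxy for billed storage volume).
relational_values = sum(len(row) for row in fact_rows)
array_values = len(cells)

print(cells[offset_of(("p2", "s1", "feb"))], relational_values, array_values)
```

In this dense toy case the array layout stores one value per cell, whereas the relational layout stores the three dimension keys alongside each measure. For a sparse cube the comparison can reverse, since the array still reserves a slot for every empty cell.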

Author information

Authors and Affiliations

LIMOS, Blaise Pascal University, France
Laurent d’Orazio
Cemagref, France
Sandro Bimonte

Editor information

Editors and Affiliations

Institut de Recherche en Informatique de Toulouse (IRIT), Paul Sabatier University, 118, route de Narbonne, 31062, Toulouse Cedex, France
Abdelkader Hameurlain
IRIT Institut de Recherche en Informatique de Toulouse, Paul Sabatier University, 118, route de Narbonne, 31062, Toulouse Cedex, France
Franck Morvan
Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstr. 9/188, 1040, Wien, Austria
A Min Tjoa

About this paper

Cite this paper

d’Orazio, L., Bimonte, S. (2010). Multidimensional Arrays for Warehousing Data on Clouds. In: Hameurlain, A., Morvan, F., Tjoa, A.M. (eds) Data Management in Grid and Peer-to-Peer Systems. Globe 2010. Lecture Notes in Computer Science, vol 6265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15108-8_3

DOI: https://doi.org/10.1007/978-3-642-15108-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15107-1
Online ISBN: 978-3-642-15108-8
eBook Packages: Computer Science, Computer Science (R0)
