
A Joint Design Approach of Partitioning and Allocation in Parallel Data Warehouses

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5691)


Abstract

Traditionally, a parallel data warehouse is designed by first fragmenting its schema and then allocating the generated fragments over the nodes of the parallel machine. The main drawback of this approach is that the interdependency between the fragmentation and allocation processes is not taken into account during the design phase. This interdependency arises because the generated fragments are one of the inputs of the allocation problem and both processes optimize the same set of queries. In this paper, we present a new approach for designing parallel relational data warehouses on a shared-nothing machine, in which fragmentation and allocation are performed simultaneously. To allocate the query workload efficiently over the nodes, a load balancing method is given. Finally, a validation of our proposals is presented.
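The abstract does not describe the load balancing method itself. As a rough illustration of what balancing fragments over the nodes of a shared-nothing machine can look like, the following is a minimal sketch of a generic greedy heuristic (heaviest fragment first, placed on the currently least-loaded node). It is not the authors' algorithm: the function name `allocate_fragments`, the fragment labels, and the per-fragment cost values are all hypothetical and chosen only for this example.

```python
import heapq


def allocate_fragments(fragment_costs, num_nodes):
    """Greedy sketch (NOT the paper's method): place each fragment,
    heaviest first, on the node with the smallest accumulated load.

    fragment_costs: dict mapping a fragment name to an assumed workload cost.
    num_nodes: number of nodes in the shared-nothing machine.
    Returns (placement, loads): fragment -> node index, and per-node load.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")

    # Min-heap of (current load, node index); the root is the least-loaded node.
    heap = [(0.0, node) for node in range(num_nodes)]
    heapq.heapify(heap)

    placement = {}
    # Sort by descending cost; ties are broken by fragment name for determinism.
    ordered = sorted(fragment_costs.items(), key=lambda kv: (-kv[1], kv[0]))
    for fragment, cost in ordered:
        load, node = heapq.heappop(heap)
        placement[fragment] = node
        heapq.heappush(heap, (load + cost, node))

    # Recompute the final load of every node from the placement.
    loads = [0.0] * num_nodes
    for fragment, node in placement.items():
        loads[node] += fragment_costs[fragment]
    return placement, loads


# Hypothetical workload costs for five fragments on two nodes.
costs = {"F1": 8, "F2": 7, "F3": 6, "F4": 5, "F5": 4}
placement, loads = allocate_fragments(costs, 2)
```

A sequential design would run such an allocation step only after fragmentation has finished. The joint approach announced in the abstract instead considers allocation while the fragments are being generated, which this sketch does not attempt to reproduce.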



Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bellatreche, L., Benkrid, S. (2009). A Joint Design Approach of Partitioning and Allocation in Parallel Data Warehouses. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2009. Lecture Notes in Computer Science, vol 5691. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03730-6_9


  • DOI: https://doi.org/10.1007/978-3-642-03730-6_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03729-0

  • Online ISBN: 978-3-642-03730-6

  • eBook Packages: Computer Science, Computer Science (R0)
