ABSTRACT
Nowadays, data warehouses store petabytes of data, and the queries defined on them are generally complex. Several techniques are used to optimize queries in data warehouses, such as indexes, partitioning, and materialized views. Selecting the best configuration of indexes, partitions, or materialized views is NP-hard in each case. Here, we focus on the horizontal partitioning problem in data warehouses. Several approaches have been proposed for this problem, including genetic algorithms, which generally use a small query workload. We present a new methodology based on data mining and particle swarm optimization for solving the horizontal partitioning problem in data warehouses using a relatively large query workload. First, we compute the attraction between predicates and then perform a hierarchical clustering of the predicates. In the second step, we use discrete particle swarm optimization to select the best partitioning schema. Several experiments are performed to demonstrate the effectiveness of the proposed approach, and the results are compared with the best-known method so far, the genetic-algorithm-based approach. The proposed approach is found to be faster and more effective than the genetic-algorithm-based approach for solving the data warehouse horizontal partitioning problem.
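The two-step pipeline described above can be illustrated with a small sketch. The abstract does not define the attraction measure, the clustering criterion, the particle encoding, or the cost model, so every one of these is an assumption here:

- Attraction is taken to be the number of queries that use both predicates.
- Clustering is plain average-linkage agglomeration down to a fixed number of clusters `k`.
- Each particle is a bit vector saying which predicate clusters fragment the fact table.
- `schema_cost` is a hypothetical toy model with a maximum-fragment threshold.

The workload `QUERIES` and all function names are hypothetical. This is not the EMeD-Part implementation.

```python
import math
import random
from itertools import combinations

# Hypothetical toy workload: each query is the set of selection predicates it uses.
QUERIES = [
    {"p1", "p2"}, {"p1", "p2", "p3"}, {"p3", "p4"},
    {"p4", "p5"}, {"p5"}, {"p1", "p3"},
]


def attraction(queries):
    """Assumed attraction measure: number of queries using both predicates."""
    preds = sorted(set().union(*queries))
    att = {pair: 0 for pair in combinations(preds, 2)}
    for q in queries:
        for pair in combinations(sorted(q), 2):
            att[pair] += 1
    return preds, att


def cluster_predicates(preds, att, k):
    """Average-linkage agglomerative clustering on attraction (a similarity):
    repeatedly merge the two most attracted clusters until k clusters remain."""
    clusters = [[p] for p in preds]

    def link(c1, c2):
        total = sum(att[tuple(sorted((a, b)))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters


def schema_cost(bits, clusters, queries, max_fragments):
    """Hypothetical cost model: a selected cluster with m predicates splits the
    fact table into m + 1 fragments; a query using any of its predicates scans
    only 1/(m + 1) of the data, and the effects of several clusters multiply."""
    chosen = [c for c, b in zip(clusters, bits) if b]
    n_fragments = math.prod(len(c) + 1 for c in chosen)
    if n_fragments > max_fragments:
        # Penalty larger than any feasible cost (a feasible cost is <= len(queries)).
        return len(queries) + n_fragments
    cost = 0.0
    for q in queries:
        scanned = 1.0
        for c in chosen:
            if q & set(c):
                scanned /= len(c) + 1
        cost += scanned
    return cost


def binary_pso(clusters, queries, max_fragments, n_particles=10, iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO with inertia: each bit is 1 with probability sigmoid(velocity);
    velocities are pulled toward the particle's best and the swarm's best."""
    rng = random.Random(seed)
    n = len(clusters)

    def cost(x):
        return schema_cost(x, clusters, queries, max_fragments)

    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            new_cost = cost(pos[i])
            if new_cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], new_cost
                if new_cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], new_cost
    return gbest, gbest_cost


if __name__ == "__main__":
    preds, att = attraction(QUERIES)
    clusters = cluster_predicates(preds, att, k=2)
    best, best_cost = binary_pso(clusters, QUERIES, max_fragments=10)
    print("clusters:", clusters)
    print("selected clusters:", best, "estimated cost:", best_cost)
```

With this toy workload, clustering yields the groups {p1, p2, p3} and {p4, p5}. Using both groups would create 12 fragments, which exceeds the assumed threshold of 10. The best feasible schema under this cost model therefore fragments on the first group only. Clustering first keeps the PSO search space small: one bit per predicate cluster rather than one per predicate.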
EMeD-Part: An Efficient Methodology for Horizontal Partitioning in Data Warehouses