Practical throughput estimation for parallel databases
Methods for estimating the performance of database management systems can aid the design of database systems by identifying potential performance bottlenecks or by predicting the relative performance of different designs. Performance estimation is critical in parallel database systems with distributed memory, where good overall performance depends on choosing well among a wide range of ways of placing data. An approach to performance estimation for shared-nothing parallel database systems is described. It estimates system throughput for a given benchmark or set of queries, and can evaluate different data placement schemes to determine the data layout that gives the best estimated throughput.