Abstract
Partitioning a multi-dimensional data set into rectangular partitions subject to certain constraints is an important problem that arises in many database applications, including histogram-based selectivity estimation, load balancing, and construction of index structures. While provably optimal and efficient algorithms exist for partitioning one-dimensional data, the multi-dimensional problem has received less attention, except for a few special cases. As a result, the heuristic partitioning techniques that are used in practice are not well understood and come with no guarantees on the quality of the solution. In this paper, we present algorithmic and complexity-theoretic results for the fundamental problem of partitioning a two-dimensional array into rectangular tiles of arbitrary size in a way that minimizes the number of tiles required to satisfy a given constraint. Our main results are approximation algorithms for several partitioning problems that provably approximate the optimal solutions within small constant factors and that run in linear or close to linear time. We also establish the NP-hardness of several partitioning problems; it is therefore unlikely that efficient, i.e., polynomial-time, algorithms exist for solving these problems exactly.
We also discuss a few applications in which partitioning problems arise. One of these applications is the construction of multi-dimensional histograms. Our results, for example, give an efficient algorithm for constructing V-Optimal histograms, which are known to be the most accurate histograms in several selectivity estimation problems. Our algorithms are the first to provide guaranteed bounds on the quality of the solution.
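As an illustration of the one-dimensional case mentioned above, the following is a minimal sketch of a standard greedy approach, not the algorithm of this paper. It assumes non-negative weights and a constraint of the form "the sum of each interval is at most a given bound"; the function name `greedy_partition_1d` and its parameters are hypothetical. Under these assumptions, extending each interval as far as the bound allows yields the fewest intervals.

```python
from typing import List, Tuple


def greedy_partition_1d(weights: List[float], bound: float) -> List[Tuple[int, int]]:
    """Split `weights` into the fewest contiguous intervals with sum <= bound.

    Returns half-open index ranges (start, end).
    Assumption (not from the paper): weights are non-negative, so the sum
    of an interval can only grow as the interval is extended.
    """
    intervals: List[Tuple[int, int]] = []
    start = 0        # first index of the interval currently being built
    running = 0.0    # sum of the interval currently being built
    for i, w in enumerate(weights):
        if w < 0:
            raise ValueError("weights must be non-negative")
        if w > bound:
            raise ValueError("a single element exceeds the bound")
        if running + w > bound:
            # Adding w would violate the constraint: close the interval
            # before index i and start a new one at i.
            intervals.append((start, i))
            start = i
            running = 0.0
        running += w
    if start < len(weights):
        intervals.append((start, len(weights)))
    return intervals


if __name__ == "__main__":
    print(greedy_partition_1d([3, 1, 4, 1, 5, 9, 2, 6], 10))
```

The two-dimensional problem studied in the paper does not admit such a simple exact method, since several of its variants are NP-hard; the sketch is only meant to make the "minimize the number of tiles subject to a constraint" objective concrete.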
This work was done while the author was at Bell Labs.
© 1999 Springer-Verlag Berlin Heidelberg
Cite this paper
Muthukrishnan, S., Poosala, V., Suel, T. (1999). On Rectangular Partitionings in Two Dimensions: Algorithms, Complexity and Applications. In: Beeri, C., Buneman, P. (eds) Database Theory — ICDT’99. ICDT 1999. Lecture Notes in Computer Science, vol 1540. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49257-7_16
DOI: https://doi.org/10.1007/3-540-49257-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65452-0
Online ISBN: 978-3-540-49257-3