Abstract
Many data mining approaches focus on discovering similar (and frequent) data values in large data sets. We present an alternative but complementary approach in which we search for empty regions in the data. We consider the problem of finding all maximal empty rectangles in large, two-dimensional data sets, and we introduce a novel, scalable algorithm for finding all such rectangles. The algorithm does so with a single scan over a sorted data set and requires only a small, bounded amount of memory. We also describe an algorithm to find all maximal empty hyper-rectangles in a multi-dimensional space. We consider the complexity of this search problem and present new bounds on the number of maximal empty hyper-rectangles. We briefly summarize experimental results obtained by applying our algorithm to a synthetic data set.
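To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the single-scan algorithm described in the abstract; it only illustrates what "maximal empty rectangle" could mean under one assumed representation, in which the two-dimensional data set is a 0/1 matrix (1 marks a value pair present in the data, 0 marks an absent pair) and a rectangle is a contiguous block of rows and columns. The function name `maximal_empty_rectangles` and the tuple layout `(top, left, bottom, right)` are hypothetical choices made for this example.

```python
def maximal_empty_rectangles(matrix):
    """Return every maximal all-zero rectangle as (top, left, bottom, right).

    Indices are inclusive. A rectangle is maximal when it cannot be extended
    by one row or one column in any direction and still contain only zeros.
    Brute force, intended for small illustrative inputs only.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0

    # prefix[r][c] holds the number of 1s in rows 0..r-1 and columns 0..c-1.
    prefix = [[0] * (n_cols + 1) for _ in range(n_rows + 1)]
    for r in range(n_rows):
        for c in range(n_cols):
            prefix[r + 1][c + 1] = (matrix[r][c] + prefix[r][c + 1]
                                    + prefix[r + 1][c] - prefix[r][c])

    def ones(top, left, bottom, right):
        # Count the 1s in the inclusive block. A block that leaves the
        # matrix is treated as blocked, so it reports a nonzero count.
        if top < 0 or left < 0 or bottom >= n_rows or right >= n_cols:
            return 1
        return (prefix[bottom + 1][right + 1] - prefix[top][right + 1]
                - prefix[bottom + 1][left] + prefix[top][left])

    result = []
    for top in range(n_rows):
        for bottom in range(top, n_rows):
            for left in range(n_cols):
                for right in range(left, n_cols):
                    if ones(top, left, bottom, right):
                        continue  # not empty
                    # Maximal only if all four one-step extensions are blocked.
                    if (ones(top - 1, left, top - 1, right)
                            and ones(bottom + 1, left, bottom + 1, right)
                            and ones(top, left - 1, bottom, left - 1)
                            and ones(top, right + 1, bottom, right + 1)):
                        result.append((top, left, bottom, right))
    return result


example = [
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
]
for rect in maximal_empty_rectangles(example):
    print(rect)
```

On this 3 x 3 example the sketch reports four maximal empty rectangles. Because it tests every candidate rectangle, its cost grows with the square of the number of rows times the square of the number of columns, which is the kind of cost a single scan with bounded memory is meant to avoid.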
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Edmonds, J., Gryz, J., Liang, D., Miller, R.J. (2001). Mining for Empty Rectangles in Large Data Sets. In: Van den Bussche, J., Vianu, V. (eds) Database Theory — ICDT 2001. ICDT 2001. Lecture Notes in Computer Science, vol 1973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44503-X_12
DOI: https://doi.org/10.1007/3-540-44503-X_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41456-8
Online ISBN: 978-3-540-44503-6