Abstract
Consider a scientific range query, such as "find all places in Africa where yesterday the temperature was over 35 degrees and it rained." In theory, one can answer such queries by returning all geographic points that satisfy the query condition. In practice, however, users do not find this low-level answer very useful; instead, they require the points to be consolidated into regions, i.e., sets of points that all satisfy the query conditions and are adjacent in the underlying mesh. In this paper, we show that when a high-quality index is used to find the points and a good traditional connected component labeling algorithm is used to build the regions, the cost of consolidating the points into regions dominates range query response time. We then show how to find query result points and consolidate them into regions in expected time that is sublinear in the number of result points. This seemingly miraculous speedup comes from two phases: a point lookup phase that uses bitmap indexes and produces a compressed bitmap as the intermediate query result, followed by a region consolidation phase that operates directly on that bitmap and exploits the spatial properties of the underlying mesh to greatly reduce the cost of consolidating the result points into regions. Our experiments with real-world scientific data demonstrate that, in practice, our approach to region consolidation is over 10 times faster than a traditional connected component algorithm.
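The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of consolidating regions from runs of set bits rather than from individual points; it is not the authors' algorithm. It assumes a regular 2D mesh of a given `width` stored in row-major order, 4-connectivity, and a result bitmap already decoded into sorted, non-overlapping, non-adjacent `(start, length)` runs. The names `split_runs_by_row` and `label_regions` are hypothetical.

```python
from collections import defaultdict


def find(parent, i):
    # Disjoint-set find with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def split_runs_by_row(runs, width):
    """Cut (start, length) runs so that no piece crosses a mesh row boundary.

    Returns (row, col_start, col_end_exclusive) pieces in row-major order.
    """
    pieces = []
    for start, length in runs:
        end = start + length
        while start < end:
            row, col = divmod(start, width)
            stop = min(end, (row + 1) * width)
            pieces.append((row, col, col + (stop - start)))
            start = stop
    return pieces


def label_regions(runs, width):
    """Group run pieces into 4-connected regions (assumed regular 2D mesh)."""
    pieces = split_runs_by_row(runs, width)
    parent = list(range(len(pieces)))

    by_row = defaultdict(list)
    for idx, (row, _, _) in enumerate(pieces):
        by_row[row].append(idx)

    for row, cur in by_row.items():
        prev = by_row.get(row - 1, [])
        i = j = 0
        # Two-pointer sweep over the sorted pieces of two neighbouring rows.
        while i < len(prev) and j < len(cur):
            _, plo, phi = pieces[prev[i]]
            _, clo, chi = pieces[cur[j]]
            if plo < chi and clo < phi:  # column ranges overlap
                union(parent, prev[i], cur[j])
            if phi <= chi:  # advance whichever piece ends first
                i += 1
            else:
                j += 1

    regions = defaultdict(list)
    for idx, piece in enumerate(pieces):
        regions[find(parent, idx)].append(piece)
    return list(regions.values())
```

For example, on a mesh of width 5, `label_regions([(0, 2), (4, 1), (6, 1), (9, 1), (13, 2)], 5)` groups the seven result points into two regions. The work in this sketch is proportional to the number of runs rather than the number of points, which is the kind of saving that becomes possible when consolidation works on a compressed representation; the paper's actual technique and its sublinear bound may differ substantially.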
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sinha, R.R., Winslett, M., Wu, K. (2009). Finding Regions of Interest in Large Scientific Datasets. In: Winslett, M. (eds) Scientific and Statistical Database Management. SSDBM 2009. Lecture Notes in Computer Science, vol 5566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02279-1_10
DOI: https://doi.org/10.1007/978-3-642-02279-1_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02278-4
Online ISBN: 978-3-642-02279-1
eBook Packages: Computer Science, Computer Science (R0)