Finding Regions of Interest in Large Scientific Datasets

  • Conference paper
Scientific and Statistical Database Management (SSDBM 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5566)

Abstract

Consider a scientific range query such as "find all places in Africa where yesterday the temperature was over 35 degrees and it rained." In theory, one can answer such queries by returning all geographic points that satisfy the query condition. In practice, however, users do not find this low-level answer very useful; instead they require the points to be consolidated into regions, i.e., sets of points that all satisfy the query conditions and are adjacent in the underlying mesh. In this paper, we show that when a high-quality index is used to find the points and a good traditional connected component labeling algorithm is used to build the regions, the cost of consolidating the points into regions dominates range query response time. We then show how to find query result points and consolidate them into regions in expected time that is sublinear in the number of result points. This seemingly miraculous speedup comes from two phases: a point lookup phase that uses bitmap indexes and produces a compressed bitmap as the intermediate query result, followed by a region consolidation phase that operates directly on that compressed bitmap and exploits the spatial properties of the underlying mesh to greatly reduce the cost of consolidating the result points into regions. Our experiments with real-world scientific data demonstrate that, in practice, our approach to region consolidation is over 10 times faster than a traditional connected component algorithm.
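To make the idea of consolidating regions directly from a compressed result concrete, here is a minimal sketch. It is not the algorithm from the paper, which the abstract does not spell out; it is a generic run-based connected component labeling with union-find, under several assumptions: the mesh is a regular 2D grid stored in row-major order, adjacency is 4-connectivity, and the query result arrives as a sorted, non-overlapping list of `(start, length)` runs of matching points (a stand-in for a compressed bitmap). All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int, int]  # (row, first_col, last_col), columns inclusive


def split_runs_by_row(runs: List[Tuple[int, int]], width: int) -> List[Segment]:
    """Cut row-major (start, length) runs of 1-bits at grid row boundaries."""
    segments: List[Segment] = []
    for start, length in runs:
        end = start + length  # exclusive
        while start < end:
            row, col = divmod(start, width)
            stop = min(end, (row + 1) * width)
            segments.append((row, col, col + (stop - start) - 1))
            start = stop
    return segments


def find(parent: List[int], i: int) -> int:
    """Return the representative of i, halving paths along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent: List[int], a: int, b: int) -> None:
    """Merge the sets containing a and b, keeping the smaller index as root."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def consolidate(runs: List[Tuple[int, int]], width: int) -> List[List[Segment]]:
    """Group runs into 4-connected regions without visiting individual points.

    Assumes `runs` is sorted by start and non-overlapping.
    """
    segs = split_runs_by_row(runs, width)
    parent = list(range(len(segs)))
    by_row: Dict[int, List[int]] = {}
    for idx, (row, _, _) in enumerate(segs):
        by_row.setdefault(row, []).append(idx)

    for row, cur in by_row.items():
        above = by_row.get(row - 1, [])
        a = c = 0
        # Two-pointer sweep: segments touch if their column ranges overlap.
        while a < len(above) and c < len(cur):
            _, a_lo, a_hi = segs[above[a]]
            _, c_lo, c_hi = segs[cur[c]]
            if a_lo <= c_hi and c_lo <= a_hi:
                union(parent, above[a], cur[c])
            if a_hi < c_hi:
                a += 1
            else:
                c += 1

    regions: Dict[int, List[Segment]] = {}
    for idx, seg in enumerate(segs):
        regions.setdefault(find(parent, idx), []).append(seg)
    return list(regions.values())


# A 3 x 5 grid whose matching points are at row-major positions 0, 1, 4, 6, 9, 13, 14.
example_runs = [(0, 2), (4, 1), (6, 1), (9, 1), (13, 2)]
example_regions = consolidate(example_runs, width=5)
```

In this sketch the work is proportional to the number of runs rather than the number of matching points, which gives some intuition for how operating on a compressed representation can avoid touching every result point. How the paper exploits the mesh and the bitmap encoding in detail is not stated in the abstract.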

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sinha, R.R., Winslett, M., Wu, K. (2009). Finding Regions of Interest in Large Scientific Datasets. In: Winslett, M. (eds) Scientific and Statistical Database Management. SSDBM 2009. Lecture Notes in Computer Science, vol 5566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02279-1_10

  • DOI: https://doi.org/10.1007/978-3-642-02279-1_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02278-4

  • Online ISBN: 978-3-642-02279-1

  • eBook Packages: Computer Science (R0)
