Improving the Accuracy of Histograms for Geographic Data Objects

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7238)

Abstract

Histograms are widely used to estimate selectivity in query optimization. In this paper, we propose a new technique that improves the accuracy of histograms for two-dimensional geographic data objects, which are used in many real-world applications. A histogram typically consists of a collection of rectangular regions, called buckets. The main idea of our technique is to use a straight line to convert each rectangular bucket into a new one with two separate regions. The converted buckets, called bichromatic buckets, approximate the distribution of data objects better while preserving the simplicity of the original rectangular buckets. To construct bichromatic buckets, we propose an algorithm that finds good separating lines. We also describe how to apply the proposed technique to existing histogram construction methods to improve the accuracy of the histograms they construct. Results from extensive experiments on real-life data sets demonstrate that our technique improves the accuracy of the histograms by a factor of two on average.
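
To make the idea of a bucket split by a straight line concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes point data, a separating line given as a*x + b*y + c = 0, a separate object count for each side of the line, and a uniform distribution inside each of the two regions. The class name BichromaticBucket, its fields, and the estimation rule are hypothetical illustrations; the abstract does not specify how the separating line is found or how estimates are computed.

```python
from dataclasses import dataclass


def polygon_area(poly):
    """Area of a simple polygon via the shoelace formula (0 for < 3 vertices)."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def clip_half_plane(poly, a, b, c):
    """Clip a convex polygon to the half-plane a*x + b*y + c >= 0."""
    out = []
    for i in range(len(poly)):
        p, q = poly[i], poly[(i + 1) % len(poly)]
        fp = a * p[0] + b * p[1] + c
        fq = a * q[0] + b * q[1] + c
        if fp >= 0:
            out.append(p)
        if (fp > 0 and fq < 0) or (fp < 0 and fq > 0):
            t = fp / (fp - fq)  # where the edge p->q crosses the line
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


def rect_polygon(x1, y1, x2, y2):
    return [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]


@dataclass
class BichromaticBucket:
    # Hypothetical layout: bucket rectangle, separating line, one count per side.
    x1: float
    y1: float
    x2: float
    y2: float
    a: float
    b: float
    c: float
    count_pos: int  # objects with a*x + b*y + c >= 0
    count_neg: int  # objects with a*x + b*y + c < 0

    def estimate(self, qx1, qy1, qx2, qy2):
        """Estimated number of objects inside the query rectangle,
        assuming uniformity within each region (an assumption of this sketch)."""
        ix1, iy1 = max(self.x1, qx1), max(self.y1, qy1)
        ix2, iy2 = min(self.x2, qx2), min(self.y2, qy2)
        if ix1 >= ix2 or iy1 >= iy2:
            return 0.0  # query does not overlap the bucket
        bucket = rect_polygon(self.x1, self.y1, self.x2, self.y2)
        overlap = rect_polygon(ix1, iy1, ix2, iy2)
        total = 0.0
        for sign, count in ((1, self.count_pos), (-1, self.count_neg)):
            a, b, c = sign * self.a, sign * self.b, sign * self.c
            region_area = polygon_area(clip_half_plane(bucket, a, b, c))
            if region_area > 0:
                query_area = polygon_area(clip_half_plane(overlap, a, b, c))
                total += count * query_area / region_area
        return total


# A 10 x 10 bucket split by the vertical line x = 5, with a skewed distribution:
# 90 objects on the right (x >= 5) and 10 on the left.
bucket = BichromaticBucket(0, 0, 10, 10, a=1, b=0, c=-5, count_pos=90, count_neg=10)
left_half = bucket.estimate(0, 0, 5, 10)
```

In this toy setting, a query covering the left half of the bucket is estimated from the left region's count alone, whereas a plain rectangular bucket holding all 100 objects under a uniformity assumption would attribute half of them to that query.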

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mai, H.T., Kim, J., Kim, M.H. (2012). Improving the Accuracy of Histograms for Geographic Data Objects. In: Lee, Sg., Peng, Z., Zhou, X., Moon, YS., Unland, R., Yoo, J. (eds) Database Systems for Advanced Applications. DASFAA 2012. Lecture Notes in Computer Science, vol 7238. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29038-1_3

  • DOI: https://doi.org/10.1007/978-3-642-29038-1_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29037-4

  • Online ISBN: 978-3-642-29038-1

  • eBook Packages: Computer Science, Computer Science (R0)
