Synonyms
Definition
A B-bucket histogram of length N is a partition of the set \([0,N)\) of N integers into intervals \([b_0,b_1) \cup [b_1,b_2) \cup \cdots \cup [b_{B-1},b_B)\), where \(b_0 = 0\) and \(b_B = N\), together with a collection of B heights \(h_j\), for \(0 \le j < B\), one for each bucket. On point query i, the histogram answer is \(h_j\), where j is the index of the interval (or “bucket”) containing i; that is, the unique j with \(b_j \le i < b_{j+1}\). In vector notation, \(\chi_S\) is the vector that is 1 on the set S and zero elsewhere, and the answer vector of a histogram is \(\vec{H} = \sum_{0 \le j < B} h_j \chi_{[b_j, b_{j+1})}\).
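The definition can be illustrated with a minimal Python sketch. It assumes the histogram is stored as a sorted list of boundaries \(b_0, \ldots, b_B\) and a list of B heights; the function names `point_query` and `answer_vector` and the sample values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

def point_query(boundaries, heights, i):
    """Return h_j for the unique bucket [b_j, b_{j+1}) that contains i.

    Assumes boundaries = [b_0, ..., b_B] is strictly increasing with
    b_0 = 0 and b_B = N, and that len(heights) == B.
    """
    n = boundaries[-1]
    if not 0 <= i < n:
        raise IndexError(f"query {i} is outside [0, {n})")
    # bisect_right counts the boundaries <= i, so subtracting 1
    # gives the index j with b_j <= i < b_{j+1}.
    j = bisect_right(boundaries, i) - 1
    return heights[j]

def answer_vector(boundaries, heights):
    """Expand the histogram into its answer vector H of length N."""
    return [point_query(boundaries, heights, i) for i in range(boundaries[-1])]

# Illustrative 3-bucket histogram of length N = 8.
boundaries = [0, 3, 5, 8]
heights = [2.0, 7.0, 1.0]
H = answer_vector(boundaries, heights)
```

Each point query costs O(log B) through binary search over the boundaries, whereas materializing the full answer vector takes space proportional to N, so the expanded form is only practical for small examples.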
A histogram, \(\vec{H}\), is often used to approximate some other function, \(\vec{A}\), on \([0,N)\). In building a B-bucket histogram, it is desirable to choose B − 1 boundaries \(b_j\) and B heights \(h_j\) that tend to minimize some distance, e.g., the sum-squared error \(\left\|\vec{A} -\vec{H}\right\|^2 =...
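As a rough illustration of the error measure, the sketch below computes the squared Euclidean distance between a small signal and a histogram, taking the sum-squared error to be the sum over i of the squared difference between \(\vec{A}\) and \(\vec{H}\) at i. It also uses the standard fact that, for fixed boundaries, the bucket mean is the height that minimizes this error. The function names and sample data are hypothetical, and choosing the boundaries themselves (the harder part of the problem) is not attempted here.

```python
def sum_squared_error(a, boundaries, heights):
    """Squared Euclidean distance between a and the histogram's answer vector."""
    total = 0.0
    for j, h in enumerate(heights):
        # Bucket j covers the indices b_j, ..., b_{j+1} - 1.
        for i in range(boundaries[j], boundaries[j + 1]):
            total += (a[i] - h) ** 2
    return total

def mean_heights(a, boundaries):
    """For fixed, strictly increasing boundaries, return each bucket's mean.

    The mean of a bucket minimizes that bucket's sum-squared error.
    """
    return [sum(a[lo:hi]) / (hi - lo)
            for lo, hi in zip(boundaries, boundaries[1:])]

# Illustrative signal of length N = 8 and a 3-bucket partition.
a = [1, 2, 3, 10, 12, 4, 4, 4]
boundaries = [0, 3, 5, 8]
best = mean_heights(a, boundaries)
err = sum_squared_error(a, boundaries, best)
```

Moving any height away from its bucket mean can only increase the error, which is why searching for a good histogram under this measure reduces to searching over the B − 1 interior boundaries.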
Copyright information
© 2009 Springer Science+Business Media, LLC
About this entry
Cite this entry
Strauss, M.J. (2009). Histograms on Streams. In: LIU, L., ÖZSU, M.T. (eds) Encyclopedia of Database Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-39940-9_191
DOI: https://doi.org/10.1007/978-0-387-39940-9_191
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-35544-3
Online ISBN: 978-0-387-39940-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering