Maintaining time-decaying stream aggregates

https://doi.org/10.1016/j.jalgor.2005.01.006

Abstract

We formalize the problem of maintaining time-decaying aggregates and statistics of a data stream: the relative contribution of each data item to the aggregate is scaled down by a factor that depends on, and is non-increasing with, its elapsed time in the stream. Time-decaying aggregates are used in applications where the significance of data items decreases over time. We develop storage-efficient algorithms, and establish upper and lower bounds. Surprisingly, even though maintaining decaying aggregates has become a widely used tool, our work seems to be the first both to explore the problem formally and to provide storage-efficient algorithms for important families of decay functions, including polynomial decay.
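To make the model concrete, here is a minimal sketch (not the paper's algorithm) of the one decay family that is folklore-easy to maintain exactly: exponential decay. Because the exponential factors, the entire decayed sum can be carried in O(1) state and rescaled on each update; the class name `ExpDecaySum` and the parameter `lam` are illustrative choices, not notation from the paper. Polynomial and other decay functions do not factor this way, which is what makes the general problem studied here nontrivial.

```python
import math

class ExpDecaySum:
    """Decayed sum: an item of value v that arrived at time s
    contributes v * exp(-lam * (t - s)) when queried at time t.

    Exponential decay factors across time steps, so the whole
    aggregate fits in O(1) space: decay the stored total by the
    elapsed time, then add the new item at full weight.
    """

    def __init__(self, lam):
        self.lam = lam       # decay rate; lam = 0 gives the plain sum
        self.total = 0.0     # decayed sum as of self.last_t
        self.last_t = None   # time of the most recent update

    def add(self, value, t):
        # Age the existing total forward to time t, then add the item.
        if self.last_t is not None:
            self.total *= math.exp(-self.lam * (t - self.last_t))
        self.last_t = t
        self.total += value

    def query(self, t):
        # Age the total forward to the query time without mutating state.
        if self.last_t is None:
            return 0.0
        return self.total * math.exp(-self.lam * (t - self.last_t))
```

With `lam = 0` every item keeps full weight and `query` returns the ordinary running sum; with `lam = ln 2` an item loses half its weight per unit of elapsed time.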



A preliminary version of this paper appeared in: Proc. of the 2003 ACM Symposium on Principles of Database Systems, PODS 2003, ACM, 2003.

1. Part of this work was done while the author was at AT&T.
