ABSTRACT
The streaming computation model is a standard model for large-scale data analysis: the input arrives one element at a time, and the goal is to maintain an approximately optimal solution using only a constant or, at worst, polylogarithmic amount of space.
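As a minimal illustration of the one-pass, small-space model (not taken from the paper), the hypothetical `streaming_max` below reads each element once and stores only the running maximum. It also hints at why recency is harder to handle: once the maximum becomes stale, this single stored value gives no way to recover the next-largest remaining element.

```python
from typing import Iterable, Optional


def streaming_max(stream: Iterable[float]) -> Optional[float]:
    """Single pass, O(1) memory: only the running maximum is stored."""
    best: Optional[float] = None
    for x in stream:  # elements arrive one at a time and are never revisited
        if best is None or x > best:
            best = x
    return best
```

For example, `streaming_max([3, 1, 4, 1, 5])` returns `5`, and an empty stream returns `None`.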
In practice, however, recency plays a large role, and one often wishes to consider only the last w elements that have arrived: the so-called sliding window problem. A trivial approach is to store the last w elements in a buffer; our goal is to develop algorithms whose space and update time are sublinear in w. In this regime, two frameworks are available, exponential histograms and smooth histograms, each of which yields sliding window algorithms for families of functions satisfying certain properties.
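The contrast between the trivial buffer and a histogram-based sketch can be illustrated on a very simple objective: the sum of non-negative values over the last w elements. The sketch below is a simplified take on the smooth-histogram idea. It keeps a few running instances, each started at a different position, and discards any instance whose value lies within a (1 - beta) factor of an older one. The class names, the parameter `beta`, and the choice of objective are illustrative assumptions, not the construction used in this paper. Under these assumptions, `query()` returns a value between (1 - beta) times the true window sum and the true sum.

```python
from collections import deque
from typing import Deque, List


class BufferSum:
    """Trivial baseline: keep the last w elements explicitly (Theta(w) space)."""

    def __init__(self, w: int) -> None:
        self.buf: Deque[float] = deque(maxlen=w)  # oldest element is dropped automatically

    def add(self, x: float) -> None:
        self.buf.append(x)

    def query(self) -> float:
        return sum(self.buf)  # exact sum of the window


class SmoothHistogramSum:
    """Simplified smooth-histogram-style sketch for a sliding-window sum (values >= 0).

    Each bucket is [start, running_sum]: the sum of all elements from position
    `start` up to now, i.e. one instance of a trivial streaming algorithm.
    """

    def __init__(self, w: int, beta: float) -> None:
        self.w = w
        self.beta = beta
        self.t = 0  # number of elements seen so far
        self.buckets: List[List[float]] = []  # bucket starts are strictly increasing

    def add(self, x: float) -> None:
        if x < 0:
            raise ValueError("this sketch assumes non-negative values")
        self.t += 1
        for b in self.buckets:  # every running instance sees the new element
            b[1] += x
        self.buckets.append([self.t, x])  # start a new instance at the current position
        self._prune()
        self._expire()

    def _prune(self) -> None:
        # For each kept bucket i, jump to the last bucket j whose sum is still at
        # least (1 - beta) * sum_i and drop every bucket strictly between i and j.
        kept, i, n = [], 0, len(self.buckets)
        while i < n:
            kept.append(self.buckets[i])
            j = i + 1
            while j + 1 < n and self.buckets[j + 1][1] >= (1 - self.beta) * self.buckets[i][1]:
                j += 1
            i = j
        self.buckets = kept

    def _expire(self) -> None:
        # Drop the oldest bucket once the next one already covers the whole window.
        start = self.t - self.w + 1
        while len(self.buckets) >= 2 and self.buckets[1][0] <= start:
            self.buckets.pop(0)

    def query(self) -> float:
        if not self.buckets:
            return 0
        start = self.t - self.w + 1
        first = self.buckets[0]
        if first[0] >= start:  # first bucket starts exactly at the window: exact answer
            return first[1]
        # The first bucket overshoots the window. The second one starts inside it,
        # so its sum is a lower bound within a (1 - beta) factor of the true sum.
        return self.buckets[1][1]
```

The buffer always stores w values. With all-ones input, `w = 10000` and `beta = 0.5`, the sketch keeps well under 100 buckets, because every second bucket at least halves the running sum.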
Unfortunately, these frameworks have limitations and cannot always be applied directly. A prominent example is the problem of maximizing a submodular function under a cardinality constraint. While some of these difficulties can be rectified on a case-by-case basis, here we describe an alternative approach to designing efficient sliding window algorithms for maximization problems. We then instantiate this approach on a wide range of problems, yielding better algorithms for submodular function optimization, diversity optimization, and general subadditive optimization. In doing so, we improve on state-of-the-art results obtained using problem-specific algorithms.
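For submodular maximization under a cardinality constraint, the trivial buffer approach looks as follows. The sketch uses maximum coverage as the monotone submodular objective. On each query it reruns the classic greedy algorithm, which achieves a (1 - 1/e) approximation offline. This is only the Θ(w)-space baseline that the abstract sets out to improve on, not the algorithm proposed in the paper. The names `greedy_max_coverage` and `WindowGreedyBaseline` are hypothetical.

```python
from collections import deque
from typing import Deque, FrozenSet, List, Optional, Set


def greedy_max_coverage(sets: List[FrozenSet[int]], k: int) -> List[FrozenSet[int]]:
    """Greedy for max coverage (monotone submodular) under the constraint |S| <= k."""
    chosen: List[FrozenSet[int]] = []
    covered: Set[int] = set()
    for _ in range(k):
        best: Optional[FrozenSet[int]] = None
        best_gain = 0
        for s in sets:
            gain = len(s - covered)  # marginal gain f(S + s) - f(S)
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:  # no remaining set adds new elements
            break
        chosen.append(best)
        covered |= best
    return chosen


class WindowGreedyBaseline:
    """Trivial sliding-window baseline: buffer the last w sets, rerun greedy on query."""

    def __init__(self, w: int, k: int) -> None:
        self.window: Deque[FrozenSet[int]] = deque(maxlen=w)  # Theta(w) space
        self.k = k

    def add(self, s: FrozenSet[int]) -> None:
        self.window.append(s)  # the oldest set expires automatically

    def query(self) -> List[FrozenSet[int]]:
        return greedy_max_coverage(list(self.window), self.k)  # O(k * w) set operations
```

For example, with `w=3` and `k=1`, adding `{1, 2, 3, 4}`, `{5}`, `{6, 7}`, `{8}` in that order makes `query()` return `[frozenset({6, 7})]`, because `{1, 2, 3, 4}` has already left the window.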
Index Terms
- Better Sliding Window Algorithms to Maximize Subadditive and Diversity Objectives