Fast Identification of Heavy Hitters by Cached and Packed Group Testing

Kaneta, Yusaku; Uno, Takeaki; Arimura, Hiroki

doi:10.1007/978-3-030-32686-9_17


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11811)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval


Abstract

The $\epsilon$-approximate $\phi$-heavy hitters problem is to maintain the frequency of every element of a universe $\mathbb{U}=[0..n)$ under an arbitrary data stream of updates $(x_i, \varDelta_i)\in \mathbb{U}\times \mathbb{Z}$, each of which changes the frequency of $x_i$ by $\varDelta_i$, so that one can output every element with frequency more than $\phi N$ and no element with frequency at most $(\phi-\epsilon)N$, where $N=\sum_i \varDelta_i$ and $\epsilon, \phi \in \mathbb{R}$ are prespecified parameters. To solve this problem in small space, Cormode and Muthukrishnan (ACM TODS, 2005) proposed an $O(\rho\epsilon^{-1}\lg n)$-space probabilistic data structure with good practical performance, where $\rho=\lg(1/(\delta\phi))$ for a given failure probability $\delta \in \mathbb{R}$. In this paper, we improve its output time from $O(\rho\epsilon^{-1}(\lg n+\rho))$ to $O(\rho^2\epsilon^{-1})$ for arbitrary updates ($\varDelta_i\in \mathbb{Z}$), and its update time from $O(\rho\lg n)$ to amortized $O(\rho)$ for constant updates ($\varDelta_i\in O(1)$), with the same space and output guarantees. We achieve this by removing the application-specific $\lg n$ terms, which, unlike the parameters $\delta$, $\epsilon$, and $\phi$, are not tunable.
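To make the baseline concrete, the following is a minimal Python sketch in the spirit of the Cormode and Muthukrishnan combinatorial group testing structure that the abstract improves upon. It is not the cached and packed structure of this paper. Assumptions: Carter–Wegman hashing $((a x + b) \bmod p) \bmod w$ per row, a universe $[0..2^b)$ so that $\lg n = b$, and nonnegative true frequencies at all times (deletions allowed, but no element goes below zero). The class and parameter names (`GroupTestingSketch`, `n_bits`, `width`, `depth`) are hypothetical. Roughly, `depth` plays the role of $\rho$ and `width` the role of $\epsilon^{-1}$, so `update` touches $O(\rho\lg n)$ counters and `heavy_hitters` does $O(\rho\epsilon^{-1}(\lg n+\rho))$ work, which are the bounds quoted above.

```python
import random


class GroupTestingSketch:
    """Simplified combinatorial group testing sketch in the spirit of
    Cormode and Muthukrishnan (ACM TODS 2005). Hypothetical illustration,
    not the cached and packed structure of this paper."""

    PRIME = (1 << 61) - 1  # Mersenne prime, larger than any element id here

    def __init__(self, n_bits, width, depth, seed=0):
        rng = random.Random(seed)
        self.n_bits = n_bits  # universe U = [0..2**n_bits), so lg n = n_bits
        self.width = width    # groups per row; O(1/epsilon) assumed
        self.depth = depth    # independent rows; O(rho) assumed
        # Assumed Carter-Wegman hash per row: ((a*x + b) mod p) mod width.
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(depth)]
        # counters[r][g][0]: total frequency of group g in row r.
        # counters[r][g][1 + j]: frequency of group items whose bit j is 1.
        self.counters = [[[0] * (n_bits + 1) for _ in range(width)]
                         for _ in range(depth)]
        self.total = 0  # N = sum of all deltas

    def _group(self, r, x):
        a, b = self.hashes[r]
        return ((a * x + b) % self.PRIME) % self.width

    def update(self, x, delta):
        """Apply (x, delta); touches up to depth * (n_bits + 1) counters."""
        self.total += delta
        for r in range(self.depth):
            c = self.counters[r][self._group(r, x)]
            c[0] += delta
            for j in range(self.n_bits):
                if (x >> j) & 1:
                    c[1 + j] += delta

    def estimate(self, x):
        """Count-Min style upper bound on the frequency of x
        (valid only while all true frequencies are nonnegative)."""
        return min(self.counters[r][self._group(r, x)][0]
                   for r in range(self.depth))

    def _decode(self, c, threshold):
        """Recover the id of the unique heavy item in a group bit by bit,
        or return None if some bit test is inconclusive."""
        x = 0
        for j in range(self.n_bits):
            ones = c[1 + j] > threshold           # items with bit j = 1
            zeros = c[0] - c[1 + j] > threshold   # items with bit j = 0
            if ones == zeros:  # both sides heavy, or neither: give up
                return None
            if ones:
                x |= 1 << j
        return x

    def heavy_hitters(self, phi):
        """Return candidates whose estimated frequency exceeds phi * N."""
        threshold = phi * self.total
        found = set()
        for r in range(self.depth):
            for g, c in enumerate(self.counters[r]):
                if c[0] <= threshold:
                    continue
                x = self._decode(c, threshold)
                # The candidate must hash back to this group and pass
                # the Count-Min test in every row.
                if (x is not None and self._group(r, x) == g
                        and self.estimate(x) > threshold):
                    found.add(x)
        return found


if __name__ == "__main__":
    sketch = GroupTestingSketch(n_bits=16, width=64, depth=4, seed=42)
    noise = random.Random(1)
    for _ in range(2000):
        sketch.update(noise.randrange(1 << 16), 1)  # light background items
    sketch.update(12345, 500)
    sketch.update(777, 300)
    print(sorted(sketch.heavy_hitters(0.05)))
```

Each per-bit counter acts as one group test: if exactly one side of bit $j$ exceeds $\phi N$, the heavy item's $j$-th bit is recovered; if both or neither do, the group is skipped and the other rows are relied on. These $\lg n$ counters per group are where the $\lg n$ factors in the update and output bounds above come from.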



References

Basat, R.B., Einziger, G., Friedman, R., Luizelli, M.C., Waisbard, E.: Constant time updates in hierarchical heavy hitters. In: Proceedings of the Conference of the ACM Special Interest Group on Data Communication, SIGCOMM 2017, pp. 127–140 (2017)
Belazzougui, D., Gagie, T., Navarro, G.: Better space bounds for parameterized range majority and minority. In: Dehne, F., Solis-Oba, R., Sack, J.-R. (eds.) WADS 2013. LNCS, vol. 8037, pp. 121–132. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40104-6_11
Bender, M.A., et al.: The online event-detection problem. arXiv e-prints arXiv:1812.09824 (2018)
Bille, P., Thorup, M.: Regular expression matching with multi-strings and intervals. In: Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, pp. 1297–1308 (2010)
Boyer, R.S., Moore, J.S.: MJRTY: a fast majority vote algorithm. In: Boyer, R.S. (ed.) Automated Reasoning: Essays in Honor of Woody Bledsoe, pp. 105–118. Springer, Dordrecht (1991). https://doi.org/10.1007/978-94-011-3488-0_5
Carter, J.L., Wegman, M.N.: Universal classes of hash functions. J. Comput. Syst. Sci. 18(2), 143–154 (1979)
Charikar, M., Chen, K.C., Farach-Colton, M.: Finding frequent items in data streams. Theor. Comput. Sci. 312(1), 3–15 (2004)
Cormode, G., Hadjieleftheriou, M.: Methods for finding frequent items in data streams. VLDB J. 19(1), 3–20 (2010)
Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)
Cormode, G., Muthukrishnan, S.: What’s hot and what’s not: tracking most frequent items dynamically. ACM Trans. Database Syst. 30(1), 249–278 (2005)
Demaine, E.D., López-Ortiz, A., Munro, J.I.: Frequency estimation of internet packet streams with limited space. In: Möhring, R., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 348–360. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45749-6_33
Durocher, S., He, M., Munro, J.I., Nicholson, P.K., Skala, M.: Range majority in constant time and linear space. Inf. Comput. 222, 169–179 (2013)
Feigenblat, G., Itzhaki, O., Porat, E.: The frequent items problem, under polynomial decay, in the streaming model. Theor. Comput. Sci. 411(34–36), 3048–3054 (2010)
Frandsen, G.S., Skyum, S.: Dynamic maintenance of majority information in constant time per update. Inf. Process. Lett. 63(2), 75–78 (1997)
Gagie, T., He, M., Munro, J.I., Nicholson, P.K.: Finding frequent elements in compressed 2D arrays and strings. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 295–300. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24583-1_29
Gagie, T., He, M., Navarro, G.: Compressed dynamic range majority data structures. In: 2017 Data Compression Conference, DCC 2017, pp. 260–269 (2017)
Grabowski, S., Fredriksson, K.: Bit-parallel string matching under Hamming distance in $O(n\lceil m/w\rceil)$ worst case time. Inf. Process. Lett. 105(5), 182–187 (2008)
Hovmand, J.N., Nygaard, M.H.: Estimating frequencies and finding heavy hitters. Master’s thesis, Aarhus University (2016)
Karp, R.M., Shenker, S., Papadimitriou, C.H.: A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst. 28(1), 51–55 (2003)
Karpinski, M., Nekrich, Y.: Searching for frequent colors in rectangles. In: Proceedings of the 20th Annual Canadian Conference on Computational Geometry, CCCG 2008 (2008)
Kveton, B., Muthukrishnan, S., Vu, H.T., Xian, Y.: Finding subcube heavy hitters in analytics data streams. In: Proceedings of the 2018 World Wide Web Conference, WWW 2018, pp. 1705–1714 (2018)
Larsen, K.G., Nelson, J., Nguyen, H.L., Thorup, M.: Heavy hitters via cluster-preserving clustering. In: Proceedings of the IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, pp. 61–70 (2016)
Metwally, A., Agrawal, D., Abbadi, A.E.: An integrated efficient solution for computing frequent and top-$k$ elements in data streams. ACM Trans. Database Syst. 31(3), 1095–1133 (2006)
Misra, J., Gries, D.: Finding repeated elements. Sci. Comput. Program. 2(2), 143–152 (1982)
Muthukrishnan, S.: Data streams: algorithms and applications. Found. Trends Theor. Comput. Sci. 1(2), 117–236 (2005)
Navarro, G., Thankachan, S.V.: Encodings for range majority queries. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 262–272. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07566-2_27
Pike, R., Dorward, S., Griesemer, R., Quinlan, S.: Interpreting the data: parallel analysis with Sawzall. Sci. Program. J. 13, 277–298 (2005)
Sivaraman, V., Narayana, S., Rottenstreich, O., Muthukrishnan, S., Rexford, J.: Heavy-hitter detection entirely in the data plane. In: Proceedings of the Symposium on SDN Research, SOSR 2017, pp. 164–176 (2017)
Thorup, M.: High speed hashing for integers and strings. arXiv e-prints arXiv:1504.06804 (2015)
Tong, D., Prasanna, V.K.: Sketch acceleration on FPGA and its applications in network anomaly detection. IEEE Trans. Parallel Distrib. Syst. 29(4), 929–942 (2018)


Acknowledgements

The authors would like to thank anonymous referees for their comments that greatly improved the readability and structure of this paper.

Author information

Authors and Affiliations

Autonomous Networking Research and Innovation Department, Rakuten Mobile, Inc. and Rakuten Institute of Technology, Rakuten, Inc., Tokyo, Japan
Yusaku Kaneta
National Institute of Informatics, Tokyo, Japan
Takeaki Uno
IST, Hokkaido University, Sapporo, Japan
Hiroki Arimura


Corresponding author

Correspondence to Yusaku Kaneta.

Editor information

Editors and Affiliations

University of A Coruña, A Coruña, Spain
Nieves R. Brisaboa
University of Helsinki, Helsinki, Finland
Simon J. Puglisi


About this paper

Cite this paper

Kaneta, Y., Uno, T., Arimura, H. (2019). Fast Identification of Heavy Hitters by Cached and Packed Group Testing. In: Brisaboa, N., Puglisi, S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol 11811. Springer, Cham. https://doi.org/10.1007/978-3-030-32686-9_17


DOI: https://doi.org/10.1007/978-3-030-32686-9_17
Published: 03 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32685-2
Online ISBN: 978-3-030-32686-9
eBook Packages: Computer Science, Computer Science (R0)
