Abstract
The \(\epsilon\)-approximate \(\phi\)-heavy hitters problem is to maintain the frequency of every element of a universe \(\mathbb{U}=[0..n)\) under an arbitrary data stream of updates \((x_i, \varDelta_i)\in \mathbb{U}\times \mathbb{Z}\), each of which changes the frequency of \(x_i\) by \(\varDelta_i\), such that one can output every element with frequency more than \(\phi N\) and no element with frequency at most \((\phi-\epsilon)N\), where \(N=\sum_i \varDelta_i\) and \(\epsilon, \phi \in \mathbb{R}\) are prespecified parameters. To solve this problem in small space, Cormode and Muthukrishnan (ACM TODS, 2005) proposed an \(O(\rho \epsilon^{-1}\lg n)\)-space probabilistic data structure with good practical performance, where \(\rho=\lg(1/(\delta\phi))\) for any failure probability \(\delta \in \mathbb{R}\). In this paper, we improve its output time from \(O(\rho \epsilon^{-1}(\lg n+\rho))\) to \(O(\rho^2\epsilon^{-1})\) for arbitrary updates (\(\varDelta_i\in \mathbb{Z}\)) and its update time from \(O(\rho \lg n)\) to amortized \(O(\rho)\) for constant updates (\(\varDelta_i\in O(1)\)), with the same space and output guarantees. The improvement removes the \(\lg n\) terms, which are application-specific and, unlike the parameters \(\delta\), \(\epsilon\), and \(\phi\), not tunable.
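To make the output guarantee concrete, the following Python sketch is an exact, linear-space baseline. It is not the data structure of Cormode and Muthukrishnan nor the improved one described in this paper; it only illustrates what a valid answer looks like. The function names `exact_frequencies`, `heavy_hitters`, and `is_valid_output` are hypothetical, and the sketch assumes \(N>0\) and \(\phi>\epsilon\).

```python
from collections import defaultdict


def exact_frequencies(stream):
    """Replay updates (x_i, delta_i) and return (frequency table, N)."""
    freq = defaultdict(int)
    total = 0  # N = sum of all delta_i
    for x, delta in stream:
        freq[x] += delta
        total += delta
    return freq, total


def heavy_hitters(stream, phi):
    """Exact baseline: every element with frequency more than phi * N."""
    freq, total = exact_frequencies(stream)
    return {x for x, f in freq.items() if f > phi * total}


def is_valid_output(stream, output, phi, eps):
    """Check the eps-approximate phi-heavy hitters guarantee for `output`.

    Assumes N > 0 and phi > eps, so elements never seen (frequency 0)
    count as elements that must not be reported.
    """
    freq, total = exact_frequencies(stream)
    out = set(output)
    must_report = {x for x, f in freq.items() if f > phi * total}
    # Every reported element must have frequency above (phi - eps) * N.
    no_light_element = all(freq.get(x, 0) > (phi - eps) * total for x in out)
    return must_report <= out and no_light_element


# Frequencies after the stream: element 1 -> 6, element 2 -> 3, element 3 -> 1; N = 10.
stream = [(1, 4), (2, 3), (1, 3), (3, 1), (1, -1)]
print(heavy_hitters(stream, phi=0.35))
print(is_valid_output(stream, {1, 2}, phi=0.35, eps=0.1))
```

With \(\phi=0.35\) and \(\epsilon=0.1\) on this stream, element 1 must be reported, element 3 must not be, and element 2 lies between the two thresholds, so an answer may include or omit it.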
References
Basat, R.B., Einziger, G., Friedman, R., Luizelli, M.C., Waisbard, E.: Constant time updates in hierarchical heavy hitters. In: Proceedings of the Conference of the ACM Special Interest Group on Data Communication, SIGCOMM 2017, pp. 127–140 (2017)
Belazzougui, D., Gagie, T., Navarro, G.: Better space bounds for parameterized range majority and minority. In: Dehne, F., Solis-Oba, R., Sack, J.-R. (eds.) WADS 2013. LNCS, vol. 8037, pp. 121–132. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40104-6_11
Bender, M.A., et al.: The online event-detection problem. arXiv e-prints arXiv:1812.09824 (2018)
Bille, P., Thorup, M.: Regular expression matching with multi-strings and intervals. In: Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, pp. 1297–1308 (2010)
Boyer, R.S., Moore, J.S.: MJRTY: a fast majority vote algorithm. In: Boyer, R.S. (ed.) Automated Reasoning: Essays in Honor of Woody Bledsoe, pp. 105–118. Springer, Dordrecht (1991). https://doi.org/10.1007/978-94-011-3488-0_5
Carter, J.L., Wegman, M.N.: Universal classes of hash functions. J. Comput. Syst. Sci. 18(2), 143–154 (1979)
Charikar, M., Chen, K.C., Farach-Colton, M.: Finding frequent items in data streams. Theor. Comput. Sci. 312(1), 3–15 (2004)
Cormode, G., Hadjieleftheriou, M.: Methods for finding frequent items in data streams. VLDB J. 19(1), 3–20 (2010)
Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)
Cormode, G., Muthukrishnan, S.: What’s hot and what’s not: tracking most frequent items dynamically. ACM Trans. Database Syst. 30(1), 249–278 (2005)
Demaine, E.D., López-Ortiz, A., Munro, J.I.: Frequency estimation of internet packet streams with limited space. In: Möhring, R., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 348–360. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45749-6_33
Durocher, S., He, M., Munro, J.I., Nicholson, P.K., Skala, M.: Range majority in constant time and linear space. Inf. Comput. 222, 169–179 (2013)
Feigenblat, G., Itzhaki, O., Porat, E.: The frequent items problem, under polynomial decay, in the streaming model. Theor. Comput. Sci. 411(34–36), 3048–3054 (2010)
Frandsen, G.S., Skyum, S.: Dynamic maintenance of majority information in constant time per update. Inf. Process. Lett. 63(2), 75–78 (1997)
Gagie, T., He, M., Munro, J.I., Nicholson, P.K.: Finding frequent elements in compressed 2D arrays and strings. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 295–300. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24583-1_29
Gagie, T., He, M., Navarro, G.: Compressed dynamic range majority data structures. In: 2017 Data Compression Conference, DCC 2017, pp. 260–269 (2017)
Grabowski, S., Fredriksson, K.: Bit-parallel string matching under Hamming distance in \(O(n\lceil m/w\rceil)\) worst case time. Inf. Process. Lett. 105(5), 182–187 (2008)
Hovmand, J.N., Nygaard, M.H.: Estimating frequencies and finding heavy hitters. Master’s thesis, Aarhus University (2016)
Karp, R.M., Shenker, S., Papadimitriou, C.H.: A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst. 28(1), 51–55 (2003)
Karpinski, M., Nekrich, Y.: Searching for frequent colors in rectangles. In: Proceedings of the 20th Annual Canadian Conference on Computational Geometry, CCCG 2008 (2008)
Kveton, B., Muthukrishnan, S., Vu, H.T., Xian, Y.: Finding subcube heavy hitters in analytics data streams. In: Proceedings of the 2018 World Wide Web Conference WWW 2018, pp. 1705–1714 (2018)
Larsen, K.G., Nelson, J., Nguyen, H.L., Thorup, M.: Heavy hitters via cluster-preserving clustering. In: Proceedings of the IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, pp. 61–70 (2016)
Metwally, A., Agrawal, D., Abbadi, A.E.: An integrated efficient solution for computing frequent and top-\(k\) elements in data streams. ACM Trans. Database Syst. 31(3), 1095–1133 (2006)
Misra, J., Gries, D.: Finding repeated elements. Sci. Comput. Program. 2(2), 143–152 (1982)
Muthukrishnan, S.: Data streams: algorithms and applications. Found. Trends Theor. Comput. Sci. 1(2), 117–236 (2005)
Navarro, G., Thankachan, S.V.: Encodings for range majority queries. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 262–272. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07566-2_27
Pike, R., Dorward, S., Griesemer, R., Quinlan, S.: Interpreting the data: parallel analysis with Sawzall. Sci. Program. J. 13, 277–298 (2005)
Sivaraman, V., Narayana, S., Rottenstreich, O., Muthukrishnan, S., Rexford, J.: Heavy-hitter detection entirely in the data plane. In: Proceedings of the Symposium on SDN Research, SOSR 2017, pp. 164–176 (2017)
Thorup, M.: High speed hashing for integers and strings. arXiv e-prints arXiv:1504.06804 (2015)
Tong, D., Prasanna, V.K.: Sketch acceleration on FPGA and its applications in network anomaly detection. IEEE Trans. Parallel Distrib. Syst. 29(4), 929–942 (2018)
Acknowledgements
The authors would like to thank anonymous referees for their comments that greatly improved the readability and structure of this paper.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Kaneta, Y., Uno, T., Arimura, H. (2019). Fast Identification of Heavy Hitters by Cached and Packed Group Testing. In: Brisaboa, N., Puglisi, S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol 11811. Springer, Cham. https://doi.org/10.1007/978-3-030-32686-9_17
DOI: https://doi.org/10.1007/978-3-030-32686-9_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32685-2
Online ISBN: 978-3-030-32686-9
eBook Packages: Computer Science (R0)