Fast Identification of Heavy Hitters by Cached and Packed Group Testing

  • Conference paper
String Processing and Information Retrieval (SPIRE 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11811)

Abstract

The \(\epsilon \)-approximate \(\phi \)-heavy hitters problem is to maintain the frequency of every element of a universe \(\mathbb {U}=[0..n)\) under an arbitrary data stream of updates \((x_i, \varDelta _i)\in \mathbb {U}\times \mathbb {Z}\), each of which changes the frequency of \(x_i\) by \(\varDelta _i\), so that one can output every element with frequency more than \(\phi {N}\) and no element with frequency at most \((\phi -\epsilon ){N}\), where \({N}=\sum _i \varDelta _i\) and \(\epsilon , \phi \in \mathbb {R}\) are prespecified parameters. To solve this problem in small space, Cormode and Muthukrishnan (ACM TODS, 2005) proposed an \({O}(\rho \epsilon ^{-1}\lg {n})\)-space probabilistic data structure with good practical performance, where \(\rho =\lg {(1/(\delta \phi ))}\) for any failure probability \(\delta \in \mathbb {R}\). In this paper, we improve its output time from \({O}(\rho \epsilon ^{-1}(\lg {n}+\rho ))\) to \({O}(\rho ^2\epsilon ^{-1})\) for arbitrary updates (\(\varDelta _i\in \mathbb {Z}\)) and its update time from \({O}(\rho \lg {n})\) to amortized \({O}(\rho )\) for constant updates (\(\varDelta _i\in {O}(1)\)), with the same space and output guarantees. The improvement removes the application-specific \(\lg {n}\) terms, which, unlike the parameters \(\delta \), \(\epsilon \), and \(\phi \), are not tunable.
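
To make the problem statement concrete, the following is a minimal Python sketch of plain combinatorial group testing, in the spirit of the baseline structure attributed above to Cormode and Muthukrishnan. It is not the cached and packed variant proposed in this paper. The class name, the hash family, and the parameters `rows` and `width` are assumptions made for illustration only, and the sketch assumes that net frequencies never become negative. Each update touches up to \(1+\lg n\) counters per row, which is where a \(\lg n\) factor in the baseline update time would come from.

```python
import random


class GroupTestingSketch:
    """Simplified combinatorial group testing sketch (illustrative only)."""

    PRIME = (1 << 61) - 1  # Mersenne prime for the assumed hash family

    def __init__(self, universe_bits, rows, width, seed=0):
        rng = random.Random(seed)
        self.bits = universe_bits
        self.width = width
        # One (a, b) pair per row: h(x) = ((a*x + b) mod PRIME) mod width.
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(rows)]
        # counters[r][g][0] is the total of group g in row r;
        # counters[r][g][j + 1] counts only elements whose j-th bit is 1.
        self.counters = [[[0] * (universe_bits + 1) for _ in range(width)]
                         for _ in range(rows)]
        self.total = 0  # N, the sum of all deltas seen so far

    def _group(self, row, x):
        a, b = self.hashes[row]
        return ((a * x + b) % self.PRIME) % self.width

    def update(self, x, delta):
        """Apply (x, delta); touches up to 1 + universe_bits counters per row."""
        self.total += delta
        for r in range(len(self.hashes)):
            cell = self.counters[r][self._group(r, x)]
            cell[0] += delta
            for j in range(self.bits):
                if (x >> j) & 1:
                    cell[j + 1] += delta

    def _decode(self, cell, threshold):
        """Recover the single heavy element of a group bit by bit, if any."""
        x = 0
        for j in range(self.bits):
            ones, zeros = cell[j + 1], cell[0] - cell[j + 1]
            if (ones > threshold) == (zeros > threshold):
                return None  # both or neither side is heavy: give up
            if ones > threshold:
                x |= 1 << j
        return x

    def _estimate(self, x):
        # Upper bound on the frequency of x: smallest group total over rows.
        return min(self.counters[r][self._group(r, x)][0]
                   for r in range(len(self.hashes)))

    def heavy_hitters(self, phi):
        """Return elements decoded from groups whose total exceeds phi * N."""
        threshold = phi * self.total
        found = set()
        for row in self.counters:
            for cell in row:
                if cell[0] <= threshold:
                    continue
                x = self._decode(cell, threshold)
                if x is not None and self._estimate(x) > threshold:
                    found.add(x)
        return found


sketch = GroupTestingSketch(universe_bits=8, rows=3, width=4)
for x, delta in [(42, 60), (7, 30), (7, -30), (3, 25), (200, 15)]:
    sketch.update(x, delta)
print(sketch.heavy_hitters(phi=0.5))  # prints {42}
```

In this toy stream \(N=100\) and element 42 has frequency 60, so with \(\phi =0.5\) it is the only element above \(\phi N\). Because it holds a strict majority, every bit test in its group is decided correctly whatever the hash functions are; for smaller \(\phi \), correctness would depend on the choice of `rows` and `width`, which is not analysed here.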

References

  1. Basat, R.B., Einziger, G., Friedman, R., Luizelli, M.C., Waisbard, E.: Constant time updates in hierarchical heavy hitters. In: Proceedings of the Conference of the ACM Special Interest Group on Data Communication, SIGCOMM 2017, pp. 127–140 (2017)

  2. Belazzougui, D., Gagie, T., Navarro, G.: Better space bounds for parameterized range majority and minority. In: Dehne, F., Solis-Oba, R., Sack, J.-R. (eds.) WADS 2013. LNCS, vol. 8037, pp. 121–132. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40104-6_11

  3. Bender, M.A., et al.: The online event-detection problem. arXiv e-prints arXiv:1812.09824 (2018)

  4. Bille, P., Thorup, M.: Regular expression matching with multi-strings and intervals. In: Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, pp. 1297–1308 (2010)

  5. Boyer, R.S., Moore, J.S.: MJRTY: a fast majority vote algorithm. In: Boyer, R.S. (ed.) Automated Reasoning: Essays in Honor of Woody Bledsoe, pp. 105–118. Springer, Dordrecht (1991). https://doi.org/10.1007/978-94-011-3488-0_5

  6. Carter, J.L., Wegman, M.N.: Universal classes of hash functions. J. Comput. Syst. Sci. 18(2), 143–154 (1979)

  7. Charikar, M., Chen, K.C., Farach-Colton, M.: Finding frequent items in data streams. Theor. Comput. Sci. 312(1), 3–15 (2004)

  8. Cormode, G., Hadjieleftheriou, M.: Methods for finding frequent items in data streams. VLDB J. 19(1), 3–20 (2010)

  9. Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)

  10. Cormode, G., Muthukrishnan, S.: What’s hot and what’s not: tracking most frequent items dynamically. ACM Trans. Database Syst. 30(1), 249–278 (2005)

  11. Demaine, E.D., López-Ortiz, A., Munro, J.I.: Frequency estimation of internet packet streams with limited space. In: Möhring, R., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 348–360. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45749-6_33

  12. Durocher, S., He, M., Munro, J.I., Nicholson, P.K., Skala, M.: Range majority in constant time and linear space. Inf. Comput. 222, 169–179 (2013)

  13. Feigenblat, G., Itzhaki, O., Porat, E.: The frequent items problem, under polynomial decay, in the streaming model. Theor. Comput. Sci. 411(34–36), 3048–3054 (2010)

  14. Frandsen, G.S., Skyum, S.: Dynamic maintenance of majority information in constant time per update. Inf. Process. Lett. 63(2), 75–78 (1997)

  15. Gagie, T., He, M., Munro, J.I., Nicholson, P.K.: Finding frequent elements in compressed 2D arrays and strings. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 295–300. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24583-1_29

  16. Gagie, T., He, M., Navarro, G.: Compressed dynamic range majority data structures. In: 2017 Data Compression Conference, DCC 2017, pp. 260–269 (2017)

  17. Grabowski, S., Fredriksson, K.: Bit-parallel string matching under Hamming distance in \(O(n\lceil m/w\rceil )\) worst case time. Inf. Process. Lett. 105(5), 182–187 (2008)

  18. Hovmand, J.N., Nygaard, M.H.: Estimating frequencies and finding heavy hitters. Master’s thesis, Aarhus University (2016)

  19. Karp, R.M., Shenker, S., Papadimitriou, C.H.: A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst. 28(1), 51–55 (2003)

  20. Karpinski, M., Nekrich, Y.: Searching for frequent colors in rectangles. In: Proceedings of the 20th Annual Canadian Conference on Computational Geometry, CCCG 2008 (2008)

  21. Kveton, B., Muthukrishnan, S., Vu, H.T., Xian, Y.: Finding subcube heavy hitters in analytics data streams. In: Proceedings of the 2018 World Wide Web Conference WWW 2018, pp. 1705–1714 (2018)

  22. Larsen, K.G., Nelson, J., Nguyen, H.L., Thorup, M.: Heavy hitters via cluster-preserving clustering. In: Proceedings of the IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, pp. 61–70 (2016)

  23. Metwally, A., Agrawal, D., Abbadi, A.E.: An integrated efficient solution for computing frequent and top-\(k\) elements in data streams. ACM Trans. Database Syst. 31(3), 1095–1133 (2006)

  24. Misra, J., Gries, D.: Finding repeated elements. Sci. Comput. Program. 2(2), 143–152 (1982)

  25. Muthukrishnan, S.: Data streams: algorithms and applications. Found. Trends Theor. Comput. Sci. 1(2), 117–236 (2005)

  26. Navarro, G., Thankachan, S.V.: Encodings for range majority queries. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 262–272. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07566-2_27

  27. Pike, R., Dorward, S., Griesemer, R., Quinlan, S.: Interpreting the data: parallel analysis with Sawzall. Sci. Program. J. 13, 277–298 (2005)

  28. Sivaraman, V., Narayana, S., Rottenstreich, O., Muthukrishnan, S., Rexford, J.: Heavy-hitter detection entirely in the data plane. In: Proceedings of the Symposium on SDN Research, SOSR 2017, pp. 164–176 (2017)

  29. Thorup, M.: High speed hashing for integers and strings. arXiv e-prints arXiv:1504.06804 (2015)

  30. Tong, D., Prasanna, V.K.: Sketch acceleration on FPGA and its applications in network anomaly detection. IEEE Trans. Parallel Distrib. Syst. 29(4), 929–942 (2018)

Acknowledgements

The authors would like to thank anonymous referees for their comments that greatly improved the readability and structure of this paper.

Author information

Corresponding author

Correspondence to Yusaku Kaneta.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Kaneta, Y., Uno, T., Arimura, H. (2019). Fast Identification of Heavy Hitters by Cached and Packed Group Testing. In: Brisaboa, N., Puglisi, S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol 11811. Springer, Cham. https://doi.org/10.1007/978-3-030-32686-9_17

  • DOI: https://doi.org/10.1007/978-3-030-32686-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32685-2

  • Online ISBN: 978-3-030-32686-9

  • eBook Packages: Computer Science, Computer Science (R0)
