DOI: 10.1145/2675743.2771838
research-article

TOPiCo: detecting most frequent items from multiple high-rate event streams

Published: 24 June 2015

ABSTRACT

Systems such as social networks, search engines, and trading platforms operate geographically distant sites that continuously generate event streams at a high rate. Such events can be web server access logs, message feeds from participants in a social network, or financial data, among others. The ability to detect trends and popularity variations in a timely manner is of paramount importance in such systems. In particular, determining the most popular events across all sites makes it possible to capture the most relevant information in near real time and to quickly adapt the system to the load. This paper presents TOPiCo, a protocol that computes the most popular events across geo-distributed sites in a low-cost, bandwidth-efficient, and timely manner. TOPiCo starts by building the set of most popular events locally at each site. Then, it disseminates only events that have a chance of being among the most popular ones across all sites, significantly reducing the required bandwidth. We give a correctness proof of our algorithm and evaluate TOPiCo using a real-world trace of more than 240 million events spread across 32 sites. Our empirical results show that (i) TOPiCo is timely and cost-efficient for detecting popular events in a large-scale setting, (ii) it adapts dynamically to the distribution of the events, and (iii) it is particularly efficient for skewed distributions.
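
The general idea described above (rank events locally at each site, then ship only the events that could still reach the global top set) can be illustrated with a generic threshold-pruning scheme. The Python sketch below is not TOPiCo's actual protocol, which this page does not specify. The function name `global_top_k`, the three-round structure, and the `tau / m` pruning threshold are all assumptions made for illustration, and the sites are simulated as in-memory counters rather than networked nodes.

```python
from collections import Counter
from typing import List, Tuple


def global_top_k(sites: List[Counter], k: int) -> List[Tuple[str, int]]:
    """Illustrative threshold-pruned top-k over per-site event counts.

    Hypothetical helper, not the TOPiCo protocol: each Counter stands in
    for the local event counts observed at one site.
    """
    m = len(sites)

    # Round 1: every site reports only its local top-k events.
    partial = Counter()
    for site in sites:
        for item, count in site.most_common(k):
            partial[item] += count

    # The k-th largest partial sum is a lower bound on the k-th largest
    # true global count (0 if fewer than k candidates were reported).
    ranked = partial.most_common(k)
    tau = ranked[k - 1][1] if len(ranked) >= k else 0

    # Round 2: a site forwards an event only if its local count is at
    # least tau / m. An event below that bound at every site has a global
    # count below tau, so it cannot displace the current top k.
    candidates = set(partial)
    for site in sites:
        candidates.update(item for item, count in site.items() if count * m >= tau)

    # Round 3: collect exact global counts for the surviving candidates.
    exact = {item: sum(site[item] for site in sites) for item in candidates}

    # Sort by descending count, breaking ties by name for determinism.
    return sorted(exact.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    demo_sites = [
        Counter(a=5, b=1, c=3),
        Counter(a=1, b=4, c=3),
        Counter(a=1, c=3, d=2),
    ]
    print(global_top_k(demo_sites, 2))
```

In this sketch the bandwidth saving comes from round 2: the more skewed the counts, the higher `tau` becomes and the fewer events each site has to forward.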


      • Published in

        DEBS '15: Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems
        June 2015
        385 pages
        ISBN: 9781450332866
        DOI: 10.1145/2675743

        Copyright © 2015 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 130 of 553 submissions, 24%
