DOI: 10.1145/3183713.3190663
research-article

Robust, Scalable, Real-Time Event Time Series Aggregation at Twitter

Published: 27 May 2018

Abstract

Twitter's data engineering team faces the challenge of processing billions of events every day, both in batch and in real time, and we have built various tools to meet these demands. In this paper, we describe TSAR (TimeSeries AggregatoR), a robust, scalable, real-time event time series aggregation framework built primarily for engagement monitoring: aggregating interactions with Tweets, segmented along a multitude of dimensions such as device and engagement type. TSAR is built on top of Summingbird, an open-source framework for integrating batch and online MapReduce computations, and removes much of the tedium associated with building end-to-end aggregation pipelines, from the ingestion and processing of events to the publication of results in heterogeneous datastores. Clients are provided a query interface that powers dashboards and supports downstream ad hoc analytics.
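
To make the kind of aggregation described in the abstract concrete, the following is a minimal Python sketch, not TSAR's or Summingbird's actual API. The `Event` type, its fields, the 60-second bucket width, and the `aggregate` and `merge` helpers are all hypothetical names chosen for illustration. It counts events per time bucket and per dimension combination, and shows that partial counts computed separately can be added together, which is the property that lets batch and online results be combined.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical engagement event; field names are assumptions."""
    timestamp: int        # event time, in seconds since the epoch
    device: str           # example dimension
    engagement_type: str  # example dimension


def aggregate(events: Iterable[Event], bucket_seconds: int = 60) -> Counter:
    """Count events per (bucket start, device, engagement type)."""
    counts: Counter = Counter()
    for e in events:
        # Align the event time to the start of its fixed-width bucket.
        bucket_start = e.timestamp - e.timestamp % bucket_seconds
        counts[(bucket_start, e.device, e.engagement_type)] += 1
    return counts


def merge(a: Counter, b: Counter) -> Counter:
    """Combine two partial aggregates by adding counts key by key."""
    return a + b


events = [
    Event(0, "ios", "like"),
    Event(59, "ios", "like"),
    Event(60, "ios", "like"),
    Event(61, "web", "retweet"),
]
totals = aggregate(events)
```

Because addition of counts is associative and commutative, `merge(aggregate(events[:2]), aggregate(events[2:]))` yields the same result as `aggregate(events)`, regardless of how the events are split.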

Published In

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN: 9781450347037
DOI: 10.1145/3183713
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. batch processing
  2. online processing

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '18
Acceptance Rates

SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 23
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 07 Mar 2025

Cited By

  • (2022) Estimating the Best Time to View Cherry Blossoms Using Time-Series Forecasting Method. Machine Learning and Knowledge Extraction 4:2 (418-431). DOI: 10.3390/make4020018. Online publication date: 30-Apr-2022
  • (2021) Temporal Aggregation of Spanning Event Stream: An Extended Framework to Handle the Many Stream Models. Transactions on Large-Scale Data- and Knowledge-Centered Systems XLIX (1-32). DOI: 10.1007/978-3-662-64148-4_1. Online publication date: 1-Sep-2021
  • (2020) Scalable and Reliable Multi-dimensional Sensor Data Aggregation in Data Streaming Architectures. Data-Enabled Discovery and Applications 4:1. DOI: 10.1007/s41688-020-00041-3. Online publication date: 6-Oct-2020
  • (2020) Temporal Aggregation of Spanning Event Stream: A General Framework. Database and Expert Systems Applications (385-395). DOI: 10.1007/978-3-030-59051-2_26. Online publication date: 14-Sep-2020
  • (2019) Scalable and Reliable Multi-Dimensional Aggregation of Sensor Data Streams. 2019 IEEE International Conference on Big Data (Big Data) (3512-3517). DOI: 10.1109/BigData47090.2019.9006452. Online publication date: Dec-2019
  • (2018) Big Social Data Mining in a Cloud Computing Environment. 2018 International Conference on Cloud Computing, Big Data and Blockchain (ICCBB) (1-8). DOI: 10.1109/ICCBB.2018.8756461. Online publication date: Nov-2018
