DOI:10.1145/3487553.3524858
short-paper

Streaming Analytics with Adaptive Near-data Processing

Published: 16 August 2022

Abstract

Streaming analytics applications must process massive volumes of data in a timely manner, in domains ranging from datacenter telemetry and geo-distributed log analytics to Internet-of-Things systems. Such applications incur significant network costs to transport the data to a stream processor and significant compute costs to analyze it in a timely manner. Pushing computation closer to the data source by partitioning the analytics query is an effective strategy for reducing resource costs at the stream processor. However, the right partitioning strategy depends on the nature of the resource bottleneck and on the resource variability encountered at the compute resources near the data source. In this paper, we investigate the issues that affect query partitioning strategies. We first study new partitioning techniques within cloud datacenters that operate under constrained compute conditions, which vary widely across data sources and time slots. With insights from this study, we suggest several ways to improve the performance of stream analytics applications operating in different resource environments by making effective partitioning decisions for a variety of use cases, such as geo-distributed streaming analytics.


Published In

WWW '22: Companion Proceedings of the Web Conference 2022
April 2022
1338 pages
ISBN:9781450391306
DOI:10.1145/3487553

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Datacenter monitoring
  2. Edge computing
  3. Query partitioning
  4. Streaming analytics
  5. Wide area network

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

WWW '22
WWW '22: The ACM Web Conference 2022
April 25 - 29, 2022
Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 116
  • Downloads (Last 12 months): 20
  • Downloads (Last 6 weeks): 2

Reflects downloads up to 25 Feb 2025
