
Towards Observability Data Management at Scale

Published: 10 March 2021

Abstract

Observability has been gaining importance as a key capability in today's large-scale software systems and services. Motivated by current industry experience, exemplified by Slack, and intended as a call to arms for database research, this paper outlines the challenges and opportunities involved in designing and building Observability Data Management Systems (ODMSs) to handle this emerging workload at scale.


Published In

ACM SIGMOD Record  Volume 49, Issue 4
December 2020
27 pages
ISSN:0163-5808
DOI:10.1145/3456859
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 March 2021
Published in SIGMOD Volume 49, Issue 4

Qualifiers

  • Research-article

Cited By

  • (2024) Predicting Business Process Events in Presence of Anomalous IT Events. Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), 27-35. DOI: 10.1145/3632410.3632437. Online publication date: 4-Jan-2024
  • (2024) Towards Business Process Observability. Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), 257-265. DOI: 10.1145/3632410.3632435. Online publication date: 4-Jan-2024
  • (2024) ASDMG: business topic clustering-based architecture smell detection for microservice granularity. Software Quality Journal 32:3, 1341-1374. DOI: 10.1007/s11219-024-09681-5. Online publication date: 1-Sep-2024
  • (2022) Towards Observability for Production Machine Learning Pipelines. Proceedings of the VLDB Endowment 15:13, 4015-4022. DOI: 10.14778/3565838.3565853. Online publication date: 1-Sep-2022
  • (2022) Tastes Great! Less Filling! High Performance and Accurate Training Data Collection for Self-Driving Database Management Systems. Proceedings of the 2022 International Conference on Management of Data, 617-630. DOI: 10.1145/3514221.3517845. Online publication date: 10-Jun-2022
