Lockless multi-core high-throughput buffering scheme for kernel tracing

Published: 18 December 2012

Abstract

Studying the execution of concurrent real-time online systems to identify far-reaching and hard-to-reproduce latency and performance problems requires a mechanism able to cope with the voluminous information extracted from execution traces. Furthermore, tracing must not disturb the workload; otherwise the problematic behavior may become unreproducible.
In order to satisfy this low-disturbance constraint, we created the LTTng kernel tracer. It is designed to enable safe and race-free attachment of probes virtually anywhere in the operating system, including sites executed in non-maskable interrupt context.
In addition to being reentrant with respect to all kernel execution contexts, LTTng offers good performance and scalability, mainly due to its use of per-CPU data structures, local atomic operations as the main buffer synchronization primitive, and the RCU (Read-Copy Update) mechanism to control tracing.
Given that the kernel infrastructure used by the tracer could lead to infinite recursion if traced, and typically requires non-atomic synchronization, this paper proposes an asynchronous mechanism to inform the kernel that a buffer is ready to be read. This ensures that tracing sites do not require any kernel primitive, and therefore protects against infinite recursion.
This paper presents the core of LTTng's buffering algorithms and measures their performance.
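
To make the idea of lockless, atomic-operation-based buffer synchronization more concrete, the following is a minimal sketch of how space in a single per-CPU ring buffer might be reserved with a compare-and-swap loop and addressed with modular arithmetic. It is an illustration under stated assumptions, not LTTng's actual code: the names `cpu_buf`, `buf_reserve`, `buf_write`, and `BUF_SIZE` are hypothetical, C11 `<stdatomic.h>` operations stand in for the kernel's CPU-local atomic operations, and sub-buffer switching, commit counters, and the asynchronous reader wake-up are omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer size; a power of two so that offsets wrap with a mask. */
#define BUF_SIZE 4096u
static_assert((BUF_SIZE & (BUF_SIZE - 1)) == 0, "BUF_SIZE must be a power of two");

/* One instance per CPU (assumption): writers on that CPU never take a lock. */
struct cpu_buf {
    atomic_size_t write_offset;   /* free-running count of bytes reserved */
    atomic_size_t consumed;       /* free-running count of bytes read out */
    unsigned char data[BUF_SIZE];
};

/*
 * Reserve len bytes. The compare-and-swap retries if a nested context
 * (for example an interrupt handler tracing on the same CPU) moved
 * write_offset between the load and the update. Returns false when the
 * buffer is full, in which case the event would be dropped.
 */
static bool buf_reserve(struct cpu_buf *b, size_t len, size_t *start)
{
    size_t old = atomic_load(&b->write_offset);

    for (;;) {
        size_t used = old - atomic_load(&b->consumed);

        if (len > BUF_SIZE - used)
            return false;
        /* On failure, old is refreshed with the current write_offset. */
        if (atomic_compare_exchange_weak(&b->write_offset, &old, old + len)) {
            *start = old;
            return true;
        }
    }
}

/* Copy a payload into a reserved slot, wrapping at the end of the buffer. */
static void buf_write(struct cpu_buf *b, size_t start, const void *src, size_t len)
{
    size_t pos = start & (BUF_SIZE - 1);   /* offset modulo BUF_SIZE */
    size_t first = BUF_SIZE - pos;

    if (first > len)
        first = len;
    memcpy(b->data + pos, src, first);
    memcpy(b->data, (const unsigned char *)src + first, len - first);
}
```

Because a slot is owned exclusively once the compare-and-swap succeeds, the copy in `buf_write` needs no further synchronization with other writers. A real tracer would additionally need a way to tell the reader when a reserved slot has been fully written.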

Published In

ACM SIGOPS Operating Systems Review, Volume 46, Issue 3
December 2012
81 pages
ISSN: 0163-5980
DOI: 10.1145/2421648

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 December 2012
Published in SIGOPS Volume 46, Issue 3
Author Tags

  1. LTTng
  2. Linux
  3. atomic
  4. kernel
  5. lockless
  6. modular arithmetic
  7. tracing

Qualifiers

  • Research-article
