DOI: 10.1145/2043556.2043585

Fay: extensible distributed tracing from kernels to clusters

Published: 23 October 2011

Abstract

Fay is a flexible platform for the efficient collection, processing, and analysis of software execution traces. Fay provides dynamic tracing through use of runtime instrumentation and distributed aggregation within machines and across clusters. At the lowest level, Fay can be safely extended with new tracing primitives, including even untrusted, fully-optimized machine code, and Fay can be applied to running user-mode or kernel-mode software without compromising system stability. At the highest level, Fay provides a unified, declarative means of specifying what events to trace, as well as the aggregation, processing, and analysis of those events.
We have implemented the Fay tracing platform for Windows and integrated it with two powerful, expressive systems for distributed programming. Our implementation is easy to use, can be applied to unmodified production systems, and provides primitives that allow the overhead of tracing to be greatly reduced, compared to previous dynamic tracing platforms. To show the generality of Fay tracing, we reimplement, in experiments, a range of tracing strategies and several custom mechanisms from existing tracing frameworks.
Fay shows that modern techniques for high-level querying and data-parallel processing of disaggregated data streams are well suited to comprehensive monitoring of software execution in distributed systems. Revisiting a lesson from the late 1960s [15], Fay also demonstrates the efficiency and extensibility benefits of using safe, statically-verified machine code as the basis for low-level execution tracing. Finally, Fay establishes that, by automatically deriving optimized query plans and code for safe extensions, the expressiveness and performance of high-level tracing queries can equal or even surpass that of specialized monitoring tools.
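
The abstract describes a declarative specification of what to trace, combined with aggregation within machines and across clusters. The following is a minimal sketch of that general idea only, not Fay's actual interface: the names `TraceEvent`, `CountQuery`, `aggregate_on_machine`, `merge_across_cluster`, and `run_query` are all hypothetical, and the event layout is an assumption made for illustration. Each machine reduces its own events to a small table of counts, and only those partial tables are merged cluster-wide.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class TraceEvent:
    """One traced function call (hypothetical record layout)."""
    function: str
    duration_us: int


@dataclass(frozen=True)
class CountQuery:
    """Declarative description: which events to keep and how to group them."""
    where: Callable[[TraceEvent], bool]
    group_by: Callable[[TraceEvent], str]


def aggregate_on_machine(query: CountQuery, events: Iterable[TraceEvent]) -> Counter:
    """Filter and count events locally, so raw events never leave the machine."""
    counts: Counter = Counter()
    for event in events:
        if query.where(event):
            counts[query.group_by(event)] += 1
    return counts


def merge_across_cluster(partials: Iterable[Counter]) -> Counter:
    """Combine per-machine partial counts into one cluster-wide result."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return total


def run_query(query: CountQuery, events_by_machine: Dict[str, List[TraceEvent]]) -> Counter:
    """Run the same query on every machine, then merge the partial results."""
    partials = [aggregate_on_machine(query, events)
                for events in events_by_machine.values()]
    return merge_across_cluster(partials)


if __name__ == "__main__":
    # Count calls that took at least 100 microseconds, grouped by function name.
    slow_calls = CountQuery(where=lambda e: e.duration_us >= 100,
                            group_by=lambda e: e.function)
    cluster = {
        "machine-a": [TraceEvent("read", 150), TraceEvent("read", 20)],
        "machine-b": [TraceEvent("write", 300), TraceEvent("read", 500)],
    }
    print(dict(run_query(slow_calls, cluster)))
```

Counting is a convenient example because partial counts can be merged in any order, which is what makes local pre-aggregation safe. A real system would also have to handle event collection, safety of the instrumentation, and query planning, none of which this sketch attempts.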

References

[1]
J. Ansel, P. Marchenko, Ú. Erlingsson, E. Taylor, B. Chen, D. L. Schuff, D. Sehr, C. L. Biffle, and B. Yee. Language-independent sandboxing of just-in-time compilation and self-modifying code. In PLDI, 2011.
[2]
Apache. Hadoop project. http://hadoop.apache.org/.
[3]
P. Avgustinov, J. Tibble, E. Bodden, L. Hendren, O. Lhotak, O. de Moor, N. Ongkingco, and G. Sittampalam. Efficient trace monitoring. In OOPSLA, 2006.
[4]
M. Balazinska, H. Balakrishnan, S. Madden, and M. Stonebraker. Fault-tolerance in the Borealis distributed stream processing system. In SIGMOD, 2005.
[5]
P. Barham, A. Donnelly, R. Isaacs, and R. Mortier. Using Magpie for request extraction and workload modelling. In OSDI, 2004.
[6]
B. N. Bershad, S. Savage, P. Pardyak, D. Becker, M. Fiuczynski, and E. G. Sirer. Protection is a software issue. In HotOS, 1995.
[7]
S. Bhatia, A. Kumar, M. E. Fiuczynski, and L. Peterson. Lightweight, high-resolution monitoring for troubleshooting production systems. In OSDI, 2008.
[8]
P. P. Bungale and C.-K. Luk. PinOS: A programmable framework for whole-system dynamic instrumentation. In VEE, 2007.
[9]
M. Burrows, Ú. Erlingsson, S.-T. A. Leung, M. T. Vandevoorde, C. A. Waldspurger, K. Walker, and W. E. Weihl. Efficient and flexible value sampling. In ASPLOS, 2000.
[10]
B. Cantrill. Hidden in plain sight. ACM Queue, 4, 2006.
[11]
B. M. Cantrill, M. W. Shapiro, and A. H. Leventhal. Dynamic instrumentation of production systems. In USENIX Annual Technical Conf., 2004.
[12]
Q. Cao, T. Abdelzaher, J. Stankovic, K. Whitehouse, and L. Luo. Declarative tracepoints: A programmable and application independent debugging system for wireless sensor networks. In SenSys, 2008.
[13]
C. Chambers, A. Raniwala, F. Perry, S. Adams, R. R. Henry, R. Bradshaw, and N. Weizenbaum. FlumeJava: Easy, efficient data-parallel pipelines. In PLDI, 2010.
[14]
J. Dean and S. Ghemawat. MapReduce: A flexible data processing tool. Comm. ACM, 53(1), 2010.
[15]
P. Deutsch and C. A. Grant. A flexible measurement tool for software systems. In IFIP, 1971.
[16]
Eclipse. Callgraph plug-in. http://wiki.eclipse.org/Linux_Tools_Project/Callgraph/User_Guide.
[17]
F. C. Eigler. Systemtap tutorial, Dec. 2010. http://sourceware.org/systemtap/tutorial/.
[18]
Ú. Erlingsson, M. Abadi, M. Vrable, M. Budiu, and G. C. Necula. XFI: Software guards for system address spaces. In OSDI, 2006.
[19]
Ú. Erlingsson, M. Manasse, and F. McSherry. A cool and practical alternative to traditional hash tables. In Workshop on Distributed Data and Structures, 2006.
[20]
Y. Etsion, D. Tsafrir, S. Kirkpatrick, and D. G. Feitelson. Fine grained kernel logging with KLogger: Experience and insights. In EuroSys, 2007.
[21]
Flume: Open source log collection system. http://github.com/cloudera/flume.
[22]
K. Glerum, K. Kinshumann, S. Greenberg, G. Aul, V. Orgovan, G. Nichols, D. Grant, G. Loihle, and G. Hunt. Debugging in the (very) large: Ten years of implementation and experience. In SOSP, 2009.
[23]
S. F. Goldsmith, R. O'Callahan, and A. Aiken. Relational queries over program traces. In OOPSLA, 2005.
[24]
A. Gupta, I. S. Mumick, and V. S. Subrahmanian. Maintaining views incrementally. In ACM Intl. Conf. on Management of Data, 1993.
[25]
G. Hunt and D. Brubacher. Detours: Binary interception of Win32 functions. In USENIX Windows NT Symposium, 1998.
[26]
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In EuroSys, 2007.
[27]
G. L. Lee, M. Schulz, D. H. Ahn, A. Bernat, B. R. de Supinski, S. Y. Ko, and B. Rountree. Dynamic binary instrumentation and data aggregation on large scale systems. Intl. Journal on Parallel Programming, 35(3), 2007.
[28]
B. Liblit, A. Aiken, A. X. Zheng, and M. I. Jordan. Bug isolation via remote program sampling. In PLDI, 2003.
[29]
F. Marguerie, S. Eichert, and J. Wooley. LINQ in action. Manning Publications Co., 2008.
[30]
T. Marian, A. Sagar, T. Chen, and H. Weatherspoon. Fmeter: Extracting Indexable Low-level System Signatures by Counting Kernel Function Calls. Technical Report http://hdl.handle.net/1813/23568, Cornell University, Computing and Information Science, 2011.
[31]
M. Martin, B. Livshits, and M. S. Lam. Finding application errors and security flaws using PQL: A program query language. In OOPSLA, 2005.
[32]
M. L. Massie, B. N. Chun, and D. E. Culler. The Ganglia distributed monitoring system: Design, implementation and experience. Intl. Journal on Parallel Computing, 30, 2003.
[33]
F. McSherry, Y. Yu, M. Budiu, M. Isard, and D. Fetterly. Scaling Up Machine Learning. Cambridge U. Press, 2011.
[34]
Microsoft Corp. Introduction to hotpatching. Microsoft TechNet, 2003.
[35]
Microsoft Corp. Kernel patch protection: Frequently asked questions. Windows Hardware Developer Central, 2006. http://www.microsoft.com/whdc/driver/kernel/64bitpatch_FAQ.mspx.
[36]
Microsoft Corp. WDK and developer tools. Windows Hardware Developer Central, 2010. http://www.microsoft.com/whdc/DevTools/default.mspx.
[37]
G. Morrisett, D. Walker, K. Crary, and N. Glew. From System F to typed assembly language. In POPL, 1998.
[38]
G. C. Necula. Proof-carrying code. In POPL, 1997.
[39]
N. Nethercote and J. Seward. Valgrind: A framework for heavyweight dynamic binary instrumentation. In PLDI, 2007.
[40]
W. Oney. Programming the Microsoft Windows Driver Model. Microsoft Press, 2002.
[41]
I. Park and R. Buch. Improve debugging and performance tuning with ETW. MSDN Magazine, April 2007.
[42]
J. Passing, A. Schmidt, M. von Lowis, and A. Polze. NTrace: Function boundary tracing for Windows on IA-32. In Working Conference on Reverse Engineering, 2009.
[43]
S. Peter, A. Baumann, T. Roscoe, P. Barham, and R. Isaacs. 30 seconds is not enough!: A study of operating system timer usage. In EuroSys, 2008.
[44]
M. Pietrek. A crash course on the depths of Win32 structured exception handling. Microsoft Systems Journal, 1997.
[45]
V. Prasad, W. Cohen, F. C. Eigler, M. Hunt, J. Keniston, and B. Chen. Locating system problems using dynamic instrumentation. In Ottawa Linux Symposium, 2005.
[46]
G. Ren, E. Tune, T. Moseley, Y. Shi, S. Rus, and R. Hundt. Google-wide profiling: A continuous profiling infrastructure for data centers. IEEE Micro, 30(4), 2010.
[47]
T. H. Romer, D. Lee, G. M. Voelker, A. Wolman, W. A. Wong, J.-L. Baer, B. N. Bershad, and H. M. Levy. The structure and performance of interpreters. In ASPLOS, 1996.
[48]
S. Rostedt. Debugging the kernel using Ftrace. lwn.net, 2009.
[49]
M. E. Russinovich, D. A. Solomon, and A. Ionescu. Microsoft Windows Internals. Microsoft Press, 2009.
[50]
B. H. Sigelman, L. A. Barroso, M. Burrows, P. Stephenson, M. Plakal, D. Beaver, S. Jaspan, and C. Shanbhag. Dapper, a large-scale distributed systems tracing infrastructure. Technical Report 2010-1, Google Inc., 2010.
[51]
K. Skadron, P. S. Ahuja, M. Martonosi, and D. W. Clark. Improving prediction for procedure returns with return-address-stack repair mechanisms. In MICRO, 1998.
[52]
C. Small and M. I. Seltzer. MiSFIT: Constructing safe extensible systems. IEEE Concurrency: Parallel, Distributed and Mobile Computing, 6(3), 1998.
[53]
T. Sookoor, T. Hnat, P. Hooimeijer, W. Weimer, and K. Whitehouse. Macrodebugging: Global views of distributed program execution. In SenSys, 2009.
[54]
A. Srivastava, A. Edwards, and H. Vo. Vulcan: Binary transformation in a distributed environment. Technical Report MSR-TR-2001-50, Microsoft Research, 2001.
[55]
W. Stanek. Windows PowerShell(TM) 2.0 Administrator's Pocket Consultant. Microsoft Press, 2009.
[56]
M. Strosaker. Sample real-world use of SystemTap. http://zombieprocess.wordpress.com/2008/01/03/sample-real-world-use-of-systemtap/.
[57]
SystemTap. Examples. http://sourceware.org/systemtap/examples/.
[58]
SystemTap. Bug 2725: function("*") probes sometimes crash & burn, June 2006. http://sources.redhat.com/bugzilla/show_bug.cgi?id=2725.
[59]
G. Varghese and A. Lauck. Hashed and hierarchical timing wheels. IEEE/ACM Transactions on Networking, 5(6), 1997.
[60]
C. Verbowski, E. Kiciman, A. Kumar, B. Daniels, S. Lu, J. Lee, Y.-M. Wang, and R. Roussev. Flight data recorder: Monitoring persistent-state interactions to improve systems management. In OSDI, 2006.
[61]
R. Wahbe, S. Lucco, T. E. Anderson, and S. L. Graham. Efficient software-based fault isolation. In SOSP, 1993.
[62]
R. W. Wisniewski and B. Rosenburg. Efficient, unified, and scalable performance monitoring for multiprocessor operating systems. In Supercomputing, 2003.
[63]
D. B. Woodard and M. Goldszmidt. Model-based clustering for online crisis identification in distributed computing. Technical Report TR-2009-131, MSR, 2009.
[64]
B. Yee, D. Sehr, G. Dardyk, J. B. Chen, R. Muth, T. Ormandy, S. Okasaka, N. Narula, and N. Fullagar. Native client: A sandbox for portable, untrusted x86 native code. Comm. ACM, 53(1):91–99, 2010.
[65]
Y. Yu, P. K. Gunda, and M. Isard. Distributed aggregation for data-parallel computing: Interfaces and implementations. In SOSP, 2009.
[66]
Y. Yu, M. Isard, D. Fetterly, M. Budiu, Ú. Erlingsson, P. G. Kumar, and J. Currey. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In OSDI, 2008.

Published In

SOSP '11: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
October 2011
417 pages
ISBN: 9781450309776
DOI: 10.1145/2043556

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 174 of 961 submissions, 18%


Cited By

  • (2022) Unstructured Log Analysis for System Anomaly Detection—A Study. Advances in Data Science and Management, pp. 497-509. DOI: 10.1007/978-981-16-5685-9_48. Online publication date: 13-Feb-2022
  • (2021) Automating instrumentation choices for performance problems in distributed applications with VAIF. Proceedings of the ACM Symposium on Cloud Computing, pp. 61-75. DOI: 10.1145/3472883.3487000. Online publication date: 1-Nov-2021
  • (2021) Automated Analysis of Distributed Tracing: Challenges and Research Directions. Journal of Grid Computing, 19:1. DOI: 10.1007/s10723-021-09551-5. Online publication date: 1-Mar-2021
  • (2020) Logging Inter-Thread Data Dependencies in Linux Kernel. IEICE Transactions on Information and Systems, E103.D:7, pp. 1633-1646. DOI: 10.1587/transinf.2019EDP7255. Online publication date: 1-Jul-2020
  • (2019) An automated, cross-layer instrumentation framework for diagnosing performance problems in distributed applications. Proceedings of the ACM Symposium on Cloud Computing, pp. 165-170. DOI: 10.1145/3357223.3362704. Online publication date: 20-Nov-2019
  • (2019) You can't debug what you can't see. Proceedings of the Workshop on Hot Topics in Operating Systems, pp. 163-169. DOI: 10.1145/3317550.3321428. Online publication date: 13-May-2019
  • (2019) TEE-Perf: A Profiler for Trusted Execution Environments. 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 414-421. DOI: 10.1109/DSN.2019.00050. Online publication date: Jun-2019
  • (2018) Sledgehammer. Proceedings of the 13th USENIX conference on Operating Systems Design and Implementation, pp. 545-560. DOI: 10.5555/3291168.3291208. Online publication date: 8-Oct-2018
  • (2018) wPerf. Proceedings of the 13th USENIX conference on Operating Systems Design and Implementation, pp. 527-543. DOI: 10.5555/3291168.3291207. Online publication date: 8-Oct-2018
  • (2018) Pivot Tracing. ACM Transactions on Computer Systems, 35:4, pp. 1-28. DOI: 10.1145/3208104. Online publication date: 5-Dec-2018
