Fay: Extensible Distributed Tracing from Kernels to Clusters

Published: 01 November 2012

Abstract

Fay is a flexible platform for the efficient collection, processing, and analysis of software execution traces. Fay provides dynamic tracing through the use of runtime instrumentation and distributed aggregation within machines and across clusters. At the lowest level, Fay can be safely extended with new tracing primitives, including even untrusted, fully optimized machine code, and Fay can be applied to running user-mode or kernel-mode software without compromising system stability. At the highest level, Fay provides a unified, declarative means of specifying what events to trace, as well as the aggregation, processing, and analysis of those events.
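Aggregation "within machines and across clusters" can be illustrated with a minimal sketch. It assumes each trace event is simply the name of a probed function and that the desired result is a per-function call count. Because merging counts is associative and commutative, partial results can be combined per CPU, then per machine, then across the cluster, and still equal one centralized count. All names here (`cpu_partial`, `combine`, `machine_aggregate`, `cluster_aggregate`) are hypothetical; this is a toy in Python, not Fay's implementation.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List

# Hypothetical event model: each trace event is just the name of the probed function.

def cpu_partial(events: Iterable[str]) -> Counter:
    """Per-CPU partial aggregate: count events locally, with no cross-CPU sharing."""
    return Counter(events)

def combine(a: Counter, b: Counter) -> Counter:
    """Merge two partial counts. The merge is associative and commutative,
    so partial results may be combined in any grouping or order."""
    merged = Counter(a)
    merged.update(b)  # adds counts key by key
    return merged

def machine_aggregate(per_cpu_events: List[List[str]]) -> Counter:
    """Machine level: fold the per-CPU partial counts into one machine result."""
    return reduce(combine, (cpu_partial(evs) for evs in per_cpu_events), Counter())

def cluster_aggregate(per_machine_events: List[List[List[str]]]) -> Counter:
    """Cluster level: fold the per-machine results, as a final reduce stage might."""
    return reduce(combine, (machine_aggregate(m) for m in per_machine_events), Counter())

if __name__ == "__main__":
    # Two machines with two CPUs each; the function names are placeholders.
    cluster = [
        [["alloc", "wait"], ["alloc"]],
        [["read"], ["alloc", "read"]],
    ]
    totals = cluster_aggregate(cluster)
    print(totals.most_common())  # [('alloc', 3), ('read', 2), ('wait', 1)]
```

Only small per-group counts cross each level of the hierarchy, rather than every raw event, which is the property that makes this style of aggregation cheap at scale.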
We have implemented the Fay tracing platform for Windows and integrated it with two powerful, expressive systems for distributed programming. Our implementation is easy to use, can be applied to unmodified production systems, and provides primitives that allow the overhead of tracing to be greatly reduced, compared to previous dynamic tracing platforms. To show the generality of Fay tracing, we reimplement, in experiments, a range of tracing strategies and several custom mechanisms from existing tracing frameworks.
Fay shows that modern techniques for high-level querying and data-parallel processing of disaggregated data streams are well suited to comprehensive monitoring of software execution in distributed systems. Revisiting a lesson from the late 1960s [Deutsch and Grant 1971], Fay also demonstrates the efficiency and extensibility benefits of using safe, statically verified machine code as the basis for low-level execution tracing. Finally, Fay establishes that, by automatically deriving optimized query plans and code for safe extensions, the expressiveness and performance of high-level tracing queries can equal or even surpass those of specialized monitoring tools.
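Why an optimized query plan can reduce tracing overhead can be shown with a toy comparison. The sketch assumes a hypothetical `ProbeEvent` record and a hypothetical `CountQuery` consisting of a filter and a grouping key. A naive plan ships every raw event to a central collector; a pushed-down plan evaluates the filter and a partial count beside the probe on each machine and ships only one (key, count) pair per group. Both plans return the same answer. This is a generic Python illustration of predicate pushdown and partial aggregation, not Fay's query language or its planner.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, Mapping, Tuple

@dataclass(frozen=True)
class ProbeEvent:
    """Hypothetical trace record emitted by a probe."""
    machine: str
    function: str
    latency_us: int

@dataclass(frozen=True)
class CountQuery:
    """Toy declarative query: count events satisfying `where`, grouped by `key`."""
    where: Callable[[ProbeEvent], bool]
    key: Callable[[ProbeEvent], Hashable]

def naive_plan(query: CountQuery, events: Iterable[ProbeEvent]) -> Tuple[Counter, int]:
    """Ship every raw event to one collector, then filter and group there.
    Returns (result, number of records shipped off their machines)."""
    shipped = list(events)
    result = Counter(query.key(e) for e in shipped if query.where(e))
    return result, len(shipped)

def pushed_down_plan(query: CountQuery,
                     events_by_machine: Mapping[str, Iterable[ProbeEvent]]) -> Tuple[Counter, int]:
    """Evaluate the filter and a partial count next to the probe on each machine,
    then ship only one (key, count) pair per group to the collector."""
    result: Counter = Counter()
    shipped = 0
    for events in events_by_machine.values():
        partial = Counter(query.key(e) for e in events if query.where(e))
        shipped += len(partial)
        result.update(partial)
    return result, shipped

if __name__ == "__main__":
    events = [
        ProbeEvent("m0", "read", 120),
        ProbeEvent("m0", "read", 900),
        ProbeEvent("m0", "write", 50),
        ProbeEvent("m1", "read", 700),
        ProbeEvent("m1", "read", 800),
    ]
    by_machine = {}
    for e in events:
        by_machine.setdefault(e.machine, []).append(e)
    slow_calls = CountQuery(where=lambda e: e.latency_us > 500, key=lambda e: e.function)
    print(naive_plan(slow_calls, events))            # (Counter({'read': 3}), 5)
    print(pushed_down_plan(slow_calls, by_machine))  # (Counter({'read': 3}), 2)
```

The rewrite is valid here because a count decomposes into local counts plus a sum; a query such as an exact median would not split across machines this simply.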

References

[1]
Ansel, J., Marchenko, P., Erlingsson, Ú., Taylor, E., Chen, B., Schuff, D. L., Sehr, D., Biffle, C. L., and Yee, B. 2011. Language-independent sandboxing of just-in-time compilation and self-modifying code. In Proceedings of the Conference on Programming Language Design and Implementation (PLDI).
[2]
Apache. Hadoop project. http://hadoop.apache.org/.
[3]
Avgustinov, P., Tibble, J., Bodden, E., Hendren, L., Lhoták, O., de Moor, O., Ongkingco, N., and Sittampalam, G. 2006. Efficient trace monitoring. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA).
[4]
Balazinska, M., Balakrishnan, H., Madden, S., and Stonebraker, M. 2005. Fault-tolerance in the Borealis distributed stream processing system. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD).
[5]
Barham, P., Donnelly, A., Isaacs, R., and Mortier, R. 2004. Using Magpie for request extraction and workload modelling. In Proceedings of the Conference on Operating System Design and Implementation (OSDI).
[6]
Bershad, B. N., Savage, S., Pardyak, P., Becker, D., Fiuczynski, M., and Sirer, E. G. 1995. Protection is a software issue. In Proceedings of the 5th Workshop on Hot Topics in Operating Systems (HotOS-V).
[7]
Bhatia, S., Kumar, A., Fiuczynski, M. E., and Peterson, L. 2008. Lightweight, high-resolution monitoring for troubleshooting production systems. In Proceedings of the Conference on Operating System Design and Implementation (OSDI).
[8]
Bungale, P. P. and Luk, C.-K. 2007. PinOS: A programmable framework for whole-system dynamic instrumentation. In Proceedings of the 3rd International ACM SIGPLAN/SIGOPS Conference on Virtual Execution Environments (VEE).
[9]
Burrows, M., Erlingsson, Ú., Leung, S.-T. A., Vandevoorde, M. T., Waldspurger, C. A., Walker, K., and Weihl, W. E. 2000. Efficient and flexible value sampling. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).
[10]
Cantrill, B. 2006. Hidden in plain sight. ACM Queue 4.
[11]
Cantrill, B. M., Shapiro, M. W., and Leventhal, A. H. 2004. Dynamic instrumentation of production systems. In Proceedings of the USENIX Annual Technical Conference.
[12]
Cao, Q., Abdelzaher, T., Stankovic, J., Whitehouse, K., and Luo, L. 2008. Declarative tracepoints: A programmable and application independent debugging system for wireless sensor networks. In Proceedings of the International Conference on Embedded Networked Sensor Systems (SenSys).
[13]
Chambers, C., Raniwala, A., Perry, F., Adams, S., Henry, R. R., Bradshaw, R., and Weizenbaum, N. 2010. FlumeJava: Easy, efficient data-parallel pipelines. In Proceedings of the ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation (PLDI).
[14]
Dean, J. and Ghemawat, S. 2010. MapReduce: A flexible data processing tool. Comm. ACM 53, 1.
[15]
Deutsch, P. and Grant, C. A. 1971. A flexible measurement tool for software systems. In Proceedings of the IFIP Congress 71.
[16]
Eclipse. Callgraph plug-in. http://wiki.eclipse.org/Linux_Tools_Project/Callgraph/User_Guide.
[17]
Eigler, F. C. 2010. Systemtap tutorial. http://sourceware.org/systemtap/tutorial/.
[18]
Erlingsson, Ú., Abadi, M., Vrable, M., Budiu, M., and Necula, G. C. 2006a. XFI: Software guards for system address spaces. In Proceedings of the Conference on Operating System Design and Implementation (OSDI).
[19]
Erlingsson, Ú., Manasse, M., and McSherry, F. 2006b. A cool and practical alternative to traditional hash tables. In Proceedings of the Workshop on Distributed Data and Structures.
[20]
Etsion, Y., Tsafrir, D., Kirkpatrick, S., and Feitelson, D. G. 2007. Fine grained kernel logging with KLogger: Experience and insights. In Proceedings of the 2007 EuroSys Conference.
[21]
Flume. Flume: Open source log collection system. http://github.com/cloudera/flume.
[22]
Gao, D., Jensen, S., Snodgrass, R. T., and Soo, M. D. 2005. Join operations in temporal databases. Int. J. Very Large Datab. (VLDB Journal) 14, 2.
[23]
Glerum, K., Kinshumann, K., Greenberg, S., Aul, G., Orgovan, V., Nichols, G., Grant, D., Loihle, G., and Hunt, G. 2009. Debugging in the (very) large: Ten years of implementation and experience. In Proceedings of the 22nd ACM Symposium on Operating System Principles (SOSP’09).
[24]
Goldsmith, S. F., O’Callahan, R., and Aiken, A. 2005. Relational queries over program traces. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA’05).
[25]
Gupta, A., Mumick, I. S., and Subrahmanian, V. S. 1993. Maintaining views incrementally. In Proceedings of the ACM International Conference on Management of Data.
[26]
Hunt, G. and Brubacher, D. 1998. Detours: Binary interception of Win32 functions. In Proceedings of the USENIX Windows NT Symposium.
[27]
Isard, M., Budiu, M., Yu, Y., Birrell, A., and Fetterly, D. 2007. Dryad: Distributed data-parallel programs from sequential building blocks. In Proceedings of the 2007 EuroSys Conference.
[28]
Lee, G. L., Schulz, M., Ahn, D. H., Bernat, A., de Supinski, B. R., Ko, S. Y., and Rountree, B. 2007. Dynamic binary instrumentation and data aggregation on large scale systems. Int. J. Parall. Prog. 35, 3.
[29]
Liblit, B., Aiken, A., Zheng, A. X., and Jordan, M. I. 2003. Bug isolation via remote program sampling. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI) 38, 5.
[30]
Marguerie, F., Eichert, S., and Wooley, J. 2008. LINQ in Action. Manning Publications Co.
[31]
Marian, T., Sagar, A., Chen, T., and Weatherspoon, H. 2011. Fmeter: Extracting indexable low-level system signatures by counting kernel function calls. Tech. rep., Cornell University, Computing and Information Science. http://hdl.handle.net/1813/23568.
[32]
Martin, M., Livshits, B., and Lam, M. S. 2005. Finding application errors and security flaws using PQL: A program query language. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA’05).
[33]
Massie, M. L., Chun, B. N., and Culler, D. E. 2003. The Ganglia distributed monitoring system: Design, implementation and experience. Int. J. Parall. Comput. 30.
[34]
McSherry, F., Yu, Y., Budiu, M., Isard, M., and Fetterly, D. 2011. Scaling Up Machine Learning. Cambridge Univ. Press.
[35]
Microsoft Corp. Determine which queries are holding locks. MSDN. http://msdn.microsoft.com/en-us/library/bb677357.aspx.
[36]
Microsoft Corp. 2003. Introduction to hotpatching. Microsoft TechNet.
[37]
Microsoft Corp. 2006. Kernel patch protection: Frequently asked questions. Windows Hardware Developer Central. http://www.microsoft.com/whdc/driver/kernel/64bitpatch_FAQ.mspx.
[38]
Microsoft Corp. 2010. WDK and developer tools. Windows Hardware Developer Central. http://www.microsoft.com/whdc/DevTools/default.mspx.
[39]
Microsoft Corp. 2011a. Diagnosing and resolving latch contention on SQL Server. Microsoft Download Center. http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=%%26665.
[40]
Microsoft Corp. 2011b. Introducing SQL Server extended events. MSDN. http://msdn.microsoft.com/en-us/library/bb630354.aspx.
[41]
Microsoft Corp. 2011c. Use the Microsoft Symbol Server to obtain debug symbol files. http://support.microsoft.com/kb/311503.
[42]
Microsoft Corp. 2012. Microsoft StreamInsight. MSDN. http://msdn.microsoft.com/en-us/library/ee362541.aspx.
[43]
Morrisett, G., Walker, D., Crary, K., and Glew, N. 1998. From System F to typed assembly language. In Proceedings of the Symposium on Principles of Programming Languages (POPL).
[44]
Necula, G. C. 1997. Proof-carrying code. In Proceedings of the Symposium on Principles of Programming Languages (POPL).
[45]
Nethercote, N. and Seward, J. 2007. Valgrind: A framework for heavyweight dynamic binary instrumentation. In Proceedings of the Conference on Programming Language Design and Implementation (PLDI).
[46]
Oney, W. 2002. Programming the Microsoft Windows Driver Model. Microsoft Press.
[47]
Park, I. and Buch, R. 2007. Improve debugging and performance tuning with ETW. MSDN Magazine.
[48]
Passing, J., Schmidt, A., von Löwis, M., and Polze, A. 2009. NTrace: Function boundary tracing for Windows on IA-32. In Proceedings of the Working Conference on Reverse Engineering.
[49]
Peter, S., Baumann, A., Roscoe, T., Barham, P., and Isaacs, R. 2008. 30 seconds is not enough!: A study of operating system timer usage. In Proceedings of the 2008 EuroSys Conference.
[50]
Pietrek, M. 1997. A crash course on the depths of Win32 structured exception handling. Microsoft Syst. J.
[51]
Prasad, V., Cohen, W., Eigler, F. C., Hunt, M., Keniston, J., and Chen, B. 2005. Locating system problems using dynamic instrumentation. In Proceedings of the Ottawa Linux Symposium.
[52]
Ren, G., Tune, E., Moseley, T., Shi, Y., Rus, S., and Hundt, R. 2010. Google-wide profiling: A continuous profiling infrastructure for data centers. IEEE Micro 30, 4.
[53]
Romer, T. H., Lee, D., Voelker, G. M., Wolman, A., Wong, W. A., Baer, J.-L., Bershad, B. N., and Levy, H. M. 1996. The structure and performance of interpreters. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).
[54]
Rostedt, S. 2009. Debugging the kernel using Ftrace. lwn.net.
[55]
Russinovich, M. E., Solomon, D. A., and Ionescu, A. 2009. Microsoft Windows Internals. Microsoft Press.
[56]
Sigelman, B. H., Barroso, L. A., Burrows, M., Stephenson, P., Plakal, M., Beaver, D., Jaspan, S., and Shanbhag, C. 2010. Dapper, a large-scale distributed systems tracing infrastructure. Tech. rep. 2010-1, Google Inc.
[57]
Skadron, K., Ahuja, P. S., Martonosi, M., and Clark, D. W. 1998. Improving prediction for procedure returns with return-address-stack repair mechanisms. In Proceedings of the Annual ACM/IEEE International Symposium on Microarchitecture (MICRO).
[58]
Small, C. and Seltzer, M. I. 1998. MiSFIT: Constructing safe extensible systems. IEEE Concurr.: Parall. Distrib. Mobile Comput. 6, 3.
[59]
Sookoor, T., Hnat, T., Hooimeijer, P., Weimer, W., and Whitehouse, K. 2009. Macrodebugging: Global views of distributed program execution. In Proceedings of the International Conference on Embedded Networked Sensor Systems (SenSys).
[60]
Srivastava, A., Edwards, A., and Vo, H. 2001. Vulcan: Binary transformation in a distributed environment. Tech. rep. MSR-TR-2001-50, Microsoft Research.
[61]
Stanek, W. 2009. Windows PowerShell 2.0 Administrator’s Pocket Consultant. Microsoft Press.
[62]
Strosaker, M. Sample real-world use of systemtap. http://zombieprocess.wordpress.com/2008/01/03/sample-real-world-use-of-systemtap/.
[63]
SystemTap. Examples. http://sourceware.org/systemtap/examples/.
[64]
SystemTap. 2006. Bug 2725: function(“*”) probes sometimes crash & burn. http://sources.redhat.com/bugzilla/show_bug.cgi?id=2725.
[65]
Varghese, G. and Lauck, A. 1997. Hashed and hierarchical timing wheels. IEEE/ACM Trans. Netw. 5, 6.
[66]
Verbowski, C., Kiciman, E., Kumar, A., Daniels, B., Lu, S., Lee, J., Wang, Y.-M., and Roussev, R. 2006. Flight data recorder: Monitoring persistent-state interactions to improve systems management. In Proceedings of the Conference on Operating System Design and Implementation (OSDI).
[67]
Wahbe, R., Lucco, S., Anderson, T. E., and Graham, S. L. 1993. Efficient software-based fault isolation. In Proceedings of the 14th ACM Symposium on Operating System Principles (SOSP’93).
[68]
Wisniewski, R. W. and Rosenburg, B. 2003. Efficient, unified, and scalable performance monitoring for multiprocessor operating systems. In Supercomputing.
[69]
Woodard, D. B. and Goldszmidt, M. 2009. Model-based clustering for online crisis identification in distributed computing. Tech. rep. TR-2009-131, MSR.
[70]
Yee, B., Sehr, D., Dardyk, G., Chen, J. B., Muth, R., Ormandy, T., Okasaka, S., Narula, N., and Fullagar, N. 2010. Native Client: A sandbox for portable, untrusted x86 native code. Comm. ACM 53, 1, 91-99.
[71]
Yu, Y., Isard, M., Fetterly, D., Budiu, M., Erlingsson, Ú., Kumar, P. G., and Currey, J. 2008. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of the Conference on Operating System Design and Implementation (OSDI).
[72]
Yu, Y., Gunda, P. K., and Isard, M. 2009. Distributed aggregation for data-parallel computing: Interfaces and implementations. In Proceedings of the 22nd ACM Symposium on Operating System Principles (SOSP’09).

Information

Published In

ACM Transactions on Computer Systems, Volume 30, Issue 4
November 2012
136 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/2382553
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 November 2012
Accepted: 01 August 2012
Received: 01 July 2012
Published in TOCS Volume 30, Issue 4

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 51
  • Downloads (last 6 weeks): 9
Reflects downloads up to 08 Mar 2025

Citations

Cited By

  • (2025) PDCleaner: A multi-view collaborative data compression method for provenance graph-based APT detection systems. Computers & Security 152, 104359. DOI: 10.1016/j.cose.2025.104359. Online publication date: May 2025.
  • (2024) Eliminating eBPF Tracing Overhead on Untraced Processes. Proceedings of the ACM SIGCOMM 2024 Workshop on eBPF and Kernel Extensions, 16-22. DOI: 10.1145/3672197.3673431. Online publication date: 4 Aug 2024.
  • (2023) LatenSeer. Proceedings of the 2023 ACM Symposium on Cloud Computing, 502-519. DOI: 10.1145/3620678.3624787. Online publication date: 30 Oct 2023.
  • (2023) Performal: Formal Verification of Latency Properties for Distributed Systems. Proceedings of the ACM on Programming Languages 7, PLDI, 368-393. DOI: 10.1145/3591235. Online publication date: 6 Jun 2023.
  • (2023) Diagnosing Distributed Systems Through Log Data Analysis. Congress on Smart Computing Technologies, 493-507. DOI: 10.1007/978-981-99-2468-4_38. Online publication date: 11 Jul 2023.
  • (2022) Anomaly detection in microservice environments using distributed tracing data analysis and NLP. Journal of Cloud Computing: Advances, Systems and Applications 11, 1. DOI: 10.1186/s13677-022-00296-4. Online publication date: 13 Aug 2022.
  • (2021) Combining Distributed and Kernel Tracing for Performance Analysis of Cloud Applications. Electronics 10, 21, 2610. DOI: 10.3390/electronics10212610. Online publication date: 26 Oct 2021.
  • (2021) tprof. Proceedings of the ACM Symposium on Cloud Computing, 76-91. DOI: 10.1145/3472883.3486994. Online publication date: 1 Nov 2021.
  • (2021) Predicting Performance Anomalies in Software Systems at Run-time. ACM Transactions on Software Engineering and Methodology 30, 3, 1-33. DOI: 10.1145/3440757. Online publication date: 23 Apr 2021.
  • (2021) General, Efficient, and Real-Time Data Compaction Strategy for APT Forensic Analysis. IEEE Transactions on Information Forensics and Security 16, 3312-3325. DOI: 10.1109/TIFS.2021.3076288. Online publication date: 2021.
