DOI: 10.1145/2043556.2043589
research-article

Pervasive detection of process races in deployed systems

Published: 23 October 2011

Abstract

Process races occur when multiple processes access shared operating system resources, such as files, without proper synchronization. We present the first study of real process races and the first system designed to detect them. Our study of hundreds of applications shows that process races are numerous, difficult to debug, and a real threat to reliability. To address this problem, we created RacePro, a system for automatically detecting these races. RacePro checks deployed systems in vivo by recording live executions, then deterministically replaying and checking them later. This approach increases checking coverage beyond the configurations or executions covered by software vendors or beta testing sites. RacePro records multiple processes, detects races in the recording among system calls that may concurrently access shared kernel objects, then tries different execution orderings of such system calls to determine which races are harmful and result in failures. To simplify race detection, RacePro models under-specified system calls based on load and store micro-operations. To reduce false positives and negatives, RacePro uses a replay and go-live mechanism to distill harmful races from benign ones. We have implemented RacePro in Linux, shown that it imposes only modest recording overhead, and used it to detect a number of previously unknown bugs in real applications caused by process races.
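
The detection step described in the abstract can be illustrated with a small sketch. This is not RacePro's implementation: it assumes each recorded system call carries a vector clock and a hand-written list of load/store micro-operations on named kernel objects. The names `MicroOp`, `Syscall`, `find_races`, the object labels such as `dentry:/tmp/f`, and the example trace are all hypothetical. Two calls from different processes are reported as a candidate race when they touch the same object, at least one of them stores to it, and neither is ordered before the other.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MicroOp:
    kind: str  # "load" or "store"
    obj: str   # hypothetical kernel-object label, e.g. "dentry:/tmp/f"


@dataclass(frozen=True)
class Syscall:
    pid: int
    name: str
    ops: tuple    # micro-operations this call is modeled as (assumed, hand-written)
    clock: tuple  # vector clock at the call, one slot per process (assumed recorded)


def happens_before(a, b):
    """True if a's clock is component-wise <= b's clock and the two differ."""
    return all(x <= y for x, y in zip(a.clock, b.clock)) and a.clock != b.clock


def conflicts(a, b):
    """Two calls conflict if they share an object and at least one access is a store."""
    return any(
        x.obj == y.obj and "store" in (x.kind, y.kind)
        for x in a.ops
        for y in b.ops
    )


def find_races(trace):
    """Return name pairs of conflicting, unordered calls from different processes."""
    races = []
    for a, b in combinations(trace, 2):
        if a.pid == b.pid or not conflicts(a, b):
            continue
        if happens_before(a, b) or happens_before(b, a):
            continue
        races.append((a.name, b.name))
    return races


# Hypothetical two-process trace: stat and unlink are unordered and race on the
# directory entry; write and read on the pipe conflict but are ordered.
example_trace = [
    Syscall(0, "stat", (MicroOp("load", "dentry:/tmp/f"),), (1, 0)),
    Syscall(1, "unlink", (MicroOp("store", "dentry:/tmp/f"),), (0, 1)),
    Syscall(0, "write", (MicroOp("store", "pipe:1"),), (2, 0)),
    Syscall(1, "read", (MicroOp("load", "pipe:1"),), (2, 2)),
]

if __name__ == "__main__":
    print(find_races(example_trace))
```

Run as a script, this reports the `stat`/`unlink` pair only. A detector of this kind yields candidates; as the abstract notes, deciding which ones are harmful requires replaying the alternative orderings and checking the outcome, which this sketch does not attempt.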



Published In

SOSP '11: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
October 2011
417 pages
ISBN: 9781450309776
DOI: 10.1145/2043556

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. debugging
  2. model checking
  3. race detection
  4. record-replay

Acceptance Rates

Overall Acceptance Rate 174 of 961 submissions, 18%

Cited By

  • (2023) Diagnosing Kernel Concurrency Failures with AITIA. Proceedings of the Eighteenth European Conference on Computer Systems, pp. 94-110. DOI: 10.1145/3552326.3567486. Online publication date: 8-May-2023
  • (2022) A deep study of the effects and fixes of server-side request races in web applications. Proceedings of the 19th International Conference on Mining Software Repositories, pp. 744-756. DOI: 10.1145/3524842.3528463. Online publication date: 23-May-2022
  • (2022) ReDPro. Proceedings of the 2022 ACM Southeast Conference, pp. 106-112. DOI: 10.1145/3476883.3520207. Online publication date: 18-Apr-2022
  • (2022) Defense and Attack Techniques Against File-Based TOCTOU Vulnerabilities: A Systematic Review. IEEE Access, vol. 10, pp. 21742-21758. DOI: 10.1109/ACCESS.2022.3153064. Online publication date: 2022
  • (2021) Snowboard. Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, pp. 66-83. DOI: 10.1145/3477132.3483549. Online publication date: 26-Oct-2021
  • (2021) Understanding and detecting server-side request races in web applications. Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 842-854. DOI: 10.1145/3468264.3468594. Online publication date: 20-Aug-2021
  • (2019) SCMiner. Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, pp. 515-526. DOI: 10.1109/ASE.2019.00055. Online publication date: 10-Nov-2019
  • (2018) Replay without recording of production bugs for service oriented applications. Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, pp. 452-463. DOI: 10.1145/3238147.3238186. Online publication date: 3-Sep-2018
  • (2017) DESCRY: reproducing system-level concurrency failures. Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, pp. 694-704. DOI: 10.1145/3106237.3106266. Online publication date: 21-Aug-2017
  • (2017) SimEvo: Testing Evolving Multi-process Software Systems. 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 204-215. DOI: 10.1109/ICSME.2017.29. Online publication date: Sep-2017
