DOI: 10.1145/3600006.3613161

Acto: Automatic End-to-End Testing for Operation Correctness of Cloud System Management

Published: 23 October 2023

ABSTRACT

Cloud systems are increasingly being managed by operation programs termed operators, which automate tedious, human-based operations. Operators of modern management platforms like Kubernetes, Twine, and ECS implement declarative interfaces based on the state-reconciliation principle. An operation declares a desired system state and the operator automatically reconciles the system to that declared state.
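The state-reconciliation principle can be illustrated with a minimal sketch. It is a toy model, not code from Kubernetes, Twine, ECS, or any real operator: `Cluster`, `reconcile`, and `run_until_converged` are hypothetical names, and the managed system is reduced to a set of replica names.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a managed system: a set of running replica names."""
    replicas: set = field(default_factory=set)


def reconcile(cluster: Cluster, desired_replicas: int) -> bool:
    """Take one step toward the declared state; return True once converged."""
    current = len(cluster.replicas)
    if current < desired_replicas:
        # Too few replicas: start one more.
        cluster.replicas.add(f"pod-{current}")
    elif current > desired_replicas:
        # Too many replicas: stop the most recently started one.
        cluster.replicas.remove(f"pod-{current - 1}")
    return len(cluster.replicas) == desired_replicas


def run_until_converged(cluster: Cluster, desired_replicas: int,
                        max_steps: int = 100) -> int:
    """Call reconcile repeatedly; return the number of steps it took."""
    for step in range(1, max_steps + 1):
        if reconcile(cluster, desired_replicas):
            return step
    raise RuntimeError("system did not converge to the declared state")


if __name__ == "__main__":
    cluster = Cluster()
    run_until_converged(cluster, 3)   # declare "3 replicas"
    print(sorted(cluster.replicas))
    run_until_converged(cluster, 1)   # declare "1 replica"
    print(sorted(cluster.replicas))
```

The caller only declares the target (`desired_replicas`); how many steps are needed and which replicas are started or stopped is decided by the reconcile loop, which is the essence of a declarative interface.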

Operator correctness is critical, given the impacts on system operations: bugs in operator code put systems in undesired or error states, with severe consequences. However, validating operator correctness is challenging due to the enormous system-state space and the complex operation interface. A correct operator must not only satisfy the correctness properties of its own code but also maintain managed systems in desired states. Unfortunately, end-to-end testing of operators falls significantly short.

We present Acto, the first automatic end-to-end testing technique for cloud system operators. Acto uses a state-centric approach to test an operator together with a managed system. Acto continuously instructs an operator to reconcile a system to different states and checks if the system successfully reaches those desired states. Acto models operations as state transitions and systematically realizes state-transition sequences to exercise supported operations in different scenarios. Acto's oracles automatically check whether a system's state is as desired. To date, Acto has helped find 56 serious new bugs (42 were confirmed and 30 have been fixed) in eleven Kubernetes operators with few false alarms.
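The state-centric idea described above (drive the operator through a sequence of desired states and check each one with an oracle) can be sketched roughly as follows. This is not Acto's implementation: the flat `State` dictionary, both toy operators, and the names `state_mismatches` and `run_transitions` are assumptions made for illustration only.

```python
from typing import Callable, Dict, List

# Hypothetical flat state model: field name -> value.
State = Dict[str, int]
Operator = Callable[[State, State], None]


def correct_operator(system: State, desired: State) -> None:
    """Toy operator that reconciles every declared field."""
    system.update(desired)


def buggy_operator(system: State, desired: State) -> None:
    """Toy operator with an injected bug: it never scales 'replicas' down."""
    for key, value in desired.items():
        if key == "replicas" and value < system.get(key, 0):
            continue  # bug: the scale-down request is silently ignored
        system[key] = value


def state_mismatches(system: State, desired: State) -> List[str]:
    """Oracle: list each declared field whose observed value differs."""
    return [
        f"{key}: want {value}, got {system.get(key)}"
        for key, value in desired.items()
        if system.get(key) != value
    ]


def run_transitions(operator: Operator, desired_states: List[State]) -> List[str]:
    """Apply each desired state in order and collect oracle alarms."""
    system: State = {}
    alarms: List[str] = []
    for step, desired in enumerate(desired_states):
        operator(system, desired)
        for mismatch in state_mismatches(system, desired):
            alarms.append(f"step {step}: {mismatch}")
    return alarms


if __name__ == "__main__":
    transitions = [{"replicas": 3}, {"replicas": 1}, {"replicas": 5}]
    print(run_transitions(correct_operator, transitions))
    print(run_transitions(buggy_operator, transitions))
```

In this sketch the bug only surfaces on the scale-down transition, which is why exercising sequences of state transitions, rather than single declarations from a fresh system, matters.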


Published in

SOSP '23: Proceedings of the 29th Symposium on Operating Systems Principles
October 2023
802 pages
ISBN: 9798400702297
DOI: 10.1145/3600006

Copyright © 2023 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

research-article

Acceptance Rates

SOSP '23 paper acceptance rate: 43 of 232 submissions, 19%
Overall acceptance rate: 131 of 716 submissions, 18%
