Systems of systems and coordinated atomic actions

Author:

Robert Schaefer

ACM SIGSOFT Software Engineering Notes, Volume 30, Issue 1

Page 6

https://doi.org/10.1145/1039174.1039196

Published: 01 January 2005

Abstract

System of systems (SoS) is an emerging field in the design and development of complex systems that are built from large-scale component systems. A SoS has the following attributes: operational and managerial independence of components, a geographic extent that limits control mechanisms to information exchange, an evolutionary nature, and emergent behavior. The subsystems that comprise the SoS are often built by different organizations with conflicting goals, designed under different assumptions, and built to different quality standards. These factors impact fault detection, fault isolation, and fault tolerance, and can result in systems that cannot easily be debugged, integrated, or maintained. When fault detection and fault tolerance are deficient, the system may behave in a fragile or brittle manner, randomly and repeatedly crashing. Crashes prevent automated diagnosis algorithms from being executed and can prevent manual root cause analysis by erasing system state. Fragility during system integration can prevent schedule milestones and deadlines from being met. Deficient fault detection and fault isolation also impact end users and system maintainers. (Think <insert name of infamous project here>.)

From the system architect's point of view, designing a system that can detect all possible fault conditions across all components can be an extremely difficult, if not impossible, challenge. Can any system be trusted to diagnose or repair itself when it has been corrupted by faults? How do you prevent local faults from growing into global failures? The end users may have unreasonable expectations about how the system should behave when components within the SoS behave abnormally or fail. They may expect better behavior than the typical PC. The system maintainers may expect a coherent system view of failures in order to isolate faulted components and to provide an orderly and safe shutdown or recovery. (Think power grid blackouts, telecom failures, etc.)

The most beneficial way to achieve fault tolerance is to design in fault detection and fault reporting such that defined boundaries, such as subsystems, serve as natural firewalls for fault containment. Although partitioning the system into subsystems for fault containment is well known and widely practiced, the end result as experienced at the time of system integration is rarely a success. COTS middleware, intended to aid distributed design, often becomes in effect a step backwards by providing fertile ground for faults and failures that breach fault containment boundaries. (Think <insert name of OS or middleware vendor here>.)

What can be done to improve this situation? This paper addresses the system architectural partitioning concept of Coordinated Atomic Actions (CAA). CAA promotes a different manner of organizing software architecture that improves fault containment across potentially faulty components. CAA was invented by members of Brian Randell's research group at the University of Newcastle upon Tyne in the mid-1990s. CAA promotes the concept of the "transaction", which has traditionally been identified with database applications. When you access your bank account via an ATM, you are exercising database transactions within your bank's financial SoS. CAA applies transactions to cooperating concurrent distributed processes, which are the basis for most large complex computing systems.
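
The transactional idea in the last paragraph can be illustrated with a small sketch. The abstract does not give an implementation, so everything below is an assumption made for illustration: the class name `CoordinatedAtomicAction`, the `run` method, and the `debit` and `credit` roles are hypothetical, and the sketch models only the all-or-nothing commit and the synchronized exit of cooperating roles. Coordinated exception handling among participants, which the CAA literature also covers, is left out.

```python
import copy
import threading


class CoordinatedAtomicAction:
    """Toy model of a CAA (hypothetical names, not taken from the paper).

    Roles run concurrently on a private working copy of the shared state.
    The copy is committed only if every role finishes without an exception.
    """

    def __init__(self, shared_state):
        self.shared_state = shared_state   # external "transactional" object (a dict)
        self._lock = threading.Lock()      # serializes role access to the working copy

    def run(self, roles):
        working = copy.deepcopy(self.shared_state)  # tentative state, invisible outside
        errors = []

        def participant(role):
            try:
                with self._lock:
                    role(working)
            except Exception as exc:       # fault detected inside the action boundary
                errors.append(exc)

        threads = [threading.Thread(target=participant, args=(r,)) for r in roles]
        for t in threads:
            t.start()
        for t in threads:                  # synchronized exit: wait for every role
            t.join()

        if errors:
            return False                   # abort: shared_state is left untouched
        self.shared_state.clear()
        self.shared_state.update(working)  # commit all effects together
        return True


def debit(state):
    """Hypothetical role: withdraw 50 from checking, or fail."""
    if state["checking"] < 50:
        raise ValueError("insufficient funds")
    state["checking"] -= 50


def credit(state):
    """Hypothetical role: deposit 50 into savings."""
    state["savings"] += 50


accounts = {"checking": 80, "savings": 0}
action = CoordinatedAtomicAction(accounts)

first = action.run([debit, credit])    # both roles succeed, so the transfer commits
second = action.run([debit, credit])   # debit fails, so credit's tentative effect is discarded
```

In this sketch the fault raised by `debit` in the second run stays inside the action boundary: the shared `accounts` object never shows a credit without the matching debit. That is the containment property the abstract attributes to organizing cooperating processes around transactions, shown here on a single machine rather than across distributed components.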

Index Terms

Systems of systems and coordinated atomic actions
1. Computing methodologies
  1. Modeling and simulation
    1. Simulation theory
      1. Systems theory
2. Software and its engineering
  1. Software creation and management
    1. Designing software
      1. Software implementation planning
        1. Software design techniques
    2. Software development process management
  2. Software organization and properties
    1. Software system structures
      1. Abstraction, modeling and modularity

Information

Published In

ACM SIGSOFT Software Engineering Notes, Volume 30, Issue 1
January 2005
131 pages
ISSN: 0163-5948
DOI: 10.1145/1039174

Publisher

Association for Computing Machinery
New York, NY, United States

Bibliometrics & Citations

Article Metrics

Total Citations: 6
Total Downloads: 510
Downloads (last 12 months): 1
Downloads (last 6 weeks): 0

Reflects downloads up to 12 Feb 2025.

Cited By

Nakagawa E, Gonçalves M, Guessi M, Oliveira L, Oquendo F (2013). The state of the art and future perspectives in systems of systems software architectures. Proceedings of the First International Workshop on Software Engineering for Systems-of-Systems, pp. 13-20. Online publication date: 2-Jul-2013.
https://dl.acm.org/doi/10.1145/2489850.2489853

Schaefer R (2008). Debugging debugged, a metaphysical manifesto of systems integration. ACM SIGSOFT Software Engineering Notes 33:3, pp. 1-20. Online publication date: 1-May-2008.
https://dl.acm.org/doi/10.1145/1360602.1361095

Schaefer R (2006). A rational theory of system-making systems. ACM SIGSOFT Software Engineering Notes 31:2, pp. 1-20. Online publication date: 1-Mar-2006.
https://dl.acm.org/doi/10.1145/1118537.1118543

Schaefer R (2005). The risks of large organizations in developing complex systems. ACM SIGSOFT Software Engineering Notes 30:5, pp. 1-3. Online publication date: 1-Sep-2005.
https://dl.acm.org/doi/10.1145/1095430.1095444

Schaefer R (2005). Deeper questions. ACM SIGSOFT Software Engineering Notes 30:4, pp. 1-6. Online publication date: 1-Jul-2005.
https://dl.acm.org/doi/10.1145/1082983.1083008

Caffall D, Michael J (2005). Architectural Framework for a System-of-Systems. 2005 IEEE International Conference on Systems, Man and Cybernetics, pp. 1876-1881. Online publication date: 2005.
https://doi.org/10.1109/ICSMC.2005.1571420
