Systems of systems and coordinated atomic actions

Published: 01 January 2005

Abstract

System of systems (SoS) is an emerging field in the design and development of complex systems that are built from large-scale component systems. An SoS has the following attributes: operational and managerial independence of components, a geographic extent that limits control mechanisms to information exchange, an evolutionary nature, and emergent behavior. The subsystems that comprise the SoS are often built by different organizations with conflicting goals, designed under different assumptions, and built to different quality standards. These factors impact fault detection, fault isolation, and fault tolerance, and can result in systems that cannot easily be debugged, integrated, or maintained. When fault detection and fault tolerance are deficient, the system may behave in a fragile or brittle manner, randomly and repeatedly crashing. Crashes prevent automated diagnosis algorithms from being executed and can prevent manual root cause analysis by erasing system state. Fragility during system integration can prevent schedule milestones and deadlines from being met. Deficient fault detection and fault isolation also impact end users and system maintainers. (Think <insert name of infamous project here>.)

From the system architect's point of view, designing a system that can detect all possible fault conditions across all components can be an extremely difficult, if not impossible, challenge. Can any system be trusted to diagnose or repair itself when it has been corrupted by faults? How do you prevent local faults from growing into global failures? End users may have unreasonable expectations about how the system should behave when components within the SoS behave abnormally or fail. They may expect better behavior than that of the typical PC. System maintainers may expect a coherent system-level view of failures in order to isolate faulted components and to provide an orderly and safe shutdown or recovery. (Think power grid blackouts, telecom failures, etc.)

The most beneficial way to achieve fault tolerance is to design in fault detection and fault reporting such that defined boundaries, such as subsystems, serve as natural firewalls for fault containment. Although partitioning the system into subsystems for fault containment is well known and widely practiced, the result experienced at system integration is rarely a success. COTS middleware, intended to aid distributed design, often becomes in effect a step backwards by providing fertile ground for faults and failures that breach fault containment boundaries. (Think <insert name of OS or middleware vendor here>.)

What can be done to improve this situation? This paper addresses the architectural partitioning concept of Coordinated Atomic Actions (CAA). CAA promotes a different way of organizing software architecture, one that improves fault containment across potentially faulty components. CAA was invented by members of Brian Randell's research group at the University of Newcastle upon Tyne in the mid-1990s. CAA promotes the concept of the "transaction", which has traditionally been identified with database applications. When you access your bank account via an ATM, you are exercising database transactions within your bank's financial SoS. CAA applies transactions to cooperating concurrent distributed processes, which are the basis for most large complex computing systems.
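
The idea of applying transactions to cooperating concurrent processes can be illustrated with a small sketch. It is not the CAA framework described in the paper, only a simplified illustration of one aspect of it: several roles run concurrently inside one action, they all work on a private copy of a shared object, and the outside world sees either all of their changes or none. The names `run_action`, `debit`, and `credit`, and the bank-account data, are hypothetical and chosen to match the ATM example above. A full CAA design would also coordinate exception handling among the roles, which this sketch replaces with a plain abort.

```python
import copy
import threading


def run_action(state, roles):
    """Run all roles concurrently against a private copy of `state`.

    Returns (committed, errors). `state` is updated only if every role
    finishes without raising, so callers see all changes or none.
    """
    working = copy.deepcopy(state)   # roles never touch the real state directly
    lock = threading.Lock()          # guards `working` and `errors`
    errors = []

    def guarded(role):
        try:
            role(working, lock)
        except Exception as exc:     # contain the fault inside the action
            with lock:
                errors.append(exc)

    threads = [threading.Thread(target=guarded, args=(role,)) for role in roles]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                     # synchronized exit: wait for every role

    if errors:
        return False, errors         # abort: discard the working copy
    state.clear()
    state.update(working)            # commit: copy the results back
    return True, errors


def debit(amount):
    """Hypothetical role: withdraw from the checking account."""
    def role(accounts, lock):
        with lock:
            if accounts["checking"] < amount:
                raise ValueError("insufficient funds")
            accounts["checking"] -= amount
    return role


def credit(amount):
    """Hypothetical role: deposit into the savings account."""
    def role(accounts, lock):
        with lock:
            accounts["savings"] += amount
    return role


accounts = {"checking": 100, "savings": 0}
ok, _ = run_action(accounts, [debit(30), credit(30)])
failed, errs = run_action(accounts, [debit(500), credit(500)])
```

In the second call the debit role fails, so the credit made by the other role is discarded with the working copy and `accounts` keeps the values left by the first call. The fault stays inside the action boundary, which is the containment property the abstract argues for.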

Published In

ACM SIGSOFT Software Engineering Notes  Volume 30, Issue 1
January 2005
131 pages
ISSN:0163-5948
DOI:10.1145/1039174
Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 January 2005
Published in SIGSOFT Volume 30, Issue 1

Qualifiers

  • Article

Cited By

  • (2013) The state of the art and future perspectives in systems of systems software architectures. Proceedings of the First International Workshop on Software Engineering for Systems-of-Systems, pp. 13-20. DOI: 10.1145/2489850.2489853. Online publication date: 2-Jul-2013
  • (2008) Debugging debugged, a metaphysical manifesto of systems integration. ACM SIGSOFT Software Engineering Notes 33:3, pp. 1-20. DOI: 10.1145/1360602.1361095. Online publication date: 1-May-2008
  • (2006) A rational theory of system-making systems. ACM SIGSOFT Software Engineering Notes 31:2, pp. 1-20. DOI: 10.1145/1118537.1118543. Online publication date: 1-Mar-2006
  • (2005) The risks of large organizations in developing complex systems. ACM SIGSOFT Software Engineering Notes 30:5, pp. 1-3. DOI: 10.1145/1095430.1095444. Online publication date: 1-Sep-2005
  • (2005) Deeper questions. ACM SIGSOFT Software Engineering Notes 30:4, pp. 1-6. DOI: 10.1145/1082983.1083008. Online publication date: 1-Jul-2005
  • (2005) Architectural Framework for a System-of-Systems. 2005 IEEE International Conference on Systems, Man and Cybernetics, pp. 1876-1881. DOI: 10.1109/ICSMC.2005.1571420. Online publication date: 2005
