Recovering from distributable thread failures in distributed real-time Java

Published: 27 August 2010 Publication History

Abstract

We consider the problem of recovering from the failures of distributable threads (“threads”) in distributed real-time systems that operate under runtime uncertainties, including uncertainties in thread execution times, thread arrivals, and node failure occurrences. When a thread experiences a node failure, the result is a broken thread having an orphan. Under a termination model, the orphans must be detected and aborted, and exceptions must be delivered to the farthest, contiguous surviving thread segment for resuming thread execution. Our application/scheduling model includes the proposed distributable thread programming model for the emerging Distributed Real-Time Specification for Java (DRTSJ), together with an exception-handler model. Threads are subject to time/utility function (TUF) time constraints and a utility accrual (UA) optimality criterion. A key underpinning of the TUF/UA scheduling paradigm is the notion of “best-effort,” where higher-importance threads are always favored over lower-importance ones, irrespective of thread urgency as specified by their time constraints. We present a thread scheduling algorithm called HUA and a thread integrity protocol called TPR. We show that HUA and TPR bound the orphan cleanup and recovery time with bounded loss of the best-effort property. Our implementation experience with HUA/TPR in the Reference Implementation of the proposed programming model for the DRTSJ demonstrates the algorithm/protocol's effectiveness.
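The termination-model terminology above can be made concrete with a small sketch. It is not the TPR protocol and does not use any DRTSJ API. It assumes a thread can be viewed as an ordered chain of segments, one per node visited, and that the set of crashed nodes is already known. The class, method, and field names (`BrokenThreadSketch`, `inspect`, `handlerSegment`, `orphans`) are hypothetical. Under that reading, the segment just upstream of the first failed node is the farthest contiguous surviving segment, and every surviving segment downstream of the break is an orphan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of broken-thread terminology; not the paper's TPR protocol. */
final class BrokenThreadSketch {

    /** Outcome of inspecting one distributable thread after node failures. */
    static final class Recovery {
        final boolean broken;         // true if any segment's node has failed
        final int handlerSegment;     // index that should receive the exception, or -1 if none survives upstream
        final List<Integer> orphans;  // surviving segments cut off from the root; these must be aborted

        Recovery(boolean broken, int handlerSegment, List<Integer> orphans) {
            this.broken = broken;
            this.handlerSegment = handlerSegment;
            this.orphans = orphans;
        }
    }

    /**
     * @param segmentNodes node hosting each segment, ordered from the thread's root (index 0) to its head
     * @param failedNodes  nodes assumed to have crashed (failure detection itself is out of scope here)
     */
    static Recovery inspect(List<String> segmentNodes, Set<String> failedNodes) {
        // Find the first segment, walking from the root, whose node has failed.
        int firstBroken = -1;
        for (int i = 0; i < segmentNodes.size(); i++) {
            if (failedNodes.contains(segmentNodes.get(i))) {
                firstBroken = i;
                break;
            }
        }
        if (firstBroken < 0) {
            return new Recovery(false, -1, new ArrayList<>()); // thread is intact
        }
        // Surviving segments beyond the break are orphans.
        List<Integer> orphans = new ArrayList<>();
        for (int i = firstBroken + 1; i < segmentNodes.size(); i++) {
            if (!failedNodes.contains(segmentNodes.get(i))) {
                orphans.add(i);
            }
        }
        // firstBroken - 1 is -1 when the root node itself failed.
        return new Recovery(true, firstBroken - 1, orphans);
    }

    public static void main(String[] args) {
        Recovery r = inspect(List.of("A", "B", "C", "D"), Set.of("C"));
        System.out.println("deliver exception to segment " + r.handlerSegment
                + ", abort orphans " + r.orphans);
    }
}
```

For a thread that has visited nodes A, B, C, D in that order, a crash of C leaves the segment on B as the recovery point and the segment on D as the only orphan. The sketch leaves out timing, which is the focus of the paper: it says nothing about how quickly orphans are detected or how cleanup is scheduled.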


Published In

ACM Transactions on Embedded Computing Systems  Volume 10, Issue 1
August 2010
369 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/1814539
Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 August 2010
Accepted: 01 July 2009
Revised: 01 April 2009
Received: 01 November 2008
Published in TECS Volume 10, Issue 1

Author Tags

  1. Distributed
  2. Java
  3. distributable thread
  4. distributed scheduling
  5. real-time
  6. thread integrity

Qualifiers

  • Research-article
  • Research
  • Refereed
