REXEC: A Decentralized, Secure Remote Execution Environment for Clusters

  • Conference paper
Network-Based Parallel Computing. Communication, Architecture, and Applications (CANPC 2000)

Abstract

Bringing clusters of computers into the mainstream as general-purpose computing systems requires better facilities for transparent remote execution of parallel and sequential applications. While much research has been done in this area, most of this work remains inaccessible for clusters built using contemporary hardware and operating systems. Existing implementations are too old or not publicly available, require operating systems that modern hardware does not support, or simply do not meet the functional requirements of practical use in real-world settings. To address these issues, we designed REXEC, a decentralized, secure remote execution facility. It provides high availability, scalability, transparent remote execution, dynamic cluster configuration, decoupled node discovery and selection, a well-defined failure and cleanup model, parallel and distributed program support, and strong authentication and encryption. The system is implemented and currently in use on a 32-node cluster of 2-way SMPs running the Linux 2.2.5 operating system.

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chun, B.N., Culler, D.E. (2000). REXEC: A Decentralized, Secure Remote Execution Environment for Clusters. In: Falsafi, B., Lauria, M. (eds) Network-Based Parallel Computing. Communication, Architecture, and Applications. CANPC 2000. Lecture Notes in Computer Science, vol 1797. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10720115_1

  • DOI: https://doi.org/10.1007/10720115_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67879-3

  • Online ISBN: 978-3-540-44655-2

  • eBook Packages: Springer Book Archive
