Abstract
The growing maturity of hardware and software components has tempted researchers to build very large SCI clusters with several hundred processors that are operated as high-performance compute servers in multi-user mode.
In this chapter, we present Computing Center Software (CCS), a resource management system for user access and system administration of high-performance compute clusters. It has been in day-to-day use since 1992 on various parallel systems and has recently been adapted to the management of SCI clusters. CCS provides pluggable schedulers, optimal space partitioning for multiple users, reliable user access, and powerful tools for specifying resources and services by means of a specification language and a graphical user interface.
After a brief introduction in the remainder of this section, we describe the CCS system architecture and the characteristics of its resource description facilities.
The work presented in this chapter was done while all three authors were at the Paderborn Center for Parallel Computing (http://www.upb.de/pc2).
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Brune, M., Keller, A., Reinefeld, A. (1999). Multi-User System Management on SCI Clusters. In: Hellwagner, H., Reinefeld, A. (eds) SCI: Scalable Coherent Interface. Lecture Notes in Computer Science, vol 1734. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10704208_34
DOI: https://doi.org/10.1007/10704208_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66696-7
Online ISBN: 978-3-540-47048-9
eBook Packages: Springer Book Archive