Abstract
We introduce the concept of hierarchical clustering as a way to structure shared-memory multiprocessor operating systems for scalability. The concept combines clustering with hierarchical system design. Hierarchical clustering leads to a modular system composed of easy-to-design, efficient building blocks. The resulting structure is scalable because it 1) maximizes locality, which is key to good performance in NUMA (non-uniform memory access) systems, and 2) provides concurrency that increases linearly with the number of processors. At the same time, coupling within a cluster is tight, so the system performs well for local interactions, which are expected to be the common case. A clustered system can easily be adapted to different hardware configurations and architectures by changing the size of the clusters. We show how this structuring technique is applied to the design of a microkernel-based operating system called Hurricane. This prototype is the first complete and running implementation of its kind and demonstrates the feasibility of a hierarchically clustered system. We present performance results from the prototype that demonstrate the characteristics and behavior of a clustered system. In particular, we show how clustering trades off the efficiencies of tight coupling for the advantages of replication, increased locality, and decreased lock contention.
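The trade-off described above can be illustrated with a small sketch. This is not Hurricane's implementation; it is a hypothetical, single-level model that assumes processors with consecutive IDs form a cluster, that a Python dictionary stands in for some replicated kernel structure, and that one `threading.Lock` per cluster models intra-cluster tight coupling. The names `Cluster`, `ClusteredKernel`, `lookup`, and `update` are invented for illustration. Reads touch only the local cluster's replica, so they contend only with processors in the same cluster, while writes must reach every replica, so their cost grows with the number of clusters.

```python
import threading


class Cluster:
    """Hypothetical cluster: a group of processors sharing one replica.

    Processors inside a cluster are tightly coupled: they share a single
    dictionary (a stand-in for some kernel table) guarded by one lock.
    """

    def __init__(self, cluster_id):
        self.cluster_id = cluster_id
        self.lock = threading.Lock()
        self.replica = {}  # this cluster's copy of the replicated state


class ClusteredKernel:
    """Hypothetical one-level clustered kernel (not Hurricane's code).

    Processors are grouped into clusters of `cluster_size`; each cluster
    keeps its own replica of the shared state.
    """

    def __init__(self, num_procs, cluster_size):
        if num_procs <= 0 or cluster_size <= 0:
            raise ValueError("num_procs and cluster_size must be positive")
        self.num_procs = num_procs
        self.cluster_size = cluster_size
        num_clusters = -(-num_procs // cluster_size)  # ceiling division
        self.clusters = [Cluster(i) for i in range(num_clusters)]

    def cluster_of(self, cpu):
        # Assumption: consecutive CPU IDs are physically close in the NUMA
        # hierarchy, so they are placed in the same cluster.
        if not 0 <= cpu < self.num_procs:
            raise ValueError("cpu out of range")
        return self.clusters[cpu // self.cluster_size]

    def lookup(self, cpu, key):
        # Common case: a read uses only the local replica and contends
        # only with the other processors of the same cluster.
        c = self.cluster_of(cpu)
        with c.lock:
            return c.replica.get(key)

    def update(self, key, value):
        # Rare case: a write must reach every replica. All cluster locks
        # are taken in cluster-ID order (avoiding deadlock between
        # concurrent updates), so readers never see a half-applied write.
        for c in self.clusters:
            c.lock.acquire()
        try:
            for c in self.clusters:
                c.replica[key] = value
        finally:
            for c in reversed(self.clusters):
                c.lock.release()
        return len(self.clusters)  # number of replicas written
```

For example, `ClusteredKernel(16, 4)` gives four clusters: `update("a", 1)` writes four replicas, and `lookup(13, "a")` consults only cluster 3. Setting `cluster_size` equal to `num_procs` yields one tightly coupled structure (cheap updates, one contended lock), while `cluster_size=1` fully replicates the state (maximal locality, most expensive updates), mirroring the idea that cluster size is the knob for adapting to different hardware.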
Cite this article
Unrau, R.C., Krieger, O., Gamsa, B. et al. Hierarchical clustering: A structure for scalable multiprocessor operating system design. J Supercomput 9, 105–134 (1995). https://doi.org/10.1007/BF01245400