Hierarchical clustering: A structure for scalable multiprocessor operating system design

The Journal of Supercomputing

Abstract

We introduce the concept of hierarchical clustering as a way to structure shared-memory multiprocessor operating systems for scalability. The concept is based on clustering and hierarchical system design. Hierarchical clustering leads to a modular system, composed of easy-to-design and efficient building blocks. The resulting structure is scalable because it (1) maximizes locality, which is key to good performance in NUMA (non-uniform memory access) systems, and (2) provides for concurrency that increases linearly with the number of processors. At the same time, there is tight coupling within a cluster, so the system performs well for local interactions, which are expected to constitute the common case. A clustered system can easily be adapted to different hardware configurations and architectures by changing the size of the clusters. We show how this structuring technique is applied to the design of a microkernel-based operating system called Hurricane. This prototype system is the first complete and running implementation of its kind and demonstrates the feasibility of a hierarchically clustered system. We present performance results based on the prototype, demonstrating the characteristics and behavior of a clustered system. In particular, we show how clustering trades off the efficiencies of tight coupling for the advantages of replication, increased locality, and decreased lock contention.
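The clustering idea in the abstract can be illustrated with a minimal sketch. This is not Hurricane's implementation: the `Cluster` class, `build_clusters`, `local_update`, and the per-cluster `table` are hypothetical names chosen for illustration, and the model only assumes that each cluster holds its own lock and its own replica of some system data.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    """One tightly coupled group of processors (hypothetical model)."""
    cluster_id: int
    processors: List[int]
    # One lock per cluster: contention is limited to processors in the cluster.
    lock: threading.Lock = field(default_factory=threading.Lock)
    # Per-cluster replica of some system table, so common-case accesses stay local.
    table: Dict[str, int] = field(default_factory=dict)


def build_clusters(num_processors: int, cluster_size: int) -> List[Cluster]:
    """Partition processor ids 0..num_processors-1 into clusters of cluster_size."""
    if num_processors <= 0 or cluster_size <= 0:
        raise ValueError("num_processors and cluster_size must be positive")
    return [
        Cluster(cid, list(range(start, min(start + cluster_size, num_processors))))
        for cid, start in enumerate(range(0, num_processors, cluster_size))
    ]


def local_update(clusters: List[Cluster], processor: int,
                 cluster_size: int, key: str, value: int) -> int:
    """Update only the replica in the caller's own cluster, under that cluster's lock."""
    home = clusters[processor // cluster_size]
    with home.lock:
        home.table[key] = value
    return home.cluster_id
```

With a fixed cluster size, the number of independent locks and replicas grows in proportion to the number of processors, which is the sense in which concurrency scales in this sketch. Changing `cluster_size` models the adaptation to different hardware configurations that the abstract describes.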

Cite this article

Unrau, R.C., Krieger, O., Gamsa, B. et al. Hierarchical clustering: A structure for scalable multiprocessor operating system design. J Supercomput 9, 105–134 (1995). https://doi.org/10.1007/BF01245400

  • DOI: https://doi.org/10.1007/BF01245400
