
Part of the book series: High-Performance Computing Series (HPC, volume 1)

Abstract

The goal of the ZeptoOS project was to explore the fundamental limits and advanced designs required for petascale operating system suites, focusing on ultrascale and collective OS behavior. Within the project, the Linux kernel was ported to the compute nodes of IBM Blue Gene systems. Major research activities included work on HPC-specific memory management (called Big Memory) and on an extensible I/O forwarding infrastructure (called ZOID). The project demonstrated excellent performance and scalability of the Linux kernel, comparable to those of the IBM lightweight kernel, while at the same time attracting novel use cases.
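
Big Memory addresses a well-known cost of running Linux on HPC compute nodes: with ordinary small pages, large application heaps incur frequent TLB misses, whereas a lightweight kernel can cover memory with a handful of large, static mappings. The fragment below is a minimal standard-Linux sketch of that idea using an anonymous huge-page mapping (MAP_HUGETLB); it is an illustrative analogue only, not the ZeptoOS Big Memory interface, and it assumes huge pages have already been reserved (e.g., via /proc/sys/vm/nr_hugepages).

/* Illustrative sketch only: an anonymous huge-page mapping on standard
 * Linux, analogous in spirit to the large, TLB-friendly regions that
 * Big Memory provides.  Assumes huge pages have been reserved, e.g.:
 *   echo 64 > /proc/sys/vm/nr_hugepages
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 64UL << 20;   /* 64 MiB; must be a multiple of the huge-page size */

    /* Request an anonymous mapping backed by huge pages instead of 4 KiB pages. */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");   /* fails if no huge pages are available */
        return EXIT_FAILURE;
    }

    memset(buf, 0, len);   /* touch the whole region; only a few TLB entries are needed */
    munmap(buf, len);
    return EXIT_SUCCESS;
}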

Acknowledgements

We thank the rest of the ZeptoOS core team: Harish Naik and Chenjie Yu at Argonne National Laboratory and the University of Oregon’s Allen D. Malony, Sameer Shende, and Aroon Nataraj. We thank our colleagues at Argonne who offered their expertise and assistance in many areas, especially Susan Coghlan and other members of the Leadership Computing Facility. We also thank all our summer interns, in particular Balazs Gerofi, Kazunori Yamamoto, Peter Boonstoppel, Hajime Fujita, Satya Popuri, and Taku Shimosawa, who contributed to the ZeptoOS project. Additionally, we thank ASTRON’s John W. Romein and P. Chris Broekema and the University of Chicago’s Ioan Raicu, Zhao Zhang, Mike Wilde, and Ian Foster. In addition, we thank IBM’s Todd Inglett, Thomas Musta, Thomas Gooding, George Almási, Sameer Kumar, Michael Blocksome, Blake Fitch, Chris Ward, and Robert Wisniewski for their advice on programming the Blue Gene hardware.

This work was supported by the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility.

Author information

Correspondence to Kamil Iskra.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Iskra, K., Yoshii, K., Beckman, P. (2019). ZeptoOS. In: Gerofi, B., Ishikawa, Y., Riesen, R., Wisniewski, R.W. (eds) Operating Systems for Supercomputers and High Performance Computing. High-Performance Computing Series, vol 1. Springer, Singapore. https://doi.org/10.1007/978-981-13-6624-6_10

  • DOI: https://doi.org/10.1007/978-981-13-6624-6_10

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-6623-9

  • Online ISBN: 978-981-13-6624-6

  • eBook Packages: Computer Science, Computer Science (R0)
