The Experience in Designing and Evaluating the High Performance Cluster Netuno

International Journal of Parallel Programming

Abstract

This paper presents the Netuno supercomputer, a large-scale cluster installed at the Federal University of Rio de Janeiro in Brazil. A detailed performance evaluation of Netuno is presented, depicting its computational and I/O performance, as well as the results for two real-world applications. Since building a high-performance cluster for running a wide range of applications is a non-trivial task, some lessons learned from assembling and operating this cluster, such as the excellent performance of the OpenMPI library and the relevance of employing an efficient parallel file system over the traditional NFS system, can be useful knowledge to support the design of new systems. Currently, Netuno is being heavily used to run large-scale simulations in the areas of ocean modeling, meteorology, engineering, physics, and geophysics.

Author information

Corresponding author

Correspondence to Cristiana Bentes.

About this article

Cite this article

Silva, G.P., Correa, J., Bentes, C. et al. The Experience in Designing and Evaluating the High Performance Cluster Netuno. Int J Parallel Prog 42, 265–286 (2014). https://doi.org/10.1007/s10766-012-0224-7

  • DOI: https://doi.org/10.1007/s10766-012-0224-7
