Janus: a framework to boost HPC applications in the cloud based on SDN path provisioning

Cluster Computing

Abstract

Data centers, clusters, and grids have historically supported High-Performance Computing (HPC) applications. Due to the high capital and operational expenditures associated with such infrastructures, recent years have seen consistent efforts to run HPC applications in the cloud. The potential advantages of this shift include higher scalability and lower costs. While application instantiation through customized Virtual Machines (VMs) is a well-solved issue, the network still represents a significant bottleneck. When HPC applications are moved to the cloud, we lose control over where VMs are placed and over the paths that processes traverse to communicate with one another. To bridge this gap, we present Janus, a framework for dynamic, just-in-time path provisioning in cloud infrastructures. By leveraging emerging software-defined networking principles, the framework allows an HPC application, once deployed, to have its interprocess communication paths configured upon usage based on the least-used network links (instead of resorting to shortest, pre-computed paths). Janus is fully configurable to cope with different operating parameters and communication strategies, providing a rich ecosystem for speeding up application execution. Through an extensive experimental evaluation, we provide evidence that the proposed framework can lead to significant runtime gains. Moreover, we show what one can expect in terms of system overheads, providing essential insights on how to better benefit from Janus.
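The abstract contrasts least-used-link path selection with shortest, pre-computed paths but does not spell out the selection algorithm. As a minimal sketch only, assuming that each link carries a single measured load value and that a path's cost is the sum of its link loads, the idea can be illustrated with Dijkstra's algorithm over link load instead of hop count. The function name `least_loaded_path`, the `links` dictionary format, and the toy topology are hypothetical and are not taken from Janus itself.

```python
import heapq


def least_loaded_path(links, src, dst):
    """Return (total_load, path) for the path with the lowest summed link load.

    links: dict mapping an undirected link (u, v) to its current load.
           This format is an assumption made for illustration.
    """
    # Build an adjacency list from the undirected link table.
    adj = {}
    for (u, v), load in links.items():
        adj.setdefault(u, []).append((v, load))
        adj.setdefault(v, []).append((u, load))

    best = {src: 0.0}   # lowest known cost to each node
    prev = {}           # predecessor on the best known path
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, load in adj.get(node, []):
            new_cost = cost + load
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))

    if dst not in best:
        return float("inf"), []

    # Walk predecessors back from dst to rebuild the path.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return best[dst], path


# Toy topology (hypothetical): the 3-hop route via s1-s2 is shortest,
# but its core link is heavily used; the detour via s3 and s4 is idle-ish.
links = {
    ("h1", "s1"): 0, ("s2", "h2"): 0,
    ("s1", "s2"): 90,
    ("s1", "s3"): 10, ("s3", "s4"): 10, ("s4", "s2"): 10,
}
cost, path = least_loaded_path(links, "h1", "h2")
print(cost, path)
```

In this toy case the longer route through `s3` and `s4` is chosen over the congested direct link. Other cost definitions, such as minimizing the most loaded link on the path, would be equally plausible readings of "least-used network links".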

Data availability

The datasets generated and/or analysed during the current study are available from the first author on reasonable request.

Acknowledgements

This work was carried out as part of the project CloudHPC—Harnessing Cloud Computing to Power Up HPC Applications, BRICS Pilot Call 2016. It was partially supported by the Brazilian National Council for Scientific and Technological Development (CNPq), Project Numbers 441892/2016-7, Call CNPq/MCTIC/BRICS-STI No 18/2016, the Coordination for the Improvement of Higher Education Personnel (CAPES), as well as the National Key Cooperation between the BRICS Program of China (No. 2017YE0100500) and the Beijing Natural Science Foundation of China (No. 4172033).

Author information

Corresponding author

Correspondence to Jonatas A. Marques.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pretto, G.R., Dalmazo, B.L., Marques, J.A. et al. Janus: a framework to boost HPC applications in the cloud based on SDN path provisioning. Cluster Comput 25, 947–964 (2022). https://doi.org/10.1007/s10586-021-03470-6

  • DOI: https://doi.org/10.1007/s10586-021-03470-6
