Abstract
Supporting transfers of science big data over Wide Area Networks (WANs) with Data Transfer Nodes (DTNs) requires optimizing multiple parameters within the underlying infrastructure. New solutions for such data movement require new paradigms and technologies, such as NVMe over Fabrics, which provides high-performance data movement with direct remote NVMe device access over traditional fabrics. However, recent NVMe over Fabrics studies have been limited to local storage fabrics. To support the increasing demand for large-volume science data movement during Supercomputing (SC) conferences, we proposed the SCinet DTN-as-a-Service framework, which orchestrates the optimizations needed to meet the requirements of users, applications, and providers. Furthermore, we extend the SCinet DTN-as-a-Service framework to incorporate new techniques and address optimization issues in data-intensive science, and we evaluate NVMe over Fabrics on multiple WAN testbeds to examine its performance and identify new opportunities for optimization.
Data availability
The SCinet DTN-as-a-Service implementation is available as a container image at https://tinyurl.com/tcu2u8v5, and the data used in this manuscript are available at https://tinyurl.com/6d8fmtbk.
Acknowledgements
We would like to thank our sponsors, including the StarLight International/National Communications Exchange Facility, the Metropolitan Research and Education Network (MREN), the National Science Foundation (NSF), Northwestern University, Dell Technologies Inc., CANARIE, and Ciena Inc., and more specifically Rodney Wilson, who has been a very valuable supporter and contributor.
Funding
This work was supported by the National Science Foundation (NSF) under Grant No. 1450871.
Ethics declarations
Conflict of interest
The authors have no actual or potential conflict of interest in relation to this article.
Additional information
Funding Agency: NSF IRNC Grant Award 1450871.
Cite this article
Yu, Sy., Chen, J., Yeh, F. et al. Analysis of NVMe over fabrics with SCinet DTN-as-a-Service. Cluster Comput 25, 2991–3003 (2022). https://doi.org/10.1007/s10586-021-03433-x
DOI: https://doi.org/10.1007/s10586-021-03433-x