ABSTRACT
RDMA communication in virtual private cloud (VPC) networks remains challenging because it is difficult to fulfill all virtualization requirements without sacrificing RDMA performance. To address this problem, this paper proposes a software-defined solution named MasQ, short for "queue masquerade". The core insight of MasQ is that every RDMA communication is associated with at least one queue pair (QP). The requirements of virtualization, such as network isolation and the application of security rules, can therefore be fulfilled by properly defining the behavior of QPs. In particular, MasQ uses virtio-based paravirtualization to implement the control path. To avoid performance overhead, MasQ leaves all data path operations, such as sending and receiving, to the hardware. We have implemented MasQ in the OpenFabrics Enterprise Distribution (OFED) framework and demonstrated its scalability and performance efficiency by evaluating it with typical applications. The results show that MasQ achieves almost the same performance as bare-metal RDMA for data communication.
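The split between a software control path and a hardware data path can be illustrated with a toy model. The sketch below is not MasQ's implementation or the verbs API: the names `Rule`, `QueuePair`, `ControlPath`, and `post_send` are hypothetical, and the rule format (a tenant plus an allowed CIDR block) is an assumption made for illustration. It only shows the idea that security rules are checked once, when a QP is connected, so that later sends need no per-message check.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical security rule: a tenant may reach peers in one CIDR block."""
    tenant: str
    allowed_cidr: str

    def permits(self, tenant: str, remote_ip: str) -> bool:
        return (tenant == self.tenant
                and ip_address(remote_ip) in ip_network(self.allowed_cidr))


@dataclass
class QueuePair:
    """Toy stand-in for an RDMA QP (assumption; not a real verbs object)."""
    tenant: str
    connected_to: Optional[str] = None
    sent: List[bytes] = field(default_factory=list)


class ControlPath:
    """Software control path: validates a connection request against the rules."""

    def __init__(self, rules: List[Rule]) -> None:
        self.rules = list(rules)

    def connect(self, qp: QueuePair, remote_ip: str) -> None:
        # Rules are enforced here, once per connection, not once per message.
        if not any(r.permits(qp.tenant, remote_ip) for r in self.rules):
            raise PermissionError(f"{qp.tenant} may not connect to {remote_ip}")
        qp.connected_to = remote_ip


def post_send(qp: QueuePair, payload: bytes) -> None:
    """Data path stand-in: no rule lookup, only a connected-state check."""
    if qp.connected_to is None:
        raise RuntimeError("QP is not connected")
    qp.sent.append(payload)


if __name__ == "__main__":
    control = ControlPath([Rule("tenant-a", "10.0.0.0/24")])
    qp = QueuePair("tenant-a")
    control.connect(qp, "10.0.0.7")   # allowed by the rule
    post_send(qp, b"hello")           # no further software check
    print(qp.sent)
```

In this model a QP whose tenant matches no rule never reaches the connected state, so it can never send; that is the sense in which defining QP behavior on the control path can stand in for per-packet filtering.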