Abstract
As data processing evolves towards large-scale, distributed platforms, the network will necessarily play a substantial role in achieving efficiency and performance. Modern high-speed networks such as InfiniBand, RoCE, or Omni-Path provide advanced features such as Remote Direct Memory Access (RDMA) that have been shown to improve the performance and scalability of distributed data processing systems. Furthermore, switches and network cards are becoming more flexible, while programmability at all levels (i.e., software-defined networking) opens up many possibilities to tailor the network to data processing applications and to push processing down to the network elements. In this paper, we discuss these opportunities and present our recent research results on redesigning scalable data management systems for the capabilities of modern networks.
Acknowledgements
This work was funded by the German Research Foundation through a grant of the DFG Priority Program 2037 “Scalable Data Management for Future Hardware” and the Collaborative Research Center “Multi-Mechanisms Adaptation for the Future Internet”.
The work presented in this paper was mainly done by my fantastic PhD students and they truly deserve all the credit. Most notably, I would like to thank Andrew Crotty, Alex Galakatos, Abdallah Salama, Erfan Zamanian, and Tobias Ziegler. Furthermore, I was very lucky to have many fantastic collaborators, most notably Tim Kraska but also Ugur Cetintemel, Patrick Eugster, Rodrigo Fonseca, and Stan Zdonik.
Cite this article
Binnig, C. Scalable Data Management on Modern Networks. Datenbank Spektrum 18, 203–209 (2018). https://doi.org/10.1007/s13222-018-0297-6
DOI: https://doi.org/10.1007/s13222-018-0297-6