RecFlow: SDN-based receiver-driven flow scheduling in datacenters

Abstract

Datacenter applications (e.g., web search, recommendation systems, and social networking) are designed with high fanout to achieve scalable performance. Frequent fabric congestion (e.g., due to incast or imperfect hashing) is a corollary of such a design, even when overall network utilization is low. This congestion exhibits both temporal and spatial (intra-rack and inter-rack) variations. Two basic design paradigms are used to address the problem, and current solutions lie somewhere between them. At one end are arbiter-based approaches, in which senders poll a centralized arbiter and collectively obey global scheduling decisions. At the other end are self-adjusting endpoint-based approaches, in which senders independently adjust their transmission rates based on observed congestion. The former incurs greater overhead, while the latter trades optimality for simplicity. Our work seeks a middle ground: the optimality of arbiter-based approaches with the simplicity of self-adjusting endpoint-based approaches. Our key design principle is that, since the receiver has complete information about the flows destined for it, the receiver itself can orchestrate those flows rather than relying on a centralized arbiter or on independent sender decisions. Because multiple receivers may share a bottleneck link, datapath visibility should be used to ensure fair sharing of the bottleneck capacity among receivers with minimum overhead. We propose RecFlow, a receiver-based proactive congestion control scheme. RecFlow employs the path visibility provided by OpenFlow to track changing bottlenecks on the fly, and it spaces TCP acknowledgements to prevent traffic bursts and to ensure that no receiver exceeds its fair share of the bottleneck capacity. The goal is to reduce buffer overflows while maintaining fairness among flows and high link utilization. Using extensive simulations and a real testbed evaluation, we show that, compared to the state of the art, RecFlow achieves up to a 6× improvement in the inter-rack scenario and 1.5× in the intra-rack scenario while sharing the link capacity fairly among all flows.
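
To make the ACK-pacing idea above concrete, the sketch below (in Python, assuming an idealized setting) shows how a receiver might space acknowledgements so that the traffic its senders clock out stays within the receiver's fair share of the current bottleneck. The class name AckPacer, the equal-split fair-share rule, and the per-ACK byte count are illustrative assumptions, not the paper's implementation; in RecFlow the bottleneck and the number of competing receivers would be learned from OpenFlow datapath statistics.

    import time

    MSS = 1460  # bytes newly released at the sender per ACK (illustrative)

    class AckPacer:
        """Receiver-side ACK pacing sketch (not the paper's implementation).

        Each ACK lets the sender transmit roughly one more segment, so spacing
        ACKs by acked_bytes / fair_share keeps the traffic clocked toward this
        receiver at or below its fair share of the bottleneck link.
        """

        def __init__(self, fair_share_bps):
            self.fair_share_bps = fair_share_bps
            self.next_slot = 0.0  # earliest time the next ACK may leave

        def update_fair_share(self, bottleneck_bps, num_receivers):
            # Assumed equal split of the current bottleneck among the
            # receivers sharing it (the bottleneck itself may change over time).
            self.fair_share_bps = bottleneck_bps / max(1, num_receivers)

        def schedule_ack(self, acked_bytes=MSS, now=None):
            """Return the time at which the next ACK should be transmitted."""
            now = time.monotonic() if now is None else now
            gap = (acked_bytes * 8.0) / self.fair_share_bps
            send_at = max(now, self.next_slot)
            self.next_slot = send_at + gap
            return send_at

    # Example: a receiver sharing a 10 Gbps bottleneck with four other receivers.
    pacer = AckPacer(fair_share_bps=10e9)
    pacer.update_fair_share(bottleneck_bps=10e9, num_receivers=5)
    t = pacer.schedule_ack()  # when the next delayed ACK should be released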

Notes

  1. These controllers run in slave mode to ensure that they are read-only (see the sketch after these notes).

  2. Note that 10 Gbps and 40 Gbps links are common in DCs [2, 5].
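
Footnote 1 concerns controller roles. As a rough illustration only (assuming OpenFlow 1.3 and the Ryu framework cited in [36]; the app name ReadOnlyMonitor is hypothetical), a monitoring controller could request the SLAVE role when a switch connects, leaving it able to poll statistics but unable to modify datapath state:

    # Minimal Ryu app sketch: demote this controller to SLAVE on connect.
    from ryu.base import app_manager
    from ryu.controller import ofp_event
    from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
    from ryu.ofproto import ofproto_v1_3

    class ReadOnlyMonitor(app_manager.RyuApp):
        OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

        @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
        def switch_features_handler(self, ev):
            datapath = ev.msg.datapath
            ofp = datapath.ofproto
            parser = datapath.ofproto_parser
            # Request the SLAVE role: the switch rejects state-modifying
            # messages from this controller but still answers statistics
            # requests, which is all a read-only monitor needs.
            req = parser.OFPRoleRequest(datapath, ofp.OFPCR_ROLE_SLAVE, 0)
            datapath.send_msg(req)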

References

  1. Abdelmoniem, A.M., Bensaou, B., Abu, A.J.: SICC: SDN-based incast congestion control for data centers. In: 2017 IEEE International Conference on Communications (ICC), pp. 1–6 (2017). https://doi.org/10.1109/ICC.2017.7996826

  2. Abts, D., Felderman, B.: A guided tour of data-center networking. Commun. ACM 55(6), 44–51 (2012). https://doi.org/10.1145/2184319.2184335

  3. Akidau, T., Balikov, A., Bekiroğlu, K., Chernyak, S., Haberman, J., Lax, R., McVeety, S., Mills, D., Nordstrom, P., Whittle, S.: MillWheel: fault-tolerant stream processing at internet scale. Proc. VLDB Endow. 6(11), 1033–1044 (2013). https://doi.org/10.14778/2536222.2536229

  4. Alizadeh, M., Edsall, T., Dharmapurikar, S., Vaidyanathan, R., Chu, K., Fingerhut, A., Lam, V.T., Matus, F., Pan, R., Yadav, N., Varghese, G.: CONGA: distributed congestion-aware load balancing for datacenters, pp. 503–514. ACM, New York, NY, USA (2014). https://doi.org/10.1145/2740070.2626316

  5. Alizadeh, M., Greenberg, A., Maltz, D.A., Padhye, J., Patel, P., Prabhakar, B., Sengupta, S., Sridharan, M.: Data center TCP (DCTCP). In: Proceedings of the ACM SIGCOMM 2010 Conference, SIGCOMM ’10, pp. 63–74. ACM, New York, NY, USA (2010). https://doi.org/10.1145/1851182.1851192

  6. Ananthanarayanan, G., Kandula, S., Greenberg, A., Stoica, I., Lu, Y., Saha, B., Harris, E.: Reining in the outliers in map-reduce clusters using Mantri. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI’10, pp. 265–278. USENIX Association, Berkeley, CA, USA (2010). http://dl.acm.org/citation.cfm?id=1924943.1924962

  7. Bai, W., Chen, K., Wu, H., Lan, W., Zhao, Y.: PAC: Taming TCP incast congestion using proactive ACK control. In: Proceedings of the 2014 IEEE 22nd International Conference on Network Protocols, ICNP ’14, pp. 385–396. IEEE Computer Society, Washington, DC, USA (2014). https://doi.org/10.1109/ICNP.2014.62

  8. Bai, W., Chen, L., Chen, K., Wu, H.: Enabling ECN in multi-service multi-queue data centers. In: 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16), pp. 537–549. USENIX Association, Santa Clara, CA (2016). https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/bai

  9. Cheng, P., Ren, F., Shu, R., Lin, C.: Catch the whole lot in an action: rapid precise packet loss notification in data centers. In: Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, NSDI’14, pp. 17–28. USENIX Association, Berkeley, CA, USA (2014). http://dl.acm.org/citation.cfm?id=2616448.2616451

  10. Dalton, M., Schultz, D., Adriaens, J., Arefin, A., Gupta, A., Fahs, B., Rubinstein, D., Zermeno, E.C., Rubow, E., Docauer, J.A., Alpert, J., Ai, J., Olson, J., DeCabooter, K., de Kruijf, M., Hua, N., Lewis, N., Kasinadhuni, N., Crepaldi, R., Krishnan, S., Venkata, S., Richter, Y., Naik, U., Vahdat, A.: Andromeda: performance, isolation, and velocity at scale in cloud network virtualization. In: 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18), pp. 373–387. USENIX Association, Renton, WA (2018). https://www.usenix.org/conference/nsdi18/presentation/dalton

  11. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)

  12. Emmerich, P., Raumer, D., Wohlfart, F., Carle, G.: Performance characteristics of virtual switching. In: 2014 IEEE 3rd International Conference on Cloud Networking (CloudNet), pp. 120–125 (2014). https://doi.org/10.1109/CloudNet.2014.6968979

  13. Facebook: Newsroom (2017). http://newsroom.fb.com/company-info

  14. Ghobadi, M., Yeganeh, S.H., Ganjali, Y.: Rethinking end-to-end congestion control in software-defined networks. In: Proceedings of the 11th ACM Workshop on Hot Topics in Networks, HotNets-XI, pp. 61–66. ACM, New York, NY, USA (2012). https://doi.org/10.1145/2390231.2390242

  15. Hafeez, U.U., Kashaf, A., Bajwa, Q.u.a., Mushtaq, A., Zaidi, H., Qazi, I.A., Uzmi, Z.A.: Mitigating datacenter incast congestion using RTO randomization. In: 2015 IEEE Global Communications Conference (GLOBECOM), pp. 1–6 (2015). https://doi.org/10.1109/GLOCOM.2015.7417797

  16. He, K., Rozner, E., Agarwal, K., Felter, W., Carter, J., Akella, A.: Presto: Edge-based load balancing for fast datacenter networks. In: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, SIGCOMM ’15, pp. 465–478. ACM, New York, NY, USA (2015). https://doi.org/10.1145/2785956.2787507

  17. Hoff, T.: Latency is everywhere and it costs you sales: how to crush it (2009). https://www.highscalability.com/blog/2009/7/25/latency-is-everywhere-and-it-costs-you-sales-how-to-crush-it.html

  18. Hwang, J., Yoo, J., Choi, N.: Deadline and incast aware TCP for cloud data center networks. Comput. Netw. 68(Supplement C), 20–34 (2014). https://doi.org/10.1016/j.comnet.2013.12.002

  19. Jouet, S., Pezaros, D.P.: Measurement-based TCP parameter tuning in cloud data centers. In: 2013 21st IEEE International Conference on Network Protocols (ICNP), pp. 1–3 (2013). https://doi.org/10.1109/ICNP.2013.6733644

  20. Khan, A.Z., Qazi, I.A.: Receiver-driven flow scheduling for commodity datacenters. In: 2017 IEEE International Conference on Communications (ICC), pp. 1–6 (2017). https://doi.org/10.1109/ICC.2017.7996676

  21. Data Center Knowledge: The Facebook data center FAQ (2010). http://www.datacenterknowledge.com/the-facebook-data-center-faq-page-2/

  22. Krevat, E., Vasudevan, V., Phanishayee, A., Andersen, D.G., Ganger, G.R., Gibson, G.A., Seshan, S.: On application-level approaches to avoiding TCP throughput collapse in cluster-based storage systems. In: Proceedings of the 2nd International Workshop on Petascale Data Storage: Held in Conjunction with Supercomputing ’07, PDSW ’07, pp. 1–4. ACM, New York, NY, USA (2007). https://doi.org/10.1145/1374596.1374598

  23. Kulkarni, S., Agrawal, P.: A probabilistic approach to address TCP incast in data center networks. In: 2011 31st International Conference on Distributed Computing Systems Workshops, pp. 26–33 (2011). https://doi.org/10.1109/ICDCSW.2011.41

  24. Lu, Y., Zhu, S.: SDN-based TCP congestion control in data center networks. In: Proceedings of the 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC), IPCCC ’15, pp. 1–7. IEEE Computer Society, Washington, DC, USA (2015). https://doi.org/10.1109/PCCC.2015.7410275

  25. Luo, T., Tan, H.P., Quan, P.C., Law, Y.W., Jin, J.: Enhancing responsiveness and scalability for openflow networks via control-message quenching. In: 2012 International Conference on ICT Convergence (ICTC), pp. 348–353 (2012). https://doi.org/10.1109/ICTC.2012.6386857

  26. Miller, R.: Google uses about 900,000 servers (2011). www.datacenterknowledge.com/archives/2011/08/01/report-google-uses-about-900000-servers

  27. Nishtala, R., Fugal, H., Grimm, S., Kwiatkowski, M., Lee, H., Li, H.C., McElroy, R., Paleczny, M., Peek, D., Saab, P., Stafford, D., Tung, T., Venkataramani, V.: Scaling Memcache at Facebook. In: Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation, NSDI’13, pp. 385–398. USENIX Association, Berkeley, CA, USA (2013). http://dl.acm.org/citation.cfm?id=2482626.2482663

  28. ns-2 Network Simulator. https://www.isi.edu/nsnam/ns/

  29. OpenFlow: https://www.opennetworking.org/sdn-resources/openflow

  30. Peng, D., Dabek, F.: Large-scale incremental processing using distributed transactions and notifications. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI’10, pp. 251–264. USENIX Association, Berkeley, CA, USA (2010). http://dl.acm.org/citation.cfm?id=1924943.1924961

  31. Perlin, M.: Downtime, outages and failures—understanding their true costs (2012). https://www.evolven.com/blog/downtime-outages-and-failures-understanding-their-true-costs.html

  32. Perry, J., Ousterhout, A., Balakrishnan, H., Shah, D., Fugal, H.: Fastpass: A centralized “zero-queue” datacenter network. In: Proceedings of the 2014 ACM Conference on SIGCOMM, SIGCOMM ’14, pp. 307–318. ACM, New York, NY, USA (2014). https://doi.org/10.1145/2619239.2626309

  33. Pfaff, B., Pettit, J., Koponen, T., Jackson, E.J., Zhou, A., Rajahalme, J., Gross, J., Wang, A., Stringer, J., Shelar, P., Amidon, K., Casado, M.: The design and implementation of Open vSwitch. In: Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation, NSDI’15, pp. 117–130. USENIX Association, Berkeley, CA, USA (2015). http://dl.acm.org/citation.cfm?id=2789770.2789779

  34. Phanishayee, A., Krevat, E., Vasudevan, V., Andersen, D.G., Ganger, G.R., Gibson, G.A., Seshan, S.: Measurement and analysis of TCP throughput collapse in cluster-based storage systems. In: Proceedings of the 6th USENIX Conference on File and Storage Technologies, FAST’08, pp. 12:1–12:14. USENIX Association, Berkeley, CA, USA (2008). http://dl.acm.org/citation.cfm?id=1364813.1364825

  35. Pirzada, H.A., Mahboob, M.R., Qazi, I.A.: eSDN: Rethinking datacenter transports using end-host SDN controllers. In: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, SIGCOMM ’15, pp. 605–606. ACM, New York, NY, USA (2015). https://doi.org/10.1145/2785956.2790022

  36. Ryu SDN Framework (2017). http://osrg.github.io/ryu/

  37. Rotsos, C., Sarrar, N., Uhlig, S., Sherwood, R., Moore, A.W.: OFLOPS: an open framework for OpenFlow switch evaluation. In: Proceedings of the 13th International Conference on Passive and Active Measurement, PAM’12, pp. 85–95. Springer, Berlin (2012)

  38. Roy, A., Zeng, H., Bagga, J., Porter, G., Snoeren, A.C.: Inside the social network’s (datacenter) network, pp. 123–137. ACM, New York (2015). https://doi.org/10.1145/2829988.2787472

  39. Singh, A., Ong, J., Agarwal, A., Anderson, G., Armistead, A., Bannon, R., Boving, S., Desai, G., Felderman, B., Germano, P., Kanagala, A., Provost, J., Simmons, J., Tanda, E., Wanderer, J., Hölzle, U., Stuart, S., Vahdat, A.: Jupiter rising: a decade of Clos topologies and centralized control in Google’s datacenter network, pp. 183–197. ACM, New York, NY, USA (2015). https://doi.org/10.1145/2829988.2787508

  40. Sreekumari, P., Jung, Ji, Lee, M.: A simple and efficient approach for reducing TCP timeouts due to lack of duplicate acknowledgments in data center networks. Clust. Comput. 19(2), 633–645 (2016). https://doi.org/10.1007/s10586-016-0555-z

  41. Internet Live Stats: Google search statistics (2017). http://www.internetlivestats.com/google-search-statistics

  42. The Open Networking Foundation: OpenFlow Switch Specification (2012)

  43. The Open Networking Foundation: OpenFlow and SDN State of the Union (2016)

  44. Tootoonchian, A., Gorbunov, S., Ganjali, Y., Casado, M., Sherwood, R.: On controller performance in software-defined networks. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services, Hot-ICE’12, p. 10. USENIX Association, Berkeley, CA, USA (2012). http://dl.acm.org/citation.cfm?id=2228283.2228297

  45. Vasudevan, V., Phanishayee, A., Shah, H., Krevat, E., Andersen, D.G., Ganger, G.R., Gibson, G.A., Mueller, B.: Safe and effective fine-grained TCP retransmissions for datacenter communication. In: Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, SIGCOMM ’09, pp. 303–314. ACM, New York, NY, USA (2009). https://doi.org/10.1145/1592568.1592604

  46. Wu, H., Feng, Z., Guo, C., Zhang, Y.: Incast congestion control for TCP in data-center networks. IEEE/ACM Trans. Netw. 21, 345–358 (2013). https://doi.org/10.1109/TNET.2012.2197411

  47. Zhang, J., Ren, F., Lin, C.: Modeling and understanding TCP incast in data center networks. In: 2011 Proceedings IEEE INFOCOM, pp. 1377–1385 (2011). https://doi.org/10.1109/INFCOM.2011.5934923

  48. Zhang, J., Ren, F., Lin, C.: Survey on transport control in data center networks. IEEE Netw. 27(4), 22–26 (2013). https://doi.org/10.1109/MNET.2013.6574661

  49. Zhang, J., Ren, F., Tang, L., Lin, C.: Taming TCP incast throughput collapse in data center networks. In: 2013 21st IEEE International Conference on Network Protocols (ICNP), pp. 1–10 (2013). https://doi.org/10.1109/ICNP.2013.6733609

  50. Zheng, H., Chen, C., Qiao, C.: Understanding the impact of removing TCP binary exponential backoff in data centers. In: 2011 Third International Conference on Communications and Mobile Computing, pp. 174–177 (2011). https://doi.org/10.1109/CMC.2011.85

Author information

Corresponding author

Correspondence to Aadil Zia Khan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Khan, A.Z., Qazi, I.A. RecFlow: SDN-based receiver-driven flow scheduling in datacenters. Cluster Comput 23, 289–306 (2020). https://doi.org/10.1007/s10586-019-02922-4
