
Labeled Network Stack: A High-Concurrency and Low-Tail Latency Cloud Server Framework for Massive IoT Devices

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

Internet of Things (IoT) applications connect massive numbers of clients to cloud servers, and the number of networked IoT devices continues to grow rapidly. IoT services therefore require both low tail latency and high concurrency in datacenters. This study asks whether an order-of-magnitude improvement over mainstream systems is possible in tail latency and concurrency, and proposes a hardware–software codesigned labeled network stack (LNS) for future datacenters. The key innovation is a cross-layer payload labeling mechanism that distinguishes requests by payload across the full network stack, including the application, TCP/IP, and Ethernet layers. This design enables prioritized packet processing and forwarding along the full datapath, so that latency-insensitive requests cannot significantly interfere with high-priority requests. We built a prototype datacenter server to evaluate the LNS design against a commercial x86 server and the mTCP research stack, using a cloud-supported IoT application scenario. Experimental results show that the LNS design provides an order-of-magnitude improvement in tail latency and concurrency: a single datacenter server node can support over 2 million concurrent long-lived connections from IoT devices while maintaining a 99th-percentile tail latency of 50 ms. In addition, the hardware–software codesign approach markedly reduces the labeling and prioritization overhead and limits the interference between high-priority and low-priority requests.
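The core idea of the abstract — requests carry a priority label end-to-end, and the server always drains higher-priority labels first so latency-insensitive traffic cannot delay latency-critical requests — can be illustrated with a minimal sketch. This is not the paper's implementation (the LNS labels packets in hardware across the TCP/IP and Ethernet layers); the queue, the labels, and the request names below are all hypothetical, chosen only to show the scheduling discipline.

```python
import heapq

HIGH, LOW = 0, 1  # smaller value = higher priority label

class LabeledQueue:
    """Priority queue keyed by a per-request label, FIFO within a label."""
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker preserving arrival order within a label

    def push(self, label, request):
        heapq.heappush(self._heap, (label, self._seq, request))
        self._seq += 1

    def pop(self):
        label, _, request = heapq.heappop(self._heap)
        return request

q = LabeledQueue()
q.push(LOW, "telemetry-upload")   # latency-insensitive
q.push(HIGH, "alarm-ack")         # latency-critical
q.push(LOW, "firmware-poll")
q.push(HIGH, "heartbeat")

order = [q.pop() for _ in range(4)]
print(order)
# → ['alarm-ack', 'heartbeat', 'telemetry-upload', 'firmware-poll']
```

All high-priority requests drain before any low-priority one, regardless of arrival order; within a label, arrival order is preserved. The LNS applies this discipline not just at one queue but along the full datapath, which is what bounds the tail latency of high-priority requests.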



Author information

Corresponding author: Wen-Li Zhang.

Electronic supplementary material

ESM 1 (PDF 761 kb)


About this article


Cite this article

Zhang, WL., Liu, K., Shen, YF. et al. Labeled Network Stack: A High-Concurrency and Low-Tail Latency Cloud Server Framework for Massive IoT Devices. J. Comput. Sci. Technol. 35, 179–193 (2020). https://doi.org/10.1007/s11390-020-9651-x
