research-article

Distributed runtime load-balancing for software routers on homogeneous many-core processors

Authors:

Dilip Joy Mampilly,

Tilman Wolf

PRESTO '10: Proceedings of the Workshop on Programmable Routers for Extensible Services of Tomorrow

Article No.: 1, Pages 1 - 6

https://doi.org/10.1145/1921151.1921153

Published: 30 November 2010

Abstract

With the advent of diversified network services and programmability deployed in the network infrastructure, the functionality of the data path in network systems has moved from "store-and-forward" toward "store-process-forward." However, because of software bottlenecks, the processing performance of many contemporary software routers does not scale with the increasing number of processor cores integrated on a chip. To tackle one aspect of this problem, we propose a distributed algorithm that can load-balance packet processing workloads on a modern many-core architecture. The algorithm exploits parallelism and achieves load balancing by distributing processing tasks across different local regions of the chip. Workload distribution at chip level can be achieved with O(n log n) time complexity and can thus scale to large configurations.
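The abstract does not spell out the algorithm itself. As a rough illustration of the general idea of dividing packet-processing tasks among progressively smaller regions of a chip, the following is a hypothetical, centralized Python sketch of recursive bisection with greedy largest-task-first placement. It is not the distributed algorithm proposed in the paper: the function names `balance` and `_bisect_region`, the use of a single scalar cost per task, and the treatment of cores as a flat list (rather than a 2D mesh) are all assumptions made for illustration.

```python
from typing import Dict, List


def _bisect_region(tasks_desc: List[float], cores: List[int]) -> Dict[int, List[float]]:
    """Hypothetical helper: split one region of cores in two and divide its tasks."""
    if len(cores) == 1:
        # Base case: a single core owns every task that reached this region.
        return {cores[0]: tasks_desc}
    mid = len(cores) // 2
    left_cores, right_cores = cores[:mid], cores[mid:]
    left: List[float] = []
    right: List[float] = []
    left_load = right_load = 0.0
    # Largest task first: place each task in the half whose load per core
    # is currently lower (ties go to the left half).
    for cost in tasks_desc:
        if left_load / len(left_cores) <= right_load / len(right_cores):
            left.append(cost)
            left_load += cost
        else:
            right.append(cost)
            right_load += cost
    # Both sublists keep the descending order, so no re-sort is needed.
    assignment = _bisect_region(left, left_cores)
    assignment.update(_bisect_region(right, right_cores))
    return assignment


def balance(task_costs: List[float], cores: List[int]) -> Dict[int, List[float]]:
    """Hypothetical entry point: map each core id to its assigned task costs."""
    if not cores:
        raise ValueError("at least one core is required")
    # One sort up front; each bisection level is then a single linear pass.
    return _bisect_region(sorted(task_costs, reverse=True), list(cores))


if __name__ == "__main__":
    print(balance([5, 4, 3, 2, 1, 1], [0, 1]))
```

For m tasks and n cores, this sketch sorts once (O(m log m)) and then makes one linear pass over the tasks at each of roughly log2 n bisection levels. Running the file prints `{0: [5, 2, 1], 1: [4, 3, 1]}`, so both cores carry a load of 8.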


Cited By

R. Bolla, R. Bruschi, C. Lombardo, and F. Podda, "OpenFlow in the Small: A Flexible and Efficient Network Acceleration Framework for Multi-Core Systems," IEEE Transactions on Network and Service Management, vol. 11, no. 3, pp. 390-404, online Sep. 2014.
https://doi.org/10.1109/TNSM.2014.2346078
R. Bolla, C. Lombardo, R. Bruschi, and F. Podda, "OpenFlow in the small," in 2013 IEEE International Conference on Communications (ICC), pp. 3509-3513, online Jun. 2013.
https://doi.org/10.1109/ICC.2013.6655094
M. Al-Fares, R. Kapoor, G. Porter, S. Das, H. Weatherspoon, B. Prabhakar, and A. Vahdat, "NetBump," in Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems (T. Wolf, A. Moore, and V. Prasanna, eds.), pp. 61-72, online 29 Oct. 2012.
https://dl.acm.org/doi/10.1145/2396556.2396567

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
2. Networks
  1. Network components
    1. Intermediate nodes
      1. Routers


Published In


PRESTO '10: Proceedings of the Workshop on Programmable Routers for Extensible Services of Tomorrow

November 2010

67 pages

ISBN:9781450304672

DOI:10.1145/1921151

Program Chairs:
T. S. Eugene Ng, Rice University
Sylvia Ratnasamy, Intel Research
Jonathan M. Smith, University of Pennsylvania

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGCOMM: ACM Special Interest Group on Data Communication

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

Division of Computer and Network Systems

Conference

Co-NEXT '10: Conference on emerging Networking EXperiments and Technologies

November 30, 2010

Philadelphia, Pennsylvania

Sponsor: SIGCOMM


Bibliometrics & Citations

Total Citations: 3

Total Downloads: 225

Downloads (last 12 months): 1

Downloads (last 6 weeks): 0

Reflects downloads up to 27 Feb 2025.
