DOI: 10.1145/1323548.1323563
research-article

Automated task distribution in multicore network processors using statistical analysis

Published: 03 December 2007

Abstract

Chip multiprocessor designs are the most common type of architecture in network processors. As network processors are used to implement increasingly complicated applications, task distribution among the cores is becoming an important problem. In this paper, we propose a new task allocation scheme for such architectures. The scheme relies on the inherently modular nature of networking applications and intelligently distributes modules among the execution cores. Additionally, we selectively replicate modules to parallelize the execution of tasks with longer processing times. We have developed a technique that uses the probability distribution of the execution times of the different modules in a networking application. The proposed schemes result in resource utilization of up to 95%, 89%, and 84% on average for processors with 2, 4, and 8 cores, respectively. The schemes are highly scalable and can improve throughput by 6.72 times for 8-core processors, aggregated over four representative applications. The combination of selective replication of modules and variation-aware task allocation results in up to 12.5% (9.9% on average) performance improvement compared to a scheme based only on mean processing time.
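
The abstract does not give the allocation algorithm itself, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes each module is summarized by the mean and standard deviation of its execution time, that a "variation-aware" cost can be approximated as mean plus k standard deviations, and that modules are placed greedily on the least-loaded core. The names Module, cost, replicate, allocate, and utilization, and the threshold and k parameters, are hypothetical and are not taken from the paper.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    mean: float  # mean execution time per packet (arbitrary units)
    std: float   # standard deviation of that execution time


def cost(m, k=1.0):
    """Assumed variation-aware cost: mean plus k standard deviations."""
    return m.mean + k * m.std


def replicate(modules, threshold, k=1.0):
    """Split any module whose cost exceeds `threshold` into equal replicas.

    Simplification: each replica is modelled as carrying 1/n of the load.
    """
    out = []
    for m in modules:
        n = max(1, math.ceil(cost(m, k) / threshold))
        if n == 1:
            out.append(m)
            continue
        for i in range(n):
            out.append(Module(f"{m.name}#{i}", m.mean / n, m.std / n))
    return out


def allocate(modules, cores, k=1.0):
    """Greedy placement: costliest module first, onto the least-loaded core."""
    heap = [(0.0, c) for c in range(cores)]
    heapq.heapify(heap)
    assignment = {c: [] for c in range(cores)}
    for m in sorted(modules, key=lambda m: cost(m, k), reverse=True):
        load, c = heapq.heappop(heap)
        assignment[c].append(m.name)
        heapq.heappush(heap, (load + cost(m, k), c))
    loads = {c: load for load, c in heap}
    return assignment, loads


def utilization(loads):
    """Total load divided by (number of cores x load of the busiest core)."""
    busiest = max(loads.values())
    return sum(loads.values()) / (len(loads) * busiest) if busiest else 0.0
```

Setting k=0 reduces the cost to the mean processing time alone, which corresponds loosely to the mean-based baseline the abstract compares against. Calling replicate before allocate mimics selective replication of long-running modules. The numbers this sketch produces are illustrative only and are unrelated to the results reported in the paper.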


    Published In

    ANCS '07: Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems
    December 2007
    212 pages
    ISBN:9781595939456
    DOI:10.1145/1323548

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Conference

    ANCS07

    Acceptance Rates

    ANCS '07 Paper Acceptance Rate 20 of 70 submissions, 29%;
    Overall Acceptance Rate 88 of 314 submissions, 28%

    Cited By

    • (2014) Enabling Network Security in HPC Systems Using Heterogeneous CMPs. High-Performance Computing on Complex Environments, pp. 383-399. DOI: 10.1002/9781118711897.ch20. Online publication date: 18-Apr-2014
    • (2013) External monitoring of highly parallel network processors. 2013 IEEE 14th International Conference on High Performance Switching and Routing (HPSR), pp. 197-204. DOI: 10.1109/HPSR.2013.6602312. Online publication date: Jul-2013
    • (2012) Runtime Task Allocation in Multicore Packet Processing Systems. IEEE Transactions on Parallel and Distributed Systems 23:10, pp. 1934-1943. DOI: 10.1109/TPDS.2012.56. Online publication date: 1-Oct-2012
    • (2012) Optimizing energy-efficiency for program partitioning and mapping onto multi-core packet processing systems. The Journal of China Universities of Posts and Telecommunications 19, pp. 79-86. DOI: 10.1016/S1005-8885(11)60464-0. Online publication date: Jun-2012
    • (2010) LATA. Proceedings of the 47th Design Automation Conference, pp. 36-41. DOI: 10.1145/1837274.1837286. Online publication date: 13-Jun-2010
    • (2009) Performance analysis of high rate packet capture on multiprocessor platform. Frontiers of Electrical and Electronic Engineering in China 5:1, pp. 36-42. DOI: 10.1007/s11460-009-0070-6. Online publication date: 5-Nov-2009
    • (2009) Concurrent workload mapping for multicore security systems. Concurrency and Computation: Practice and Experience 21:10, pp. 1281-1306. DOI: 10.1002/cpe.1423. Online publication date: 16-Apr-2009
    • (2008) MultiLayer processing - an execution model for parallel stateful packet processing. Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, pp. 79-88. DOI: 10.1145/1477942.1477954. Online publication date: 6-Nov-2008
    • (2008) On runtime management in multi-core packet processing systems. Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, pp. 69-78. DOI: 10.1145/1477942.1477953. Online publication date: 6-Nov-2008
    • (2008) Dynamic workload profiling and task allocation in packet processing systems. 2008 International Conference on High Performance Switching and Routing, pp. 123-130. DOI: 10.1109/HSPR.2008.4734432. Online publication date: May-2008
