Analytic modeling of network processors for parallel workload mapping

Published: 22 April 2009

Abstract

Network processors are heterogeneous system-on-chip multiprocessors that are optimized to perform packet forwarding and processing tasks at gigabit data rates. To meet the performance demands of increasing link speeds and complex network applications, network processors are implemented with several dozen embedded processor cores and hardware accelerators that run multiple packet processing applications in parallel. The parallel nature of the processing system makes it increasingly difficult for application developers to understand and manage resources and map processing tasks to the hardware. To address this problem, we present a methodology for profiling and analyzing network processor applications, mapping processing tasks to a generalized network processor architecture, and analytically determining the expected throughput performance. The key novelty of this work is not only the adaptation of application analysis and mapping algorithms to heterogeneous network processors, but also that the entire process can be automated and hidden from the application developer. Starting with the analysis of a uniprocessor implementation of the application, the process yields a mapping of the partitioned application that shows the best performance for a given network processor system. The simplicity of the proposed randomized mapping algorithm allows the use of this methodology in network processor runtime systems where dynamic reallocation of tasks is necessary but processing power is limited. We present results that show the effectiveness of the analysis and mapping methodology as well as its application to design space exploration.
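The general idea of a randomized mapping search can be illustrated with a minimal sketch. The abstract does not specify the paper's actual algorithm or performance model, so everything below is an assumption made for illustration: the task names, the per-task instruction costs, the function names `estimate_throughput` and `random_mapping_search`, and the simple bottleneck model in which throughput is limited by the most heavily loaded processing engine.

```python
import random


def estimate_throughput(mapping, task_cost, num_pes, clock_hz):
    """Estimate packets per second for one task-to-processor mapping.

    Assumed model (not taken from the paper): every packet visits every
    task, so the processing engine with the largest summed instruction
    cost is the bottleneck that bounds throughput.
    """
    load = [0] * num_pes
    for task, pe in mapping.items():
        load[pe] += task_cost[task]
    return clock_hz / max(load)


def random_mapping_search(task_cost, num_pes, clock_hz, trials=1000, seed=0):
    """Draw random mappings and keep the one with the best estimate."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    tasks = sorted(task_cost)
    best_mapping, best_tput = None, 0.0
    for _ in range(trials):
        # Assign each task to a uniformly chosen processing engine.
        mapping = {t: rng.randrange(num_pes) for t in tasks}
        tput = estimate_throughput(mapping, task_cost, num_pes, clock_hz)
        if tput > best_tput:
            best_mapping, best_tput = mapping, tput
    return best_mapping, best_tput


if __name__ == "__main__":
    # Hypothetical per-packet instruction counts from a uniprocessor profile.
    costs = {"parse": 200, "lookup": 400, "classify": 300, "forward": 100}
    mapping, tput = random_mapping_search(costs, num_pes=2, clock_hz=1_000_000_000)
    print(mapping, tput)
```

Each trial costs only one pass over the tasks, which is the kind of low overhead that would make such a search plausible in a runtime system with limited processing power. A realistic model would also need to account for inter-task communication, memory contention, and hardware accelerators, none of which this sketch covers.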

Published In

ACM Transactions on Embedded Computing Systems, Volume 8, Issue 3
April 2009
239 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/1509288
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 April 2009
Accepted: 01 July 2006
Revised: 01 May 2006
Received: 01 August 2005
Published in TECS Volume 8, Issue 3

Author Tags

  1. Application profiling
  2. embedded systems
  3. multiprocessor scheduling
  4. network processors

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2014) Research on Packet-Processing Architecture Based on Multi-core Processor. Proceedings of the 2014 Sixth International Conference on Measuring Technology and Mechatronics Automation, 520-523. DOI: 10.1109/ICMTMA.2014.126. Online publication date: 10-Jan-2014
  • (2013) MAPS: Mapping Concurrent Dataflow Applications to Heterogeneous MPSoCs. IEEE Transactions on Industrial Informatics 9:1, 527-545. DOI: 10.1109/TII.2011.2173941. Online publication date: Feb-2013
  • (2013) External monitoring of highly parallel network processors. 2013 IEEE 14th International Conference on High Performance Switching and Routing (HPSR), 197-204. DOI: 10.1109/HPSR.2013.6602312. Online publication date: Jul-2013
  • (2012) Performance model for mapping processing tasks to OpenFlow switch resources. 2012 IEEE International Conference on Communications (ICC), 1476-1481. DOI: 10.1109/ICC.2012.6363651. Online publication date: Jun-2012
  • (2012) Analytical Performance Models for MapReduce Workloads. International Journal of Parallel Programming 41:4, 495-525. DOI: 10.1007/s10766-012-0227-4. Online publication date: 27-Nov-2012
  • (2011) Trends in embedded software synthesis. 2011 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, 347-354. DOI: 10.1109/SAMOS.2011.6045483. Online publication date: Jul-2011
  • (2011) Detection and Mitigation of High-Rate Flooding Attacks. An Investigation into the Detection and Mitigation of Denial of Service (DoS) Attacks, 131-181. DOI: 10.1007/978-81-322-0277-6_5. Online publication date: 6-Sep-2011
  • (2010) Implementation of a simplified network processor. 2010 International Conference on High Performance Switching and Routing, 7-13. DOI: 10.1109/HPSR.2010.5580273. Online publication date: Jun-2010
  • (2009) Runtime resource allocation in multi-core packet processing systems. Proceedings of the 15th international conference on High Performance Switching and Routing, 62-69. DOI: 10.5555/1715730.1715740. Online publication date: 22-Jun-2009
  • (2009) Runtime resource allocation in multi-core packet processing systems. 2009 International Conference on High Performance Switching and Routing, 1-8. DOI: 10.1109/HPSR.2009.5307422. Online publication date: Jun-2009
