research-article

Analytic modeling of network processors for parallel workload mapping

Authors:

Tilman Wolf

ACM Transactions on Embedded Computing Systems (TECS), Volume 8, Issue 3

Article No. 18, Pages 1--29

https://doi.org/10.1145/1509288.1509290

Published: 22 April 2009

Abstract

Network processors are heterogeneous system-on-chip multiprocessors optimized to perform packet forwarding and processing tasks at gigabit data rates. To meet the performance demands of increasing link speeds and complex network applications, network processors are implemented with several dozen embedded processor cores and hardware accelerators that run multiple packet processing applications in parallel. This parallelism makes it increasingly difficult for application developers to understand and manage resources and to map processing tasks onto the hardware. To address this problem, we present a methodology for profiling and analyzing network processor applications, mapping processing tasks to a generalized network processor architecture, and analytically determining the expected throughput. The key novelty of this work is not only the adaptation of application analysis and mapping algorithms to heterogeneous network processors, but also that the entire process can be automated and hidden from the application developer. Starting from the analysis of a uniprocessor implementation of the application, the process yields a mapping of the partitioned application that achieves the best performance on a given network processor system. Because the proposed randomized mapping algorithm is simple, the methodology can be used in network processor runtime systems, where tasks must be reallocated dynamically but processing power is limited. We present results that show the effectiveness of the analysis and mapping methodology as well as its application to design space exploration.
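The abstract pairs two ideas: an analytic estimate of throughput for a given task-to-processor mapping, and a simple randomized search over mappings. The sketch below illustrates that combination under deliberately simple assumptions that are ours, not the paper's: every packet traverses every task, each task has a fixed per-packet instruction count, processing elements (PEs) differ only in speed, and throughput is limited by the most heavily loaded PE (memory and interconnect contention are ignored). The task names, costs, PE speeds, and the functions `throughput`, `random_mapping`, and `randomized_search` are all hypothetical; the paper's own model and algorithm may differ substantially.

```python
import random
from typing import Dict, Iterable, Optional, Sequence, Tuple

# Hypothetical per-packet cost of each task, in instructions.
# Illustrative numbers only; they are not measurements from the paper.
TASK_COST: Dict[str, int] = {
    "parse": 120,
    "lookup": 400,
    "classify": 250,
    "encrypt": 1800,
    "forward": 90,
}

# Hypothetical heterogeneous PEs, in instructions per second.
PE_SPEED: Sequence[float] = (200e6, 200e6, 400e6, 400e6)


def throughput(mapping: Dict[str, int], cost: Dict[str, int],
               speed: Sequence[float]) -> float:
    """Packets per second under a simplified pipeline model (an assumption).

    Every packet visits every task, PEs work in parallel on different
    packets, and the most heavily loaded PE limits throughput.
    Memory and interconnect contention are ignored.
    """
    load = [0.0] * len(speed)  # seconds of work per packet on each PE
    for task, pe in mapping.items():
        load[pe] += cost[task] / speed[pe]
    return 1.0 / max(load)


def random_mapping(tasks: Iterable[str], num_pes: int,
                   rng: random.Random) -> Dict[str, int]:
    """Assign each task to a PE chosen uniformly at random."""
    return {task: rng.randrange(num_pes) for task in tasks}


def randomized_search(cost: Dict[str, int], speed: Sequence[float],
                      trials: int = 1000,
                      seed: int = 0) -> Tuple[Optional[Dict[str, int]], float]:
    """Score `trials` random mappings analytically and keep the best one."""
    rng = random.Random(seed)
    best_map: Optional[Dict[str, int]] = None
    best_tput = 0.0
    for _ in range(trials):
        candidate = random_mapping(cost, len(speed), rng)
        tput = throughput(candidate, cost, speed)
        if tput > best_tput:
            best_map, best_tput = candidate, tput
    return best_map, best_tput


if __name__ == "__main__":
    mapping, tput = randomized_search(TASK_COST, PE_SPEED, trials=2000)
    print(mapping)
    print(f"estimated throughput: {tput / 1e3:.1f} kpps")
```

Scoring each candidate with a closed-form expression rather than a simulation keeps every trial cheap, which fits the abstract's point that a simple randomized algorithm can run where processing power is limited. Accounting for inter-processor communication or shared-memory contention would mean adding terms to `throughput`; how the paper models those effects is not reproduced here.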

References

[1]

Agarwal, A. 1992. Performance tradeoffs in multithreaded processors. IEEE Trans. Parall. Distrib. Syst. 3, 5, 525--539.

[2]

Austin, T. M. and Sohi, G. S. 1993. Tetra: evaluation of serial program performance on fine-grain parallel processors. Tech. rep. 1163, Computer Science Department, University of Wisconsin, Madison.

[3]

Baker, F. 1995. Requirements for IP version 4 routers. RFC 1812, Network Working Group.

[4]

Bhandarkar, D. P. 1975. Analysis of memory interference in multiprocessors. IEEE Trans. Comput. C-24, 9, 897--908.

[5]

Daemen, J. and Rijmen, V. 2000. The block cipher Rijndael. Lecture Notes in Computer Science. Vol. 1820. Springer-Verlag, Berlin, Germany, 288--296.

[6]

Dowdy, L. W., Rosti, E., Serazzi, G., and Smirni, E. 1999. Scheduling issues in high-performance computing. SIGMETRICS Perform. Eval. Rev. 26, 4, 60--69.

[7]

Foster, I. and Kesselman, C., Eds. 2004. The Grid -- Blueprint for a New Computing Infrastructure, 2nd Ed. Morgan Kaufmann, San Francisco, CA.

[8]

Franklin, M. A. and Wolf, T. 2002. A network processor performance and design model with benchmark parameterization. In Proceedings of the 1st Network Processor Workshop (NP-1) in Conjunction with the 8th International Symposium on High-Performance Computer Architecture (HPCA-8). ACM, New York, 63--74.

[9]

Franklin, M. A. and Wolf, T. 2003. Power considerations in network processor design. In Proceedings of the 2nd Network Processor Workshop (NP-2) in Conjunction with 9th International Symposium on High-Performance Computer Architecture (HPCA-9). ACM, New York, 10--22.

[10]

Goglin, S. D., Hooper, D., Kumar, A., and Yavatkar, R. 2003. Advanced software framework, tools, and languages for the IXP family. Intel Tech. J. 7, 4, 64--76.

[11]

Grasso, P. A. et al. 1984. Memory interference in multimicroprocessor systems with a time-shared bus. Proc. IEEE 131, 10.

[12]

Gries, M., Kulkarni, C., Sauer, C., and Keutzer, K. 2003. Exploring trade-offs in performance and programmability of processing element topologies for network processors. In Proceedings of the 2nd Network Processor Workshop (NP-2) in Conjunction with 9th International Symposium on High-Performance Computer Architecture (HPCA-9). ACM, New York, 75--87.

[13]

Hoogendoorn, C. H. 1977. A general model for memory interference in multiprocessors. IEEE Trans. Comput. C-26, 10, 998--1005.

[14]

Intel Corporation. 2003. Intel IXA Software Developers Kit 2.01.

[15]

Kapasi, U. J., Rixner, S., Dally, W. J., Khailany, B., Ahn, J. H., Mattson, P., and Owens, J. D. 2003. Programmable stream processors. IEEE Comput. 36, 8, 54--62.

[16]

Karp, R. M. 1991. An introduction to randomized algorithms. Discrete Appl. Math. 34, 1-3, 165--201.

[17]

Kohler, E., Morris, R., Chen, B., Jannotti, J., and Kaashoek, M. F. 2000. The Click modular router. ACM Trans. Comput. Syst. 18, 3, 263--297.

[18]

Kokku, R., Riché, T., Kunze, A., Mudigonda, J., Jason, J., and Vin, H. 2003. A case for run-time adaptation in packet processing systems. In Proceedings of the 2nd Workshop on Hot Topics in Networks (HOTNETSII). Cambridge, MA.

[19]

Kwok, Y.-K. and Ahmad, I. 1999. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput. Surv. 31, 4, 406--471.

[20]

Lakamraju, V., Koren, I., and Krishna, C. M. 2002. Filtering random networks to synthesize interconnection networks with multiple objectives. IEEE Trans. Parall. Distrib. Syst. 13, 11, 1139--1149.

[21]

Malloy, B. A., Lloyd, E. L., and Soffa, M. L. 1994. Scheduling DAG's for asynchronous multiprocessor execution. IEEE Trans. Parall. Distrib. Syst. 5, 5, 498--508.

[22]

Motwani, R. and Raghavan, P. 1995. Randomized Algorithms. Cambridge University Press, Cambridge, UK.

[23]

Nilsson, S. and Karlsson, G. 1999. IP-address lookup using LC-tries. IEEE J. Sel. Areas Comm. 17, 6, 1083--1092.

[24]

Ramaswamy, R., Weng, N., and Wolf, T. 2004. Application analysis and resource mapping for heterogeneous network processor architectures. In Proceedings of the 3rd Workshop on Network Processors and Applications (NP-3) in Conjunction with the 10th International Symposium on High Performance Computer Architecture (HPCA-10). ACM, New York, 103--119.

[25]

Ramaswamy, R., Weng, N., and Wolf, T. 2005. Analysis of network processing workloads. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE, Los Alamitos, CA, 226--235.

[26]

Ramaswamy, R. and Wolf, T. 2003. PacketBench: A tool for workload characterization of network processing. In Proceedings of the IEEE 6th Annual Workshop on Workload Characterization (WWC-6). IEEE, Los Alamitos, CA, 42--50.

[27]

Reijns, G. L. and van Gemund, A. J. C. 1999. Analysis of a shared-memory multiprocessor via a novel queuing model. J. Syst. Architect. 45, 14, 1189--1193.

[28]

Shah, N., Plishker, W., and Keutzer, K. 2003. NP-Click: A programming model for the Intel IXP1200. In Proceedings of the 2nd Network Processor Workshop (NP-2) in Conjunction with 9th International Symposium on High-Performance Computer Architecture (HPCA-9). ACM, New York, 100--111.

[29]

Taylor, M. B., Kim, J., Miller, J., Wentzlaff, D., Ghodrat, F., Greenwald, B., Hoffman, H., Lee, J.-W., Johnson, P., et al. 2002. The Raw microprocessor: A computational fabric for software circuits and general purpose programs. IEEE Micro 22, 2, 25--35.

[30]

Teja Technologies. 2003. TejaNP datasheet. Teja Technologies. http://www.teja.com.

[31]

Thiele, L., Chakraborty, S., Gries, M., and Künzli, S. 2002. Design space exploration of network processor architectures. In Proceedings of the 1st Network Processor Workshop (NP-1) in Conjunction with the 8th International Symposium on High-Performance Computer Architecture (HPCA-8). ACM, New York, 30--41.

[32]

van Gemund, A. J. C. 1993. Performance prediction of parallel processing systems: The Pamela methodology. In Proceedings of the 7th ACM International Conference on Supercomputing. ACM, New York, 318--327.

[33]

Wei, Y.-C. and Cheng, C.-K. 1991. Ratio cut partitioning for hierarchical designs. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 10, 7, 911--921.

[34]

Wolf, T. and Franklin, M. A. 2000. CommBench -- a telecommunications benchmark for network processors. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE, Los Alamitos, CA, 154--162.

[35]

Wolf, T., Weng, N., and Tai, C.-H. 2005. Design considerations for network processor operating systems. In Proceedings of the ACM/IEEE Symposium on Architectures for Networking and Communication Systems (ANCS). ACM, New York, 71--80.

Information

Published In

ACM Transactions on Embedded Computing Systems Volume 8, Issue 3

April 2009

239 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/1509288

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 22 April 2009

Accepted: 01 July 2006

Revised: 01 May 2006

Received: 01 August 2005

Published in TECS Volume 8, Issue 3

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Total citations: 13

Total downloads: 537

Downloads (last 12 months): 2

Downloads (last 6 weeks): 0

Reflects downloads up to 27 Feb 2025

Cited By

Ying, W. 2014. Research on Packet-Processing Architecture Based on Multi-core Processor. In Proceedings of the 2014 Sixth International Conference on Measuring Technology and Mechatronics Automation, 520--523. Online publication date: 10 Jan 2014. https://dl.acm.org/doi/10.1109/ICMTMA.2014.126

Castrillon, J., Leupers, R., and Ascheid, G. 2013. MAPS: Mapping Concurrent Dataflow Applications to Heterogeneous MPSoCs. IEEE Transactions on Industrial Informatics 9, 1, 527--545. Online publication date: Feb 2013. https://doi.org/10.1109/TII.2011.2173941

Chen, X., Chasaki, D., and Wolf, T. 2013. External Monitoring of Highly Parallel Network Processors. In 2013 IEEE 14th International Conference on High Performance Switching and Routing (HPSR), 197--204. Online publication date: Jul 2013. https://doi.org/10.1109/HPSR.2013.6602312

Ferkouss, O., Ben Ali, R., Lemieux, Y., and Omar, C. 2012. Performance Model for Mapping Processing Tasks to OpenFlow Switch Resources. In 2012 IEEE International Conference on Communications (ICC), 1476--1481. Online publication date: Jun 2012. https://doi.org/10.1109/ICC.2012.6363651

Vianna, E., Comarela, G., Pontes, T., Almeida, J., Almeida, V., Wilkinson, K., Kuno, H., and Dayal, U. 2012. Analytical Performance Models for MapReduce Workloads. International Journal of Parallel Programming 41, 4, 495--525. Online publication date: 27 Nov 2012. https://doi.org/10.1007/s10766-012-0227-4

Castrillon, J., Sheng, W., and Leupers, R. 2011. Trends in Embedded Software Synthesis. In 2011 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, 347--354. Online publication date: Jul 2011. https://doi.org/10.1109/SAMOS.2011.6045483

Mohay, G., Ahmed, E., Bhatia, S., Nadarajan, A., Ravindran, B., Tickle, A., and Vijayasarathy, R. 2011. Detection and Mitigation of High-Rate Flooding Attacks. In An Investigation into the Detection and Mitigation of Denial of Service (DoS) Attacks, 131--181. Online publication date: 6 Sep 2011. https://doi.org/10.1007/978-81-322-0277-6_5

Wu, Q., Chasaki, D., and Wolf, T. 2010. Implementation of a Simplified Network Processor. In 2010 International Conference on High Performance Switching and Routing, 7--13. Online publication date: Jun 2010. https://doi.org/10.1109/HPSR.2010.5580273

Wu, Q. and Wolf, T. 2009. Runtime Resource Allocation in Multi-core Packet Processing Systems. In Proceedings of the 15th International Conference on High Performance Switching and Routing, 62--69. Online publication date: 22 Jun 2009. https://dl.acm.org/doi/10.5555/1715730.1715740

Wu, Q. and Wolf, T. 2009. Runtime Resource Allocation in Multi-core Packet Processing Systems. In 2009 International Conference on High Performance Switching and Routing, 1--8. Online publication date: Jun 2009. https://doi.org/10.1109/HPSR.2009.5307422