Improving Latency Tolerance of Network Processors Through Simultaneous Multithreading

Liang, Bo; An, Hong; Lu, Fang; Guo, Rui

doi:10.1007/11573937_9


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3756)

Included in the following conference series:

International Workshop on Advanced Parallel Processing Technologies


Abstract

Existing multithreaded network processor architectures with multiple processing engines (PEs) aim to exploit blocked multithreading, which executes instructions from different user-defined threads in the same PE pipeline in an explicit, interleaved way. Multiple PEs, each of which is a multithreaded processor core, process several packets in parallel to hide long memory access latency. Most of these designs are optimized for throughput, chiefly in the data plane. In future network workloads the boundary between the data plane and the control plane becomes blurred, so PEs must provide not only wire-speed packet forwarding in the data plane but also intelligent and increasingly complex packet-processing functions in the control plane. In this paper, we analyze the potential of simultaneous multithreading (SMT) to tolerate short latencies when used in out-of-order, dynamically scheduled PE cores. We show that 2- to 4-issue SMT provides excellent tolerance of short memory and branch latencies, achieving higher instruction throughput with much simpler structures.
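The abstract contrasts blocked multithreading, where one thread owns a PE pipeline until a long stall forces a context switch, with SMT, which can fill issue slots from several threads in the same cycle and therefore also covers short memory and branch latencies. The Python sketch below is a hypothetical toy cycle-counting model of that contrast, not the simulator used in the paper. All of its parameters are illustrative assumptions: each thread is a list of per-instruction stall latencies and issues in order; the blocked core is single-issue and switches threads only after a stall of at least `switch_threshold` cycles, paying `switch_penalty` cycles; the SMT core issues up to `width` instructions per cycle, at most one per ready thread. The out-of-order, dynamically scheduled cores studied in the paper are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    """One hardware thread context running a fixed instruction stream.

    latencies[i] is a hypothetical extra delay (in cycles) that instruction i
    imposes before the same thread can issue again: 0 for a simple ALU op,
    a small value for a short cache or branch latency, a large value for a
    long off-chip memory access.
    """
    latencies: list
    pc: int = 0        # index of the next instruction to issue
    ready_at: int = 0  # earliest cycle at which that instruction may issue

    def done(self) -> bool:
        return self.pc >= len(self.latencies)

    def issue(self, cycle: int) -> int:
        """Issue the next instruction at `cycle` and return its latency."""
        lat = self.latencies[self.pc]
        self.pc += 1
        self.ready_at = cycle + 1 + lat
        return lat


def run_blocked(streams, switch_threshold=10, switch_penalty=3):
    """Blocked multithreading: a single-issue pipeline owned by one thread,
    swapped out only after a stall of at least `switch_threshold` cycles.
    Shorter stalls leave the pipeline idle. Returns cycles until the last
    instruction has issued."""
    threads = [Thread(list(s)) for s in streams]
    cycle, cur = 0, 0
    while not all(t.done() for t in threads):
        t = threads[cur]
        if t.done():                    # skip threads that have finished
            cur = (cur + 1) % len(threads)
            continue
        cycle = max(cycle, t.ready_at)  # short stall: the pipeline just waits
        lat = t.issue(cycle)
        cycle += 1
        if lat >= switch_threshold:     # long stall: switch to the next thread
            cur = (cur + 1) % len(threads)
            cycle += switch_penalty
    return cycle


def run_smt(streams, width=2):
    """SMT: each cycle, issue up to `width` instructions, at most one per
    ready thread, scanning threads in round-robin order. Returns cycles
    until the last instruction has issued."""
    threads = [Thread(list(s)) for s in streams]
    n, cycle = len(threads), 0
    while not all(t.done() for t in threads):
        issued = 0
        for k in range(n):
            t = threads[(cycle + k) % n]
            if issued < width and not t.done() and t.ready_at <= cycle:
                t.issue(cycle)
                issued += 1
        cycle += 1
    return cycle


if __name__ == "__main__":
    # Four packet-processing threads, each stalling 4 cycles (a "short"
    # latency, below the switch threshold) on every other instruction.
    streams = [[0, 4, 0, 4, 0, 4] for _ in range(4)]
    print("blocked:", run_blocked(streams), "cycles")
    print("SMT (2-issue):", run_smt(streams, width=2), "cycles")
```

In this toy workload the short 4-cycle stalls never trigger a context switch, so the blocked core simply idles through them, while the 2-issue SMT core keeps issuing from other ready threads and finishes in fewer cycles. The absolute numbers depend entirely on the made-up latencies and parameters.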



Author information

Authors and Affiliations

Department of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China
Bo Liang, Hong An, Fang Lu & Rui Guo
Computer Architecture Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100086, China
Hong An


Editor information

Editors and Affiliations

Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong, China
Jiannong Cao
L3S Research Center, Leibniz Universität Hannover, Appelstrasse 9a, 30167, Hannover, Germany
Wolfgang Nejdl
Department of Network Engineering, School of Computer Science, National University of Defense Technology, 410073, Changsha, China
Ming Xu


About this paper

Cite this paper

Liang, B., An, H., Lu, F., Guo, R. (2005). Improving Latency Tolerance of Network Processors Through Simultaneous Multithreading. In: Cao, J., Nejdl, W., Xu, M. (eds) Advanced Parallel Processing Technologies. APPT 2005. Lecture Notes in Computer Science, vol 3756. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573937_9


DOI: https://doi.org/10.1007/11573937_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29639-3
Online ISBN: 978-3-540-32107-1
eBook Packages: Computer Science, Computer Science (R0)
