Abstract
Instruction Reuse (IR) is a microarchitectural technique that improves the execution time of a program by removing redundant computations at run-time. Although removing redundancy is ordinarily the job of an optimizing compiler, compilers often fail to do so because they have limited knowledge of run-time data. In this article we concentrate on integer ALU and load instructions in packet processing applications and examine how IR can be used to obtain better performance. In addition, we attempt to answer the following questions: (1) Can IR be improved by reducing interference in the reuse buffer (RB)? (2) What characteristics of network applications can be exploited to improve IR? (3) What is the effect of IR on resource contention and memory accesses? We propose an aggregation scheme that combines a high-level concept of network traffic, i.e. “flows”, with a low-level microarchitectural feature of programs, i.e. repetition of instructions and data, and we propose an architecture that exploits temporal locality in incoming packet data to improve IR by reducing interference in the RB. We find that the benefits that can be achieved by exploiting IR vary widely depending on the nature of the application and the input data. For the benchmarks considered, we find that IR varies between 1% and 50% while the speedup achieved varies between 1% and 24%.
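The idea of reducing interference in the reuse buffer by separating flows can be illustrated with a small software model. The sketch below is not the architecture proposed in the chapter; it is a minimal illustration under stated assumptions: a direct-mapped buffer indexed by instruction address, reuse decided by comparing operand values, and a flow class chosen by taking a hypothetical flow key (here simply the destination address) modulo the number of classes. All class and function names are invented for illustration, and only an ALU-style instruction is modelled; reusing loads would additionally require invalidating entries when memory is written.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Entry:
    pc: int                    # address of the instruction that filled the slot
    operands: Tuple[int, ...]  # operand values seen when the slot was filled
    result: int                # result computed for those operand values


class ReuseBuffer:
    """Direct-mapped reuse buffer indexed by instruction address (assumed layout)."""

    def __init__(self, num_entries: int) -> None:
        self.num_entries = num_entries
        self.slots: Dict[int, Entry] = {}
        self.hits = 0

    def execute(self, pc: int, operands: Tuple[int, ...],
                compute: Callable[..., int]) -> int:
        index = pc % self.num_entries
        entry = self.slots.get(index)
        # Reuse only when the same instruction sees the same operand values.
        if entry is not None and entry.pc == pc and entry.operands == operands:
            self.hits += 1
            return entry.result
        # Otherwise execute normally and overwrite the slot.
        result = compute(*operands)
        self.slots[index] = Entry(pc, operands, result)
        return result


class FlowAggregatedRB:
    """One reuse buffer per flow class (hypothetical aggregation policy)."""

    def __init__(self, num_classes: int, entries_per_class: int) -> None:
        self.buffers: List[ReuseBuffer] = [
            ReuseBuffer(entries_per_class) for _ in range(num_classes)
        ]

    def select(self, flow_key: int) -> ReuseBuffer:
        # Assumption: the flow key maps to a class by a simple modulo.
        return self.buffers[flow_key % len(self.buffers)]

    def total_hits(self) -> int:
        return sum(rb.hits for rb in self.buffers)


def mask_address(dst: int, mask: int) -> int:
    """Stand-in for an integer ALU instruction in a packet handler."""
    return dst & mask


def replay(packets: List[int],
           pick_buffer: Callable[[int], ReuseBuffer]) -> None:
    # Every packet executes the same instruction (pc 0x40) on its own header value.
    for dst in packets:
        pick_buffer(dst).execute(0x40, (dst, 0xFF), mask_address)


# Two interleaved flows, identified here by destination address.
packets = [10, 11, 10, 11, 10, 11]

shared = ReuseBuffer(num_entries=4)
replay(packets, lambda dst: shared)

split = FlowAggregatedRB(num_classes=2, entries_per_class=2)
replay(packets, split.select)

print(shared.hits, split.total_hits())  # prints: 0 4
```

Both configurations hold four entries in total. In the shared buffer the two flows keep overwriting the single slot used by the instruction, so nothing is reused; with one buffer per flow class, each flow finds its own earlier result after its first packet.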
Copyright information
© 2003 Kluwer Academic Publishers
About this chapter
Cite this chapter
Surendra, G., Banerjee, S., Nandy, S.K. (2003). Enhancing Speedup in Network Processing Applications by Exploiting Instruction Reuse with Flow Aggregation. In: Jerraya, A.A., Yoo, S., Verkest, D., Wehn, N. (eds) Embedded Software for SoC. Springer, Boston, MA. https://doi.org/10.1007/0-306-48709-8_27
DOI: https://doi.org/10.1007/0-306-48709-8_27
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4020-7528-5
Online ISBN: 978-0-306-48709-5
eBook Packages: Springer Book Archive