Abstract
On-chip caches in modern processors account for a sizable fraction of dynamic and leakage power. Much of this power is wasted: it is spent only because the memory cells farthest from the sense amplifiers must discharge a large capacitance on the bitlines. We reduce this capacitance by segmenting the memory cells along the bitlines and turning off the segmenters, which lowers the overall bitline capacitance.
The success of this cache relies on accessing segments near the sense amplifiers much more often than remote segments. We show that the access pattern to the first-level data and instruction caches is extremely skewed: only a small set of cache lines is accessed frequently. We exploit this non-uniform access pattern by mapping the frequently accessed cache lines closer to the sense amplifiers. These lines are isolated by segmenting circuits on the bitlines and hence dissipate less power when accessed.
Modifications to the address decoder enable dynamic re-mapping of cache lines to segments. In this paper, we explore the design space of segmenting the level-one data and instruction caches. Instruction and data caches show potential power savings of 10% and 6%, respectively, on the subset of benchmarks simulated.
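The intuition behind the scheme can be illustrated with a toy model. The sketch below is not the authors' simulator or circuit model: it assumes that an access to segment `s` (with 0 nearest the sense amplifier) discharges the capacitance of segments 0 through `s` plus a small overhead per closed segmenter, and it remaps lines offline by access count, whereas the paper performs the re-mapping dynamically in the address decoder. All names (`access_cost`, `remap_by_frequency`, `seg_cap`, `switch_cap`) and all numeric values are hypothetical.

```python
from collections import Counter


def access_cost(segment, seg_cap=1.0, switch_cap=0.1):
    """Relative bitline capacitance discharged by one access.

    Assumption: segment 0 is nearest the sense amplifier, and only
    segments 0..segment are connected to the bitline during the access.
    seg_cap and switch_cap are illustrative units, not measured values.
    """
    return (segment + 1) * seg_cap + segment * switch_cap


def static_map(num_lines, lines_per_segment):
    """Fixed placement: line index alone decides the segment."""
    return {line: line // lines_per_segment for line in range(num_lines)}


def remap_by_frequency(trace, num_lines, lines_per_segment):
    """Place the most frequently accessed lines in the nearest segments."""
    counts = Counter(trace)
    # Hottest lines first; ties are broken by line index for determinism.
    order = sorted(range(num_lines), key=lambda line: (-counts[line], line))
    return {line: rank // lines_per_segment for rank, line in enumerate(order)}


def total_cost(trace, line_to_segment):
    """Sum of relative capacitance discharged over a whole access trace."""
    return sum(access_cost(line_to_segment[line]) for line in trace)


# A skewed trace over 8 lines (4 segments of 2 lines each):
# two lines far from the sense amplifier receive almost all accesses.
trace = [7] * 6 + [6] * 3 + [0]
fixed = total_cost(trace, static_map(8, 2))
remapped = total_cost(trace, remap_by_frequency(trace, 8, 2))
print(f"fixed placement: {fixed:.1f}, frequency-based placement: {remapped:.1f}")
```

In this toy trace the hot lines start in the farthest segment, so moving them next to the sense amplifier cuts the modeled cost substantially. The size of the benefit depends entirely on how skewed the trace is and on the assumed segmenter overhead, so the figures here say nothing about the savings reported in the paper.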
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rao, R., Wenck, J., Franklin, D., Amirtharajah, R., Akella, V. (2006). Segmented Bitline Cache: Exploiting Non-uniform Memory Access Patterns. In: Robert, Y., Parashar, M., Badrinath, R., Prasanna, V.K. (eds) High Performance Computing - HiPC 2006. HiPC 2006. Lecture Notes in Computer Science, vol 4297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11945918_17
DOI: https://doi.org/10.1007/11945918_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68039-0
Online ISBN: 978-3-540-68040-6
eBook Packages: Computer Science, Computer Science (R0)