Abstract
Memory bandwidth is becoming increasingly important in the coming era of chips with 10 billion transistors. This paper discusses and implements a bandwidth-efficient cache store miss policy. Even under a write-allocate policy, we find that the full cache block need not be loaded from the lower level of the memory hierarchy on a store miss, provided the block is fully modified before any load instruction accesses its unmodified data. This policy relieves some of the pressure on memory bandwidth and improves the cache hit rate. We provide a hardware mechanism, the Store Merge Buffer, to implement the policy in the Godson-2 processor. Our experiments show encouraging results: memory bandwidth improves by almost 50% (measured with the STREAM benchmark), and IPC on SPEC CPU2000 improves by 9.4% on average.
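The policy can be illustrated with a small software model. The sketch below is not the Godson-2 hardware design. It is a toy simulation under stated assumptions: a 32-byte block size, an unbounded buffer, no evictions, and accesses that never cross a block boundary. The class name `StoreMergeBuffer` follows the abstract, while every field and method name is hypothetical. It counts how many blocks must be read from the lower memory level.

```python
BLOCK_SIZE = 32  # bytes per cache block (assumed value, not taken from the paper)


class StoreMergeBuffer:
    """Toy model: one pending entry per missed block, tracking written bytes."""

    def __init__(self):
        self.cache = set()   # block numbers resident in the cache
        self.pending = {}    # block number -> set of byte offsets already stored
        self.fetches = 0     # blocks read from the lower memory level

    def _fetch(self, block):
        # Read the block from memory, merge any pending stores, install it.
        self.fetches += 1
        self.pending.pop(block, None)
        self.cache.add(block)

    def store(self, addr, size):
        block, offset = divmod(addr, BLOCK_SIZE)
        if block in self.cache:
            return  # store hit, nothing to do in this model
        written = self.pending.setdefault(block, set())
        written.update(range(offset, offset + size))
        if len(written) == BLOCK_SIZE:
            # Block fully modified: install it without reading memory.
            del self.pending[block]
            self.cache.add(block)

    def load(self, addr, size):
        block, offset = divmod(addr, BLOCK_SIZE)
        if block in self.cache:
            return  # load hit
        written = self.pending.get(block, set())
        if not all(b in written for b in range(offset, offset + size)):
            # Load touches unmodified data: the block must be fetched after all.
            self._fetch(block)
        # Otherwise the load is served from the bytes held in the buffer.


smb = StoreMergeBuffer()
for addr in range(0, 4 * BLOCK_SIZE, 8):   # sequential 8-byte stores over 4 blocks
    smb.store(addr, 8)
print(smb.fetches)                         # no block was read from memory

smb.store(4 * BLOCK_SIZE, 8)               # partial write to a fifth block
smb.load(4 * BLOCK_SIZE + 8, 8)            # load of bytes not yet written
print(smb.fetches)                         # this block had to be fetched
```

In this model, a streaming write pattern installs every block without a memory read, whereas a plain write-allocate cache would fetch each of the four blocks first. A load that reaches unwritten bytes of a pending block falls back to the ordinary fetch.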
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rui, H., Zhang, F., Hu, W. (2005). A Memory Bandwidth Effective Cache Store Miss Policy. In: Srikanthan, T., Xue, J., Chang, CH. (eds) Advances in Computer Systems Architecture. ACSAC 2005. Lecture Notes in Computer Science, vol 3740. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11572961_61
DOI: https://doi.org/10.1007/11572961_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29643-0
Online ISBN: 978-3-540-32108-8
eBook Packages: Computer Science, Computer Science (R0)