PARBLO: Page-Allocation-Based DRAM Row Buffer Locality Optimization

Mi, Wei; Feng, Xiao-Bing; Jia, Yao-Cang; Chen, Li; Xue, Jing-Ling

doi:10.1007/s11390-009-9297-1

Regular Paper
Published: 06 November 2009

Journal of Computer Science and Technology, Volume 24, pages 1086–1097 (2009)

Wei Mi¹,², Xiao-Bing Feng¹, Yao-Cang Jia¹, Li Chen¹ & Jing-Ling Xue³

Abstract

DRAM row buffer conflicts can significantly increase memory access latency. This paper presents a new page-allocation-based optimization that works together with some existing hardware and software optimizations to eliminate significantly more row buffer conflicts. Validated in simulation on a set of selected scientific and engineering benchmarks against a few representative memory controller optimizations, our method reduces row buffer miss rates by up to 76% (37.4% on average). This reduction translates into performance speedups of up to 15% (5% on average).
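
As a rough illustration of why page allocation can affect row buffer miss rates, the following toy model counts row buffer misses under an open-row policy for two different page placements. It is only a sketch and not the PARBLO algorithm: the bank count, the assumption that one physical page fills exactly one DRAM row, the frame-to-bank interleaving, and all function names are hypothetical choices made for this example.

```python
from typing import Dict, Iterable, List, Tuple

NUM_BANKS = 4  # hypothetical number of DRAM banks


def bank_and_row(frame: int) -> Tuple[int, int]:
    """Map a physical page frame to (bank, row).

    Assumption: each page occupies exactly one DRAM row, and
    consecutive frames are interleaved across banks.
    """
    return frame % NUM_BANKS, frame // NUM_BANKS


def row_buffer_miss_rate(frames: Iterable[int]) -> float:
    """Fraction of accesses that miss the row buffer (open-row policy)."""
    open_row: Dict[int, int] = {}  # bank -> row currently held in its buffer
    misses = 0
    total = 0
    for frame in frames:
        bank, row = bank_and_row(frame)
        if open_row.get(bank) != row:
            # Either the bank is idle or another row is open: a miss.
            misses += 1
            open_row[bank] = row
        total += 1
    return misses / total if total else 0.0


def interleaved_stream(frame_a: int, frame_b: int, n: int) -> List[int]:
    """Alternate accesses between two pages, n times each."""
    stream: List[int] = []
    for _ in range(n):
        stream.append(frame_a)
        stream.append(frame_b)
    return stream


# Two data pages accessed alternately, e.g. two arrays read in one loop.
conflicting = interleaved_stream(0, 4, 100)  # same bank, different rows
friendly = interleaved_stream(0, 1, 100)     # different banks

print(row_buffer_miss_rate(conflicting))
print(row_buffer_miss_rate(friendly))
```

In this model, placing the two pages in frames that share a bank makes every access evict the other page's row, while placing them in different banks leaves only the two initial cold misses. Real address mappings and controller policies are more involved, so the numbers here say nothing about the results reported in the abstract.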

Author information

Authors and Affiliations

Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Wei Mi, Xiao-Bing Feng (Member, CCF, ACM), Yao-Cang Jia & Li Chen (Member, CCF, ACM)
Graduate University of Chinese Academy of Sciences, Beijing, 100039, China
Wei Mi
Programming Languages and Compilers Group, School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, 2052, Australia
Jing-Ling Xue (Senior Member, IEEE)

Corresponding author

Correspondence to Wei Mi.

Additional information

Supported by the National Basic Research 973 Program of China under Grant No. 2005CB321602, and the National Natural Science Foundation of China under Grant No. 60736012.

About this article

Cite this article

Mi, W., Feng, XB., Jia, YC. et al. PARBLO: Page-Allocation-Based DRAM Row Buffer Locality Optimization. J. Comput. Sci. Technol. 24, 1086–1097 (2009). https://doi.org/10.1007/s11390-009-9297-1

Received: 12 June 2009
Revised: 29 September 2009
Published: 06 November 2009
Issue Date: November 2009
DOI: https://doi.org/10.1007/s11390-009-9297-1
