
A shared libraries aware and bank partitioning-based mechanism for multicore architecture

  • Optimization
Soft Computing

Abstract

Dynamic random-access memory (DRAM) consists of several banks, which are shared among cores. This sharing causes memory interference, which reduces overall system performance. Shared libraries, which are commonly used in modern operating systems, exacerbate the problem: the physical memory they occupy is often distributed across all DRAM banks, and their code runs frequently, which leads to a large number of row-buffer conflicts and lower system performance. This paper proposes a shared library-aware, bank partitioning-based mechanism (SBM) that takes into account the inter-thread interference caused by shared libraries and assigns DRAM banks to specific cores rather than to processes, thereby taking advantage of bank-level parallelism (BLP) and improving performance isolation. We conducted several experiments to assess the degree of performance isolation achieved by SBM. The results indicate that SBM significantly improves performance isolation.
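The general idea of assigning DRAM banks to cores can be illustrated with a small sketch. This is not the authors' implementation: the address layout (bank index in physical address bits 13 to 15, 4 KiB pages, 8 banks), the class name `BankAwareAllocator`, the helper `bank_of`, and the round-robin policy are all assumptions chosen for illustration, and the sketch does not model how SBM treats shared-library pages, which the abstract does not detail.

```python
from collections import defaultdict

# All constants below are illustrative assumptions, not values from the paper.
PAGE_SHIFT = 12   # 4 KiB pages
BANK_SHIFT = 13   # hypothetical: bank index starts at physical address bit 13
BANK_BITS = 3     # hypothetical: 8 banks


def bank_of(phys_addr: int) -> int:
    """Return the (assumed) DRAM bank index of a physical address."""
    return (phys_addr >> BANK_SHIFT) & ((1 << BANK_BITS) - 1)


class BankAwareAllocator:
    """Toy page allocator that gives each core a private set of banks."""

    def __init__(self, num_frames: int, num_cores: int):
        num_banks = 1 << BANK_BITS
        if not 0 < num_cores <= num_banks:
            raise ValueError("need between 1 and num_banks cores")
        # Group free page frames by the bank they map to.
        self.free = defaultdict(list)
        for pfn in range(num_frames):
            self.free[bank_of(pfn << PAGE_SHIFT)].append(pfn)
        # Give every core a disjoint, equally sized range of banks.
        per_core = num_banks // num_cores
        self.core_banks = {
            core: list(range(core * per_core, (core + 1) * per_core))
            for core in range(num_cores)
        }
        # Per-core cursor used to rotate over the core's own banks.
        self.cursor = {core: 0 for core in range(num_cores)}

    def alloc_page(self, core: int):
        """Return a free frame from one of the core's banks, or None."""
        banks = self.core_banks[core]
        for step in range(len(banks)):
            idx = (self.cursor[core] + step) % len(banks)
            if self.free[banks[idx]]:
                # Advance the cursor so consecutive pages land in different
                # banks, which keeps some bank-level parallelism per core.
                self.cursor[core] = (idx + 1) % len(banks)
                return self.free[banks[idx]].pop(0)
        return None  # the core's banks are exhausted


if __name__ == "__main__":
    alloc = BankAwareAllocator(num_frames=64, num_cores=4)
    for core in (0, 1):
        frames = [alloc.alloc_page(core) for _ in range(4)]
        print(core, [(pfn, bank_of(pfn << PAGE_SHIFT)) for pfn in frames])
```

Because every frame handed to a core comes from that core's own banks, accesses from different cores never contend for the same row buffer in this toy model; the cost is that each core can use only its share of physical memory.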


Data availability statement

The datasets generated and analyzed during the current study are available from the corresponding author upon reasonable request.

Notes

  1. The Linux kernel function allocate_pages is responsible for handling the request for a group of contiguous page frames.

  2. The Linux kernel function __rmqueue is used to allocate the requested page frames from the buddy system.

  3. The size of the array is 512 MB.

  4. The size of the row buffer is 8 KB on our experimental platform.


Acknowledgements

The authors are grateful to the editor and anonymous referees for their valuable comments and helpful suggestions which helped to improve the presentation of this paper.

Funding

This work was partially supported by the National Key R&D Program of China under Grant No. 2020YFC0832500, the National Natural Science Foundation of China under Grant No. 61402210, the Fundamental Research Funds for the Central Universities under Grant Nos. lzujbky-2021-sp57 and lzujbky-2021-sp43, and the Gansu Provincial Science and Technology Major Special Innovation Consortium Project under Grant No. 21ZD3GA002.

Author information


Contributions

All authors contributed to the study’s conception and design. Material preparation, data collection, and analysis were performed by HY, SX, YC, GL, and KCL. Funding acquisition, resources, and supervision were provided by RZ and QZ. The first draft of the manuscript was written by HY, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Rui Zhou or Qingguo Zhou.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare relevant to this article’s content.

Ethical approval

This article does not contain any studies with human participants or animals performed by any authors.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, H., Xu, S., Chen, Y. et al. A shared libraries aware and bank partitioning-based mechanism for multicore architecture. Soft Comput 27, 8775–8787 (2023). https://doi.org/10.1007/s00500-023-08020-3


  • DOI: https://doi.org/10.1007/s00500-023-08020-3
