Bounding and reducing memory interference in COTS-based multi-core systems

Real-Time Systems

Abstract

In multi-core systems, main memory is a major shared resource among processor cores. A task running on one core can be delayed by other tasks running simultaneously on other cores due to interference in the shared main memory system. Such memory interference delay can be large and highly variable, thereby posing a significant challenge for the design of predictable real-time systems. In this paper, we present techniques to reduce this interference and provide an upper bound on the worst-case interference on a multi-core platform that uses a commercial-off-the-shelf (COTS) DRAM system. We explicitly model the major resources in the DRAM system, including banks, buses, and the memory controller. By considering their timing characteristics, we analyze the worst-case memory interference delay imposed on a task by other tasks running in parallel. We find that memory interference can be significantly reduced by (i) partitioning DRAM banks, and (ii) co-locating memory-intensive tasks on the same processing core. Based on these observations, we develop a memory interference-aware task allocation algorithm for reducing memory interference. We evaluate our approach on a COTS-based multi-core platform running Linux/RK. Experimental results show that the predictions made by our approach are close to the measured worst-case interference under workloads with both high and low memory contention. In addition, our memory interference-aware task allocation algorithm provides a significant improvement in task schedulability over previous work, with as much as 96 % more tasksets being schedulable.

Notes

  1. JEDEC. DDR3 SDRAM Standard. http://www.jedec.org.

  2. The physical structure of priority queues, bank schedulers, and the channel scheduler depends on the implementation. They can be implemented as a single hardware structure (Nesbit et al. 2006) or as multiple decoupled structures (Mutlu and Moscibroda 2007, 2008; Ausavarungnirun et al. 2012).

  3. The DRAM mapping of Fig. 1c is for the single-channel configuration in this system. Section 6 gives more details on this system.

  4. The effect of REF (\(E_{R}\)) on the memory interference delay can be roughly estimated by the fixed-point iteration \(E_{R}^{k+1}=\lceil \{(\text {total delay from analysis})+E_{R}^{k}\}/t_{REFI}\rceil \cdot t_{RFC}\), where \(E_R^0=0\). For DDR3-1333 with 2 Gb density below 85 \(^{\circ }\)C, \(t_{RFC}/t_{REFI}\) is \(160\,\text {ns}/7.8\,\mu \text {s}\approx 0.02\), so REF adds only about 2 % to the total memory interference delay. A more detailed analysis of REF can be found in Bhat and Mueller (2010).

  5. Micron 2Gb DDR3 Component: MT41J256M8-15E. http://download.micron.com/pdf/datasheets/dram/ddr3/2Gb_DDR3_SDRAM.pdf.

  6. OSEK/VDX OS. http://portal.osek-vdx.org/files/pdf/specs/os223.pdf.

  7. Windriver VxWorks. http://www.windriver.com.

  8. An arbitrary tie-breaking rule can be used to assign a unique priority to each task.

  9. These assumptions will be relaxed in future work.

  10. This assumption is required to bound the re-ordering effect of the memory controller, which will be described in Sect. 4.1.

  11. Note that the write-buffer draining does not completely block read requests until all the write requests are serviced. In a memory controller with write batching, read requests are always exposed to the memory controller, but write requests are exposed to and scheduled by the memory controller only when the write buffer is close to full (Lee et al. 2010). Hence, even when the write buffer is being drained, a read request can be scheduled if its commands are ready with respect to DRAM timing constraints (e.g., read and write requests to different banks).

  12. This is why the DRAM address mapping in Fig. 1c does not have a bit for channel selection.

  13. Linux/RK is available at https://rtml.ece.cmu.edu/redmine/projects/rk.

  14. McCalpin JD. STREAM: Sustainable memory bandwidth in high performance computers. http://www.cs.virginia.edu/stream.

  15. Software cache partitioning simultaneously partitions the entire physical memory space into as many partitions as there are cache partitions. Therefore, the spatial memory requirement of a task determines the minimum number of cache partitions for that task (Kim et al. 2013).
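
The rough REF estimate in Note 4 is a fixed-point iteration, and a small worked example may make it easier to follow. The sketch below is illustrative only and is not the authors' implementation: the function name `refresh_overhead_ns`, the iteration cap, and the 100 µs example delay are assumptions, while \(t_{RFC}=160\) ns and \(t_{REFI}=7.8\,\mu\)s are the values quoted in the note.

```python
def refresh_overhead_ns(total_delay_ns, t_rfc_ns=160, t_refi_ns=7800, max_iter=1000):
    """Iterate E^{k+1} = ceil((D + E^k) / t_REFI) * t_RFC, starting from E^0 = 0.

    total_delay_ns is D, the interference delay from the analysis (hypothetical input).
    All quantities are integers in nanoseconds.
    """
    e = 0
    for _ in range(max_iter):
        # Integer ceiling division avoids floating-point rounding issues.
        n_refreshes = -(-(total_delay_ns + e) // t_refi_ns)
        e_next = n_refreshes * t_rfc_ns
        if e_next == e:
            # Fixed point reached: the estimate no longer changes.
            return e
        e = e_next
    raise RuntimeError("refresh estimate did not converge")


if __name__ == "__main__":
    delay = 100_000  # assumed example: 100 us of interference delay
    print(refresh_overhead_ns(delay))
```

For the assumed 100 µs delay the iteration settles at 2240 ns, roughly 2 % of the delay, which is in line with the ratio stated in Note 4.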

References

  • Akesson B, Goossens K, Ringhofer M (2007) Predator: a predictable SDRAM memory controller. In: IEEE/ACM international conference on hardware/software codesign and system synthesis (CODES+ISSS), 2007

  • Altmeyer S, Davis R, Maiza C (2011) Cache related pre-emption delay aware response time analysis for fixed priority pre-emptive systems. In: IEEE real-time systems symposium (RTSS), 2011

  • Andersson B, Easwaran A, Lee J (2010) Finding an upper bound on the increase in execution time due to contention on the memory bus in COTS-based multicore systems. SIGBED Rev 7(1):4

  • Ausavarungnirun R, Chang KK-W, Subramanian L, Loh GH, Mutlu O (2012) Staged memory scheduling: achieving high performance and scalability in heterogeneous systems. In: International symposium on computer architecture (ISCA), 2012

  • Bhat B, Mueller F (2010) Making DRAM refresh predictable. In: Euromicro conference on real-time systems (ECRTS), 2010

  • Bienia C, Kumar S, Singh JP, Li K (2008) The PARSEC benchmark suite: Characterization and architectural implications. In: International conference on parallel architectures and compilation techniques (PACT), 2008

  • Dasari D, Andersson B, Nelis V, Petters SM, Easwaran A, Lee J (2011) Response time analysis of COTS-based multicores considering the contention on the shared memory bus. In: IEEE international conference on trust, security and privacy in computing and communications, 2011

  • de Niz D, Rajkumar R (2006) Partitioning bin-packing algorithms for distributed real-time systems. Int J Embed Syst 2(3):196–208

  • Ebrahimi E, Lee CJ, Mutlu O, Patt YN (2010) Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems. In: International conference on architectural support for programming languages and operating systems (ASPLOS), 2010

  • Eswaran A, Rajkumar R (2005) Energy-aware memory firewalling for QoS-sensitive applications. In: Euromicro conference on real-time systems (ECRTS), 2005

  • Jeong MK, Yoon DH, Sunwoo D, Sullivan M, Lee I, Erez M (2012) Balancing DRAM locality and parallelism in shared memory CMP systems. In: IEEE international symposium on high-performance computer architecture (HPCA), 2012

  • Johnson DS, Demers A, Ullman JD, Garey MR, Graham RL (1974) Worst-case performance bounds for simple one-dimensional packing algorithms. SIAM J Comput 3(4):299–325

  • Joseph M, Pandya PK (1986) Finding response times in a real-time system. Comput J 29(5):390–395

  • Kim H, de Niz D, Andersson B, Klein M, Mutlu O, Rajkumar RR (2014) Bounding memory interference delay in COTS-based multi-core systems. In: IEEE real-time technology and applications symposium (RTAS)

  • Kim Y, Han D, Mutlu O, Harchol-Balter M (2010) ATLAS: a scalable and high-performance scheduling algorithm for multiple memory controllers. In: IEEE international symposium on high-performance computer architecture (HPCA), 2010

  • Kim H, Kandhalu A, Rajkumar R (2013) A coordinated approach for practical OS-level cache management in multi-core real-time systems. In: Euromicro conference on real-time systems (ECRTS), 2013

  • Kim H, Kim J, Rajkumar RR (2012) A profiling framework in Linux/RK and its application. In: Open demo session of IEEE real-time systems symposium (RTSS@Work), 2012

  • Kim Y, Papamichael M, Mutlu O, Harchol-Balter M (2010) Thread cluster memory scheduling: exploiting differences in memory access behavior. In: IEEE/ACM international symposium on microarchitecture (MICRO), 2010

  • Kim H, Rajkumar R (2012) Shared-page management for improving the temporal isolation of memory reservations in resource kernels. In: IEEE conference on embedded and real-time computing systems and applications (RTCSA), 2012

  • Krishnapillai Y, Wu ZP, Pellizzoni R (2014) A rank-switching, open-row DRAM controller for mixed-criticality systems. In: Euromicro conference on real-time systems (ECRTS), 2014

  • Lakshmanan K, de Niz D, Rajkumar R, Moreno G (2010) Resource allocation in distributed mixed-criticality cyber-physical systems. In: IEEE international conference on distributed computing systems (ICDCS), 2010

  • Lakshmanan K, Rajkumar R, Lehoczky JP (2009) Partitioned fixed-priority preemptive scheduling for multi-core processors. In: Euromicro conference on real-time systems (ECRTS), 2009

  • Lee CJ, Narasiman V, Ebrahimi E, Mutlu O, Patt YN (2010) DRAM-aware last-level cache writeback: Reducing write-caused interference in memory systems. Technical Report TR-HPS-2010-002, UT Austin, 2010

  • Li Y, Akesson B, Goossens K (2014) Dynamic command scheduling for real-time memory controllers. In: Euromicro conference on real-time systems (ECRTS), 2014

  • Liu L, Cui Z, Xing M, Bao Y, Chen M, Wu C (2012) A software memory partition approach for eliminating bank-level interference in multicore systems. In: International conference on parallel architectures and compilation techniques (PACT), 2012

  • Liu CL, Layland JW (1973) Scheduling algorithms for multiprogramming in a hard-real-time environment. J ACM 20(1):46–61

  • Lv M, Nan G, Yi W, Yu G (2010) Combining abstract interpretation with model checking for timing analysis of multicore software. In: IEEE real-time systems symposium (RTSS), 2010

  • Moscibroda T, Mutlu O (2007) Memory performance attacks: denial of memory service in multi-core systems. In: USENIX security symposium, 2007

  • Muralidhara SP, Subramanian L, Mutlu O, Kandemir M, Moscibroda T (2011) Reducing memory interference in multicore systems via application-aware memory channel partitioning. In: IEEE/ACM international symposium on microarchitecture (MICRO), 2011

  • Mutlu O, Moscibroda T (2007) Stall-time fair memory access scheduling for chip multiprocessors. In: IEEE/ACM International symposium on microarchitecture (MICRO), 2007

  • Mutlu O, Moscibroda T (2008) Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems. In: International symposium on computer architecture (ISCA), 2008

  • Nesbit KJ, Aggarwal N, Laudon J, Smith JE (2006) Fair queuing memory systems. In: IEEE/ACM international symposium on microarchitecture (MICRO), 2006

  • Oikawa S, Rajkumar R (1998) Linux/RK: a portable resource kernel in Linux. In: IEEE real-time systems symposium (RTSS) Work-In-Progress, 1998

  • Paolieri M, Quiñones E, Cazorla F, Valero M (2010) An analyzable memory controller for hard real-time CMPs. IEEE Embed Syst Lett 1(4):86–90

  • Paolieri M, Quiñones E, Cazorla F, Davis R, Valero M (2011) IA\(^{3}\): an interference aware allocation algorithm for multicore hard real-time systems. In: IEEE real-time technology and applications symposium (RTAS), 2011

  • Pellizzoni R, Schranzhofer A, Chen J, Caccamo M, Thiele L (2010) Worst case delay analysis for memory interference in multicore systems. In: Design, automation test in europe conference exhibition (DATE), 2010

  • Rajkumar R, Juvva K, Molano A, Oikawa S (1998) Resource kernels: A resource-centric approach to real-time and multimedia systems. In: SPIE/ACM conference on multimedia computing and networking, 1998

  • Reineke J, Liu I, Patel HD, Kim S, Lee EA (2011) PRET DRAM controller: Bank privatization for predictability and temporal isolation. In: IEEE/ACM international conference on hardware/software codesign and system synthesis (CODES+ISSS), 2011

  • Rixner S, Dally WJ, Kapasi UJ, Mattson P, Owens JD (2000) Memory access scheduling. In: International symposium on computer architecture (ISCA), 2000

  • Rosén J, Andrei A, Eles P, Peng Z (2007) Bus access optimization for predictable implementation of real-time applications on multiprocessor systems-on-chip. In: IEEE real-time systems symposium (RTSS), 2007

  • Schliecker S, Negrean M, Ernst R (2010) Bounding the shared resource load for the performance analysis of multiprocessor systems. In: Design, automation test in europe conference exhibition (DATE), 2010

  • Seshadri V, Bhowmick A, Mutlu O, Gibbons PB, Kozuch M, Mowry TC, et al. (2014) The dirty-block index. In: International symposium on computer architecture (ISCA), 2014

  • Subramanian L, Lee D, Seshadri V, Rastogi H, Mutlu O (2014) The blacklisting memory scheduler: achieving high performance and fairness at low cost. In: IEEE international conference on computer design (ICCD), 2014

  • Subramanian L, Seshadri V, Ghosh A, Khan S, Mutlu O (2015) The application slowdown model: quantifying and controlling the impact of inter-application interference at shared caches and main memory. In: IEEE/ACM international symposium on microarchitecture (MICRO), 2015

  • Subramanian L, Seshadri V, Kim Y, Jaiyen B, Mutlu O (2013) MISE: providing performance predictability and improving fairness in shared main memory systems. In: IEEE international symposium on high-performance computer architecture (HPCA), 2013

  • Suzuki N, Kim H, de Niz D, Andersson B, Wrage L, Klein M, Rajkumar RR (2013) Coordinated bank and cache coloring for temporal protection of memory accesses. In: IEEE International conference on embedded software and systems (ICESS), 2013

  • Wilhelm R, Grund D, Reineke J, Schlickling M, Pister M, Ferdinand C (2009) Memory hierarchies, pipelines, and buses for future architectures in time-critical embedded systems. IEEE Trans Comput Aided Des Integr Circuits Syst 28(7):966–978

  • Wu ZP, Krish Y, Pellizzoni R (2013) Worst case analysis of DRAM latency in multi-requestor systems. In: IEEE real-time systems symposium (RTSS), 2013

  • Xie M, Tong D, Huang K, Cheng X (2014) Improving system throughput and fairness simultaneously in CMP systems via dynamic bank partitioning. In: IEEE international symposium on high-performance computer architecture (HPCA), 2014

  • Yun H, Mancuso R, Wu Z-P, Pellizzoni R (2014) PALLOC: DRAM bank-aware memory allocator for performance isolation on multicore platforms. In: IEEE real-time technology and applications symposium (RTAS), 2014

  • Yun H, Yao G, Pellizzoni R, Caccamo M, Sha L (2012) Memory access control in multiprocessor for real-time systems with mixed criticality. In: Euromicro conference on real-time systems (ECRTS), 2012

  • Zhang X, Dwarkadas S, Shen K (2009) Hardware execution throttling for multi-core resource management. In: USENIX annual technical conference (USENIX ATC), 2009

  • Zuravleff W, Robinson T (1997) Controller for a synchronous DRAM that maximizes throughput by allowing memory requests and commands to be issued out of order. US Patent Number 5,630,096, 1997

Author information

Corresponding author

Correspondence to Hyoseung Kim.

Additional information

This material is based upon work funded and supported by the Department of Defense under Contract No. FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. This material has been approved for public release and unlimited distribution. Carnegie Mellon® is registered in the U.S. Patent and Trademark Office by Carnegie Mellon University. DM-0001596.

About this article

Cite this article

Kim, H., de Niz, D., Andersson, B. et al. Bounding and reducing memory interference in COTS-based multi-core systems. Real-Time Syst 52, 356–395 (2016). https://doi.org/10.1007/s11241-016-9248-1

  • DOI: https://doi.org/10.1007/s11241-016-9248-1
