PFetch: software prefetching exploiting temporal predictability of memory access streams
CPU speeds have increased faster than the rate of improvement in memory access latencies in the recent past. As a result, with programs that suffer excessive cache misses, the CPU will increasingly be stalled waiting for the memory system to provide the ...
Modeling of cache access behavior based on Zipf's law
Recently, chip multiprocessors (CMPs) that can simultaneously execute multiple workloads using multiple cores have become key to achieving high-performance processing. To improve CMP performance, various shared resource management mechanisms have been ...
Zero loads: canceling load requests by tracking zero values
The considerable gap between processor and DRAM speed and the power losses in the cache hierarchy call for more efficient approaches. Broadly speaking, cache-hierarchy efficiency can be increased either by improving cache management or by reducing the ...
A shared cache for a chip multi vector processor
This paper discusses the design of a chip multi vector processor (CMVP), especially examining the effects of an on-chip cache when the off-chip memory bandwidth is limited. As chip multiprocessors (CMPs) have become the mainstream in commodity scalar ...
A leakage-aware cache sharing technique for low-power chip multi-processors (CMPs) with private L2 caches
Power dissipation has become an important issue in modern microprocessors such as chip multiprocessors (CMPs). In particular, as process technology advances below 90 nm, leakage power consumption becomes dominant in the total power dissipation, thus ...
Predictable dynamic instruction scratchpad for simultaneous multithreaded processors
For precise timing analysis of hard real-time applications, a predictable memory system is of particular importance. Caches have a great impact on performance, but at the cost of reduced timing predictability. Conventional scratchpads, i.e., statically managed ...
Exploiting multithreaded architectures to improve the hash join operation
As database management systems gain importance in our everyday life, it is essential to have efficient implementations of important database operations such as the hash join. Improvements in processor architectures including simultaneous multithreaded ...
Accurate system-level performance modeling and workload characterization for mobile internet devices
As mobile applications and devices become ubiquitous, consumer demands for performance, power efficiency, and connectivity are increasing. The software framework existing on mobile internet devices is a complex interaction of real-time tasks, non-real-...
WormBench: a configurable workload for evaluating transactional memory systems
Transactional Memory (TM) is a promising new technology that eases the writing of multi-threaded applications. Many different TM implementations exist; unfortunately, most of those TM systems are currently evaluated by using workloads that ...
Version management alternatives for hardware transactional memory
Transactional Memory is a promising parallel programming model that addresses the programmability issues of lock-based applications using mechanisms that are transparent to developers. Hardware Transactional Memory (HTM) implements these mechanisms in ...
Evaluation of memory performance on the cell BE with the SARC programming model
With the advent of multicore architectures, especially heterogeneous ones, peak computational and memory performance are both difficult to obtain using traditional programming models. Usually, programmers have to fully reorganize the code and ...
Acceptance Rates
Year | Submitted | Accepted | Rate |
---|---|---|---|
MEDEA '06 | 9 | 6 | 67% |
Overall | 9 | 6 | 67% |