Elsevier

Journal of Systems Architecture

Volume 54, Issues 1–2, January–February 2008, Pages 161-176

A small data cache for multimedia-oriented embedded systems

https://doi.org/10.1016/j.sysarc.2007.04.006

Abstract

This paper proposes a small data cache that combines low power consumption with high performance on multimedia applications. The basic architecture is a split cache consisting of a direct-mapped cache with a small block size (DMC) and a fully-associative buffer with a large block size (FAB). To overcome the disadvantages of the small cache area, two hardware mechanisms are added that exploit the operational behavior of multimedia applications: adaptive multi-block prefetching for the FAB, which initiates various fetch sizes, and efficient block filtering for the DMC, which removes data that is likely to be rarely reused. Simulations on MediaBench show that the proposed 5 kB cache achieves power savings of up to 57% and 50% while providing almost equal and better performance compared with a 16 kB 4-way set-associative cache and 17 kB stream caches, respectively.

Introduction

With the growing popularity of multimedia applications, multimedia service delivery platforms are moving rapidly from desktop PCs to mobile platforms such as PDAs and smartphones. As a result, multimedia devices have become one of the fastest-growing areas of the embedded market. Multimedia applications generally incorporate a number of algorithms with high computational complexity and heavy memory access. The need for high-performance embedded processors is therefore increasing rapidly as mobile devices spread. However, simply aiming at high performance is not enough in portable multimedia devices, because they have several inherent limitations that must be considered at the design stage: low computational power, small memory area, short battery life, miniaturization requirements, real-time processing, and so on.

Multimedia embedded systems for data-intensive applications require large memory bus bandwidth. Because memory resources and bus bandwidth are limited, cache memories still play an important role in bridging the performance gap between a high-speed microprocessor and low-speed main memory. In particular, as the datasets of multimedia applications grow in size and complexity, designing an effective, application-specific memory system that moves data efficiently within the memory hierarchy and reduces the overall memory access latency is becoming an important issue [1], [2]. This can be done most effectively by exploiting the predictable memory access patterns inherent in multimedia applications.

For conventional applications, cache performance can be improved by increasing the cache size, which yields a higher cache hit ratio. Unfortunately, in embedded systems, increasing cache capacity is not an adequate route to high performance because of area concerns: a larger cache raises both power consumption and cost. The gravity of this issue is evident from the example of the StrongARM 110 [3], which dissipates 42% of its total power in its caches. Consequently, a small cache is recommended for power-efficient embedded systems. However, a small cache may suffer from low hit rates, which can decrease performance and increase power dissipation. To compensate for this drawback, various prefetching schemes have been proposed [4], [5], [6]. Within a limited cache space, however, an aggressive prefetching policy can increase cache pollution, which may degrade overall system performance. In multimedia applications especially, unit-stride memory accesses occur frequently, so a policy that continually prefetches data at specialized stride intervals can increase overall system performance. Such a policy may nevertheless introduce continual cache pollution when data is prefetched with mismatched stride intervals.
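The trade-off between useful prefetching and cache pollution can be illustrated with a toy model. The sketch below is not the scheme proposed in this paper: the function name `simulate`, the LRU block buffer, the prefetch-on-miss policy, and all sizes are assumptions chosen only to show how a prefetch stride that matches a unit-stride stream removes misses, while a mismatched stride fills a small buffer with blocks that are evicted before use.

```python
from collections import OrderedDict


def simulate(addresses, num_blocks, block_size, prefetch_stride=None):
    """Return (demand_misses, unused_prefetches) for a tiny LRU block buffer.

    Hypothetical toy model: on every demand miss, the block that lies
    `prefetch_stride` blocks ahead is also fetched. None disables prefetching.
    """
    cache = OrderedDict()  # block number -> True while prefetched but unused
    misses = 0
    unused_prefetches = 0

    def insert(block, prefetched):
        nonlocal unused_prefetches
        if block in cache:
            return
        if len(cache) >= num_blocks:
            _, was_unused = cache.popitem(last=False)  # evict the LRU block
            if was_unused:
                unused_prefetches += 1  # pollution: fetched, never referenced
        cache[block] = prefetched

    for addr in addresses:
        block = addr // block_size
        if block in cache:
            cache[block] = False        # a prefetched block becomes useful
            cache.move_to_end(block)    # refresh its LRU position
        else:
            misses += 1
            insert(block, False)
            if prefetch_stride is not None:
                insert(block + prefetch_stride, True)
    return misses, unused_prefetches


# Unit-stride stream of 4-byte words over 1 kB, 16-byte blocks, 4-block buffer.
stream = list(range(0, 1024, 4))
print(simulate(stream, 4, 16))                     # no prefetching
print(simulate(stream, 4, 16, prefetch_stride=1))  # stride matches the stream
print(simulate(stream, 4, 16, prefetch_stride=5))  # mismatched stride
```

In this toy setting the matched stride halves the demand misses (32 instead of 64) with no wasted prefetches, whereas the mismatched stride leaves all 64 misses in place and additionally evicts prefetched blocks that were never referenced.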

To overcome these interrelated memory performance issues in data caches for multimedia embedded systems, this paper proposes a novel small-area data cache architecture that combines a conventional split cache with two hardware enhancements for prefetching and filtering. The fundamental issue is balancing all the components within a limited area to produce an energy-efficient, high-performance cache suitable for multimedia-oriented embedded systems. The solution exploits the operational behavior that stems from the algorithmic characteristics of multimedia applications. More importantly, the enhancements neither slow down cache access nor require an excessive amount of extra hardware. The design is evaluated with three metrics: cache size, access latency, and energy consumption.

The remainder of this paper is organized as follows. Related work is provided in Section 2. Section 3 describes the architectural and operational characteristics of the proposed cache. Section 4 presents our simulation results on the performance and the energy consumption. Finally, conclusions are given in Section 5.

Related work

Despite the popularity of multimedia applications, their cache performance has received only limited evaluation. Furthermore, some studies [7], [8], [9] argue that data caches are not useful for them because of the streaming nature of the data with little temporal locality, the non-sequential locality of access, and the frequent access to large data sets that cannot be stored in the first-level cache. On the other hand, the recent studies suggest that multimedia applications

Proposed data cache systems

In this section, the architectural characteristics of the proposed data cache are presented along with its design motivation and operational model. Also, the details on how the proposed data cache is optimized for multimedia applications are introduced.
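As a rough illustration of the split organization summarized in the abstract, the sketch below models a direct-mapped cache with small blocks next to a fully-associative buffer with large blocks. It is a minimal sketch under stated assumptions: the class name `SplitCache`, all sizes, the LRU replacement in the buffer, and the rule that a block touched in the buffer is also placed in the direct-mapped cache are hypothetical, and the adaptive prefetching and block filtering mechanisms are not modeled.

```python
from collections import OrderedDict


class SplitCache:
    """Toy split cache: direct-mapped cache (DMC) plus fully-associative buffer (FAB)."""

    def __init__(self, dmc_lines=64, dmc_block=8, fab_entries=8, fab_block=64):
        # All sizes are illustrative assumptions, not the paper's configuration.
        self.dmc_lines = dmc_lines
        self.dmc_block = dmc_block
        self.fab_entries = fab_entries
        self.fab_block = fab_block
        self.dmc = [None] * dmc_lines  # line index -> resident small-block number
        self.fab = OrderedDict()       # large-block number -> None, in LRU order

    def access(self, addr):
        """Classify one data reference as 'dmc_hit', 'fab_hit', or 'miss'."""
        small = addr // self.dmc_block
        index = small % self.dmc_lines
        if self.dmc[index] == small:
            return "dmc_hit"

        large = addr // self.fab_block
        if large in self.fab:
            self.fab.move_to_end(large)
            # Assumed policy: copy the referenced small block into the DMC.
            self.dmc[index] = small
            return "fab_hit"

        # Miss in both structures: fetch the large block, evicting the LRU entry.
        if len(self.fab) >= self.fab_entries:
            self.fab.popitem(last=False)
        self.fab[large] = None
        self.dmc[index] = small
        return "miss"


cache = SplitCache()
for addr in (0, 0, 8, 8, 64):
    print(addr, cache.access(addr))
```

The large block serves neighbouring references (spatial locality) after a single miss, while repeated references to the same small block are served by the direct-mapped cache (temporal locality).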

Performance evaluations

The details of the simulation environment, the performance metrics, and the area and power aspects are presented in this section. We used the SimpleScalar/ARM processor simulator [17] to collect runtime information on three benchmark suites: MediaBench [18], MiBench [19], and SPEC2000, representing embedded multimedia and communications applications, general embedded applications, and conventional applications, respectively. Only data references are collected and used for the simulations. We modified

Conclusion

This paper proposes a data cache for low-power, high-performance multimedia-oriented embedded processors. The basic strategy for achieving this objective is to design a small cache employing two modules: a direct-mapped cache with a small block size for temporal locality and a fully-associative buffer with a large block size for spatial locality. In addition, two hardware enhancements are designed on the basis of the behavioral characteristics of multimedia applications: an

Acknowledgement

This work has been supported by the BK21 Research Center for Intelligent Mobile Software at Yonsei University in Korea.

References (21)

  • P.R. Panda, N.D. Dutt, and A. Nicolau, Architectural exploration and optimization of local memory in embedded systems,...
  • W. Shiue et al., Data memory design and exploration for low-power embedded systems, ACM Trans. Des. Automat. Electro. Syst. (2001)
  • S. Santhanam, StrongARM SA110 - A 160MHz 32b 0.5W CMOS ARM Processor, Hot Chips 8: A Symposium on High-Performance...
  • P. Struik, P. van der Wolf, A.D. Pimentel, A combined hardware/software solution for stream prefetching in multimedia...
  • D.F. Zucker et al., Hardware and software cache prefetching techniques for MPEG benchmarks, IEEE Trans. Circ. Syst. Video Technol. (2000)
  • R. Cucchiara et al., Neighbor cache prefetching for multimedia image and video processing, IEEE Trans. Multimedia (2004)
  • I. Kuroda, T. Nishitani, Multimedia processors, in: Proceedings of the IEEE 86 (6), June 1998, pp....
  • S. Rixner, W.J. Dally, U.J. Kapasi, P. Mattson, and J.D. Owens, Memory access scheduling, in: Proceedings of the 27th...
  • K. Diefendorff et al., How multimedia workloads will change processor design, Computer (1997)
  • N.T. Slingerland, A.J. Smith, Cache performance for multimedia application, in: Proceedings of the 15th International...

1 Current address: Department of Computer Science, Namseoul University, CheonAn-Si, ChoongNam 330-707, Republic of Korea.
