A small data cache for multimedia-oriented embedded systems
Introduction
With the growing popularity of multimedia applications, multimedia service delivery platforms are moving rapidly from desktop PCs to mobile platforms such as PDAs and smartphones. As a result, multimedia devices have become one of the fastest-growing areas in the embedded market. Multimedia applications generally incorporate a number of algorithms with high computational complexity and heavy memory access. The need for high-performance embedded processors is therefore increasing rapidly as mobile devices spread. However, simply aiming at high performance is unlikely to succeed in portable multimedia devices, because they have several inherent limitations that must be considered at the design stage: low computational power, small memory area, short battery life, miniaturization requirements, real-time processing, and so on.
Multimedia embedded systems for data-intensive applications require large memory bus bandwidth. Because memory resources and bus bandwidth are limited, cache memories still play an important role in bridging the performance gap between a high-speed microprocessor and low-speed main memory. In particular, as the datasets of multimedia applications grow in size and complexity, designing an effective, application-specific memory system that moves data efficiently within the memory hierarchy and reduces the overall memory access latency is becoming an important issue [1], [2]. This can be done effectively by exploiting the predictable memory access patterns inherent in multimedia applications.
For conventional applications, cache performance can be improved by increasing the cache size, which raises the cache hit ratio. Unfortunately, in embedded systems, increasing cache capacity is not an adequate route to high performance because of area concerns: a larger cache may increase power consumption as well as cost. The gravity of this issue is evident from the StrongARM 110 [3], which dissipates 42% of its total power in its caches. Consequently, a small cache is recommended for power-efficient embedded systems. However, a small cache may suffer from low hit rates, which decrease performance and increase power dissipation. To compensate for this drawback, various prefetching schemes have been proposed [4], [5], [6]. Within a limited cache space, however, an aggressive prefetching policy can increase cache pollution, which may degrade overall system performance. In multimedia applications in particular, unit-stride memory accesses occur frequently, so a policy of continual data prefetching with specialized stride intervals could increase overall system performance. However, such a policy may cause continual cache pollution when data are prefetched with mismatched stride intervals.
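The trade-off between useful stride prefetching and cache pollution can be illustrated with a toy simulation. The sketch below is not the cache proposed in this paper: it assumes a hypothetical 8-block fully-associative LRU cache with 32-byte blocks and a one-block-ahead prefetcher, and the names `simulate` and `_insert` are introduced only for illustration. It counts demand hits and "useless" prefetches (blocks that were prefetched but never referenced).

```python
from collections import OrderedDict


def _insert(cache, block, prefetched, num_blocks):
    """Insert a block, evicting the LRU entry when the cache is full.

    Returns 1 if the evicted block was prefetched but never used, else 0.
    """
    wasted = 0
    if len(cache) >= num_blocks:
        _, unused_prefetch = cache.popitem(last=False)  # evict oldest entry
        wasted = 1 if unused_prefetch else 0
    cache[block] = prefetched
    return wasted


def simulate(addresses, prefetch_stride, num_blocks=8, block_size=32):
    """Toy LRU cache with an optional stride prefetcher (illustrative only).

    prefetch_stride is in bytes; 0 disables prefetching.
    Returns (demand_hits, useless_prefetches).
    """
    cache = OrderedDict()  # block number -> True while prefetched and unused
    hits = 0
    useless = 0
    for addr in addresses:
        block = addr // block_size
        if block in cache:
            hits += 1
            cache[block] = False          # the block has now been used
            cache.move_to_end(block)      # mark as most recently used
        else:
            useless += _insert(cache, block, False, num_blocks)
        if prefetch_stride:
            target = (addr + prefetch_stride) // block_size
            if target not in cache:
                useless += _insert(cache, target, True, num_blocks)
    # Prefetched blocks still unused at the end were also wasted.
    useless += sum(cache.values())
    return hits, useless


# A regular access stream with a 64-byte stride (assumed for illustration).
stream = [i * 64 for i in range(32)]
print(simulate(stream, 0))    # no prefetching      -> (0, 0)
print(simulate(stream, 64))   # matched stride      -> (31, 1)
print(simulate(stream, 32))   # mismatched stride   -> (0, 32)
```

With a matched stride almost every access hits a prefetched block, whereas with a mismatched stride every prefetch occupies a block that is never referenced, which is the pollution effect described above.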
To overcome these interrelated memory performance issues in data caches for multimedia embedded systems, this paper proposes a novel small-area data cache architecture that combines a conventional split cache with two hardware enhancements for prefetching and filtering. The fundamental issue is balancing all the components within a limited area to produce an energy-efficient, high-performance cache suitable for multimedia-oriented embedded systems. The solution exploits the operational behaviors that stem from the algorithmic characteristics of multimedia applications. More importantly, the enhancements neither slow down cache access nor require an excessive amount of extra hardware. Performance is evaluated with three metrics: cache size, access latency, and energy consumption.
The remainder of this paper is organized as follows. Related work is provided in Section 2. Section 3 describes the architectural and operational characteristics of the proposed cache. Section 4 presents our simulation results on the performance and the energy consumption. Finally, conclusions are given in Section 5.
Section snippets
Related work
Despite the popularity of multimedia applications, there have been few evaluations of cache performance for them. Furthermore, some studies [7], [8], [9] argue that data caches are not useful for such applications because of the streaming nature of the data, which has little temporal locality, the non-sequential locality of access, and the frequent access to large data sets that cannot be stored in the first-level cache. On the other hand, recent studies suggest that multimedia applications
Proposed data cache systems
This section presents the architectural characteristics of the proposed data cache along with its design motivation and operational model, and describes how the cache is optimized for multimedia applications.
Performance evaluations
This section presents the details of the simulation environment, the performance metrics, and the area and power considerations. We used the SimpleScalar/ARM processor simulator [17] to collect runtime information on three benchmark suites: MediaBench [18], MiBench [19], and SPEC2000, representing embedded multimedia and communications applications, general embedded applications, and conventional applications, respectively. Only data references are collected and used for the simulations. We modified
Conclusion
This paper proposes a data cache for low-power, high-performance multimedia-oriented embedded processors. The basic strategy is to design a small cache employing two modules: a direct-mapped cache with a small block size for temporal locality and a fully-associative buffer with a large block size for spatial locality. In addition, two hardware enhancements are designed on the basis of the behavioral characteristics of multimedia applications: an
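The two-module organization described above (a direct-mapped cache with small blocks plus a fully-associative buffer with large blocks) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' design: the class name `SplitCache`, the sizes (4 lines of 8 bytes, 2 buffer entries of 32 bytes), the FIFO buffer replacement, and the fill policy on a miss are all hypothetical, and the two hardware enhancements for prefetching and filtering are not modeled.

```python
from collections import deque


class SplitCache:
    """Toy split data cache (illustrative assumptions throughout)."""

    def __init__(self, dm_lines=4, small_block=8, buf_entries=2, large_block=32):
        self.dm_lines = dm_lines
        self.small_block = small_block
        self.large_block = large_block
        self.dm = [None] * dm_lines          # small-block number held per line
        self.buf = deque(maxlen=buf_entries)  # large-block numbers, FIFO order

    def access(self, addr):
        small = addr // self.small_block
        large = addr // self.large_block
        line = small % self.dm_lines
        if self.dm[line] == small:
            return "dm_hit"                  # temporal reuse of a small block
        if large in self.buf:
            return "buf_hit"                 # spatial reuse within a large block
        # Assumed miss policy: keep the referenced small block in the
        # direct-mapped cache and the enclosing large block in the buffer.
        self.dm[line] = small
        self.buf.append(large)               # oldest entry drops out when full
        return "miss"


cache = SplitCache()
print([cache.access(a) for a in [0, 8, 40, 80, 0]])
# -> ['miss', 'buf_hit', 'miss', 'miss', 'dm_hit']
```

In this trace, address 8 hits in the buffer because it lies in the same 32-byte block as address 0 (spatial locality), and the final access to address 0 hits in the direct-mapped cache even though its large block has already left the buffer (temporal locality).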
Acknowledgement
This work has been supported by the BK21 Research Center for Intelligent Mobile Software at Yonsei University in Korea.
References (21)
- P.R. Panda, N.D. Dutt, A. Nicolau, Architectural exploration and optimization of local memory in embedded systems,...
- et al., Data memory design and exploration for low-power embedded systems, ACM Trans. Des. Automat. Electro. Syst. (2001)
- S. Santhanam, StrongARM SA110 - A 160MHz 32b 0.5W CMOS ARM Processor, Hot Chips 8: A Symposium on High-Performance...
- P. Struik, P. van der Wolf, A.D. Pimentel, A combined hardware/software solution for stream prefetching in multimedia...
- et al., Hardware and software cache prefetching techniques for MPEG benchmarks, IEEE Trans. Circ. Syst. Video Technol. (2000)
- et al., Neighbor cache prefetching for multimedia image and video processing, IEEE Trans. Multimedia (2004)
- I. Kuroda, T. Nishitani, Multimedia processors, in: Proceedings of the IEEE 86 (6), June 1998, pp....
- S. Rixner, W.J. Dally, U.J. Kapasi, P. Mattson, J.D. Owens, Memory access scheduling, in: Proceedings of the 27th...
- et al., How multimedia workloads will change processor design, Computer (1997)
- N.T. Slingerland, A.J. Smith, Cache performance for multimedia application, in: Proceedings of the 15th International...
1 Current address: Department of Computer Science, Namseoul University, CheonAn-Si, ChoongNam 330-707, Republic of Korea.