Abstract
Future dynamic applications will require new mapping strategies to deliver power-efficient performance. Fully static, design-time mappings cannot optimally handle unpredictably varying application characteristics and system resource requirements. Instead, platforms will need to be not only programmable, in the form of instruction-set processors, but also at least partially reconfigurable. The applications themselves will need to exploit this added freedom at runtime to adapt to the dynamism. In this context, it is important for applications to exploit the memory hierarchy optimally under varying memory availability. This article analyzes spatial locality trade-offs in wavelet-based applications for use in dynamic execution environments. Depending on the runtime conditions encountered, execution switches between different memory-optimized instantiations, or localizations, each of which optimally exploits temporal and spatial locality under those conditions. This switching is enabled by systematic mapping guidelines. The guidelines indicate how a specific execution condition influences the miss-rate behavior of a localization, under which conditions a given localization is optimal, and which miss-rate gains can be obtained by switching to that localization.
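To make the idea of switching between localizations concrete, the following Python sketch is a toy model, not the article's method or its guidelines. It assumes a row-major image of 32-bit samples, 64-byte cache lines, a fully associative LRU cache, and two hypothetical localizations of a 3-tap vertical filtering pass (`row_by_row` and `column_by_column`). Instead of analytical guidelines, it counts misses for each localization with a small trace-driven simulation and picks the one with the fewest misses for a given cache budget.

```python
from collections import OrderedDict

# Assumptions for this toy model (not taken from the article):
ELEM_BYTES = 4   # 32-bit wavelet coefficients
LINE_BYTES = 64  # cache-line size in bytes

# Two hypothetical localizations of a 3-tap vertical filtering pass.
LOCALIZATIONS = ("row_by_row", "column_by_column")


def lru_misses(addresses, cache_bytes, line_bytes=LINE_BYTES):
    """Count misses of a fully associative LRU cache over an address trace."""
    capacity = cache_bytes // line_bytes
    cache = OrderedDict()  # line tag -> None, least recently used first
    misses = 0
    for addr in addresses:
        tag = addr // line_bytes
        if tag in cache:
            cache.move_to_end(tag)  # hit: mark line as most recently used
        else:
            misses += 1
            cache[tag] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def vertical_filter_trace(width, height, order):
    """Yield the byte addresses read by a 3-tap vertical filter.

    The image is stored row-major; border rows are only read, not filtered.
    """
    def addr(y, x):
        return (y * width + x) * ELEM_BYTES

    if order == "row_by_row":  # finish one output row across all columns
        for y in range(1, height - 1):
            for x in range(width):
                for dy in (-1, 0, 1):
                    yield addr(y + dy, x)
    elif order == "column_by_column":  # finish one column before the next
        for x in range(width):
            for y in range(1, height - 1):
                for dy in (-1, 0, 1):
                    yield addr(y + dy, x)
    else:
        raise ValueError(f"unknown order: {order}")


def pick_localization(width, height, cache_bytes):
    """Simulate every localization and return (best, misses_per_localization)."""
    misses = {
        order: lru_misses(vertical_filter_trace(width, height, order), cache_bytes)
        for order in LOCALIZATIONS
    }
    best = min(misses, key=misses.get)  # ties resolve to the first entry
    return best, misses


if __name__ == "__main__":
    # Same 1024x32 image, three different cache budgets.
    for cache_bytes in (256, 4 * 1024, 256 * 1024):
        best, misses = pick_localization(1024, 32, cache_bytes)
        print(cache_bytes, best, misses)
```

Under these assumptions, a 256-byte cache favors `row_by_row` (5,760 vs 32,768 misses), a 4 KiB cache favors `column_by_column` (2,048 vs 5,760), and with 256 KiB both reach the 2,048 compulsory misses, so the choice no longer matters. The crossover, rather than the specific numbers, is the point: the best localization depends on the memory available at runtime.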
Index Terms
- Modeling and exploiting spatial locality trade-offs in wavelet-based applications under varying resource requirements