Abstract
Code compression coupled with dynamic decompression is an important technique for both embedded and general-purpose microprocessors. Postfetch decompression, in which decompression is performed after the compressed instructions have been fetched, allows the instruction cache to store compressed code but requires a highly efficient decompression implementation. We propose implementing postfetch decompression using a new hardware facility called dynamic instruction stream editing (DISE). DISE provides a programmable decoder (similar in structure to those in many IA-32 processors) that is used to add functionality to an application by injecting custom code snippets into its fetched instruction stream. We present a DISE-based implementation of postfetch decompression and show that it naturally supports customized program-specific decompression dictionaries, enables parameterized decompression allowing similar-but-not-identical instruction sequences to share dictionary entries, and uses no decompression-specific hardware. We present extensive experimental results showing the virtues of this approach and evaluating the factors that impact its efficacy. We also present implementation-neutral results that give insight into the characteristics of any postfetch decompression technique. Our experiments demonstrate not only significant reductions in code size (up to 35%) but also significant improvements in performance (up to 20%) and energy (up to 10%).
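The idea of parameterized dictionary decompression can be illustrated with a small software model. This is only a sketch under stated assumptions, not the DISE hardware mechanism or the paper's encoding: the dictionary layout, the tuple-based codeword format, the assembly mnemonics, and the names `DICTIONARY` and `decompress` are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical program-specific dictionary: entry id -> instruction templates.
# "{0}" and "{1}" are parameter slots filled in from the codeword, so two
# similar-but-not-identical sequences can share a single entry.
DICTIONARY: Dict[int, List[str]] = {
    0: ["lw {0}, 0({1})", "addi {0}, {0}, 1", "sw {0}, 0({1})"],
}

# A fetched item is either a plain instruction (str) or a codeword,
# modeled here as (entry id, parameters). This encoding is an assumption.
Codeword = Tuple[int, Tuple[str, ...]]
Fetched = Union[str, Codeword]


def decompress(stream: List[Fetched]) -> List[str]:
    """Expand codewords after fetch; pass ordinary instructions through."""
    expanded: List[str] = []
    for item in stream:
        if isinstance(item, tuple):
            entry_id, params = item
            # Instantiate every template of the entry with the parameters.
            expanded.extend(t.format(*params) for t in DICTIONARY[entry_id])
        else:
            expanded.append(item)
    return expanded


# Two codewords reuse entry 0 with different registers.
compressed: List[Fetched] = [
    "add r1, r2, r3",
    (0, ("r4", "r5")),
    (0, ("r6", "r7")),
]
for instruction in decompress(compressed):
    print(instruction)
```

In this model the compressed stream holds three items while the expanded stream holds seven instructions, which is the sense in which a cache holding the compressed form stores more code per byte.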