
The implementation and evaluation of dynamic code decompression using DISE

Published: 01 February 2005

Abstract

Code compression coupled with dynamic decompression is an important technique for both embedded and general-purpose microprocessors. Postfetch decompression, in which decompression is performed after the compressed instructions have been fetched, allows the instruction cache to store compressed code but requires a highly efficient decompression implementation. We propose implementing postfetch decompression using a new hardware facility called dynamic instruction stream editing (DISE). DISE provides a programmable decoder (similar in structure to those in many IA-32 processors) that is used to add functionality to an application by injecting custom code snippets into its fetched instruction stream. We present a DISE-based implementation of postfetch decompression and show that it naturally supports customized, program-specific decompression dictionaries; enables parameterized decompression, which allows similar-but-not-identical instruction sequences to share dictionary entries; and uses no decompression-specific hardware. We present extensive experimental results showing the virtue of this approach and evaluating the factors that impact its efficacy. We also present implementation-neutral results that give insight into the characteristics of any postfetch decompression technique. Our experiments demonstrate not only a significant reduction in code size (up to 35%) but also significant improvements in performance (up to 20%) and energy (up to 10%).
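The idea of parameterized, dictionary-based decompression can be illustrated with a small software sketch. It is not the DISE hardware mechanism or the authors' encoding: the `Codeword` type, the `expand` function, the placeholder syntax, and the instruction mnemonics are all hypothetical and chosen only for illustration. The sketch shows a fetched stream in which a codeword names a dictionary entry and supplies parameters, so that two similar-but-not-identical sequences share one entry.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Codeword:
    """Hypothetical codeword: a dictionary index plus parameters to fill in."""
    entry: int
    params: Tuple[str, ...] = ()


# A fetched item is either an ordinary instruction (a string) or a codeword.
Fetched = Union[str, Codeword]

# Assumed program-specific dictionary. Each entry is a list of instruction
# templates; "{0}" and "{1}" are placeholders for codeword parameters.
DICTIONARY: Dict[int, List[str]] = {
    0: ["load {0}, 0({1})", "add {0}, {0}, 1", "store {0}, 0({1})"],
}


def expand(stream: List[Fetched], dictionary: Dict[int, List[str]]) -> List[str]:
    """Replace each codeword with its parameterized dictionary sequence."""
    out: List[str] = []
    for item in stream:
        if isinstance(item, Codeword):
            # Instantiate every template of the entry with the parameters.
            for template in dictionary[item.entry]:
                out.append(template.format(*item.params))
        else:
            # Ordinary instructions pass through unchanged.
            out.append(item)
    return out


compressed: List[Fetched] = [
    "mov r1, r2",
    Codeword(0, ("r4", "r5")),
    Codeword(0, ("r6", "r7")),
]
decompressed = expand(compressed, DICTIONARY)
```

Here the compressed stream holds three items while the expanded stream holds seven instructions, and both codewords reuse entry 0 with different registers. In a postfetch scheme the expansion happens after fetch, so only the compressed form would occupy the instruction cache.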
