ABSTRACT
A dynamic binary translation (DBT) system must perform an address translation every time an indirect branch instruction executes. Converting a Source binary Program Counter (SPC) address to a Translated Program Counter (TPC) address always takes more than 10 instructions, which makes it a major source of performance overhead. This paper proposes a novel mechanism called SPc-Indexed REdirecting (SPIRE), which can significantly reduce the indirect branch handling overhead. SPIRE does not rely on a hash lookup or an address mapping table to perform address translation. Instead, it reuses the source binary code space to build an SPC-indexed redirecting table, which can be indexed directly by the SPC address without hashing. With SPIRE, an indirect branch can jump to the original SPC address without address translation; the trampoline residing at that SPC address then redirects the control flow to the related code cache. Only 2-6 instructions are needed to handle an indirect branch execution. Because part of the source binary is overwritten, a shadow page mechanism is explored to preserve transparency for the corrupted source binary code pages. Online profiling is adopted to reduce the memory overhead.
We have implemented SPIRE on an x86-to-x86 DBT system and discuss the implementation issues on different guest and host architectures. The experiments show that, compared with a hash lookup mechanism, SPIRE reduces the performance overhead by 36.2% on average and by up to 51.4%, while requiring only 5.6% extra memory.
SPIRE can easily cooperate with other indirect branch handling mechanisms, and we believe the idea behind SPIRE can also be applied in other situations that need address translation.
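The contrast between hash lookup and SPC-indexed redirecting described in the abstract can be illustrated with a toy model. The sketch below is not the paper's implementation: the class names `HashLookupDBT` and `SpireSketch`, the 5-byte x86 `jmp rel32` trampoline format, the `installed` bookkeeping set, and the addresses used are all assumptions made for illustration. It only shows the idea that a trampoline written at the SPC itself replaces the table lookup, and that an untouched shadow copy keeps the original code bytes readable.

```python
import struct

JMP_REL32 = 0xE9      # x86 opcode for a near relative jump (assumed trampoline form)
TRAMPOLINE_LEN = 5    # 1 opcode byte + 4-byte signed displacement


class HashLookupDBT:
    """Baseline: every indirect branch maps SPC -> TPC through a hash table."""

    def __init__(self):
        self.spc_to_tpc = {}

    def register(self, spc, tpc):
        self.spc_to_tpc[spc] = tpc

    def indirect_branch(self, spc):
        # Hashing and probing happen on every execution of the branch.
        return self.spc_to_tpc[spc]


class SpireSketch:
    """Toy SPC-indexed redirecting: the source code space holds the trampolines."""

    def __init__(self, source_code, base):
        self.base = base
        self.shadow = bytes(source_code)          # untouched copy (shadow page)
        self.code_space = bytearray(source_code)  # gets overwritten by trampolines
        self.installed = set()                    # hypothetical bookkeeping

    def install_trampoline(self, spc, tpc):
        off = spc - self.base
        if off < 0 or off + TRAMPOLINE_LEN > len(self.code_space):
            raise ValueError("trampoline does not fit in the source code space")
        # Displacement is relative to the address of the next instruction.
        rel = tpc - (spc + TRAMPOLINE_LEN)
        self.code_space[off:off + TRAMPOLINE_LEN] = struct.pack("<Bi", JMP_REL32, rel)
        self.installed.add(spc)

    def indirect_branch(self, spc):
        # Jump straight to the SPC; the trampoline found there redirects to the TPC.
        if spc not in self.installed:
            raise LookupError("no trampoline yet; fall back to the translator")
        _opcode, rel = struct.unpack_from("<Bi", self.code_space, spc - self.base)
        return spc + TRAMPOLINE_LEN + rel

    def read_source_byte(self, spc):
        # Data reads of code pages are served from the shadow copy.
        return self.shadow[spc - self.base]


# Hypothetical addresses chosen only for the demonstration.
baseline = HashLookupDBT()
baseline.register(0x1004, 0x800000)

spire = SpireSketch(bytes(range(16)), base=0x1000)
spire.install_trampoline(0x1004, 0x800000)
```

In this model both translators resolve the branch at `0x1004` to the same target, but `SpireSketch.indirect_branch` reaches it by decoding the jump stored at the SPC rather than by consulting a mapping table, while `read_source_byte` still returns the original byte that the trampoline overwrote.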
SPIRE: improving dynamic binary translation through SPC-indexed indirect branch redirecting (VEE '13)