ABSTRACT
In dynamic translation systems, handling indirect branches is a major source of performance overhead, because the system must perform an on-the-fly address translation each time an indirect branch executes. Translation systems usually adopt software prediction to reduce the overhead of address translation, but low prediction accuracy limits the performance improvement.
This paper analyzes the performance bottleneck of software prediction and proposes a novel prediction mechanism called Software Prediction with Target Updating (SPTU), which significantly improves prediction accuracy with acceptable overhead. Based on the observation that branch targets exhibit phase behavior, SPTU adopts a coarse-grained target updating mechanism that updates the prediction targets at an appropriate frequency. SPTU uses the software prediction miss count to detect phase status and triggers target updating only when the branch phase changes.
Experiments show that, compared with software prediction, SPTU improves the average prediction accuracy from 48.0% to 77.5% and reduces the performance overhead by 21.6% on average. Furthermore, SPTU can cooperate with other optimization techniques for handling indirect branches.
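The idea of using a miss count to trigger target updating can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's implementation: the class `PredictedBranch`, the single predicted target per branch, the threshold of 4, the reset-on-hit policy, and the synthetic three-phase trace are all hypothetical choices made for illustration.

```python
class PredictedBranch:
    """One indirect-branch site with a single software-predicted target.

    Hypothetical model: a real translator compares the runtime target
    against an inlined predicted address and jumps straight to translated
    code on a match; here we only count hits and misses.
    """

    def __init__(self, miss_threshold=4):
        self.predicted_spc = None             # source address currently predicted
        self.miss_threshold = miss_threshold  # assumed tuning knob
        self.miss_count = 0                   # misses since the last hit or update
        self.hits = 0
        self.updates = 0

    def execute(self, target_spc, update_targets=True):
        """Simulate one execution of the branch; return True on a hit."""
        if target_spc == self.predicted_spc:
            self.hits += 1
            self.miss_count = 0               # assumption: a hit ends the miss run
            return True

        # Miss: a real system would fall back to a full address lookup here.
        self.miss_count += 1
        if self.predicted_spc is None:
            # First execution: fill the prediction slot.
            self.predicted_spc = target_spc
            self.miss_count = 0
        elif update_targets and self.miss_count >= self.miss_threshold:
            # Enough misses in a row: treat it as a phase change and
            # replace the predicted target with the current one.
            self.predicted_spc = target_spc
            self.miss_count = 0
            self.updates += 1
        return False


def accuracy(trace, update_targets):
    """Fraction of executions in `trace` that hit the prediction."""
    branch = PredictedBranch()
    for spc in trace:
        branch.execute(spc, update_targets)
    return branch.hits / len(trace)


# Synthetic trace with three phases, each dominated by one target.
trace = [0x1000] * 50 + [0x2000] * 50 + [0x3000] * 50

static_accuracy = accuracy(trace, update_targets=False)
updating_accuracy = accuracy(trace, update_targets=True)
print(f"fixed target:    {static_accuracy:.2f}")
print(f"target updating: {updating_accuracy:.2f}")
```

On a trace like this, a prediction fixed at first use keeps missing once the branch leaves its first phase, while the updating variant pays a few misses per phase change and then hits again. The threshold trades responsiveness against the cost of updating too often; the value used here is arbitrary.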
Index Terms
- SPTU: Improving Dynamic Binary Translation through Software Prediction with Target Updating