Hardware Acceleration of Red-Black Tree Management and Application to Just-In-Time Compilation

Journal of Signal Processing Systems

Abstract

Driven by sustained consumer demand for more complex applications, embedded systems have grown in both complexity and heterogeneity. Their architectures often combine several kinds of computing resources (DSPs, GPUs, etc.). As a consequence, software designers face significant performance and portability issues when targeting these devices. Software increasingly relies on virtualization technologies to maximize application portability. To balance portability and performance, most virtualization technologies use just-in-time (JIT) compilation to generate runtime-optimized code from portable code. Nevertheless, JIT compilation is efficient only if the speedup of the generated code compensates for the compilation overhead. While most research efforts limit this overhead by reducing how often JIT compilation phases occur, this paper investigates opportunities to speed up JIT compilation itself. First, we present a performance analysis of different JIT compilation technologies to identify hardware and software optimization opportunities. Second, we propose a solution based on a dedicated processor with specialized instructions for critical functions of JIT compilers. These specialized instructions provide an average 5× speedup on associative-array manipulation and dynamic memory allocation. Using the LLVM framework, we show a 15% overall speedup in the code generator’s execution time. Because the specialized instructions are hidden behind standard libraries, we also argue that they may be transparently reused by a wider range of applications.

Corresponding author

Correspondence to Alexandre Carbon.

About this article

Cite this article

Carbon, A., Lhuillier, Y. & Charles, HP. Hardware Acceleration of Red-Black Tree Management and Application to Just-In-Time Compilation. J Sign Process Syst 77, 95–115 (2014). https://doi.org/10.1007/s11265-014-0902-3

  • DOI: https://doi.org/10.1007/s11265-014-0902-3
