ABSTRACT
Many scripting languages use a Global Interpreter Lock (GIL) to simplify the internal designs of their interpreters, but this kind of lock severely lowers multi-thread performance on multi-core machines. This paper presents our first results from eliminating the GIL in Ruby using Hardware Transactional Memory (HTM) in the IBM zEnterprise EC12 and Intel 4th Generation Core processors. Although prior prototypes also replaced a GIL with HTM, we tested realistic programs: the Ruby NAS Parallel Benchmarks (NPB), the WEBrick HTTP server, and Ruby on Rails. We devised a new technique that dynamically adjusts the transaction lengths on a per-bytecode basis, so that we can balance the likelihood of transaction aborts against the relative overhead of the instructions that begin and end the transactions. Our results show that HTM achieved 1.9- to 4.4-fold speedups over the GIL in the NPB programs with 12 threads, and 1.6- and 1.2-fold speedups in WEBrick and Ruby on Rails, respectively. The dynamic transaction-length adjustment chose the best transaction lengths for any number of threads and for applications with sufficiently long running times.
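The trade-off behind the per-bytecode adjustment can be illustrated with a small simulation. The sketch below is not the paper's implementation: the class names, the windowed abort-ratio policy, and every constant (`INITIAL_LENGTH`, `WINDOW`, `MAX_ABORT_RATIO`, `SHRINK_FACTOR`) are assumptions chosen only to show the idea that a transaction starting at a given bytecode begins long, to amortize the begin/end overhead, and is shortened when it aborts too often.

```python
from dataclasses import dataclass

# All constants are illustrative assumptions, not values from the paper.
INITIAL_LENGTH = 255     # bytecode yield points covered by one transaction at first
WINDOW = 100             # transactions observed before the length is re-evaluated
MAX_ABORT_RATIO = 0.01   # tolerated fraction of aborted transactions per window
SHRINK_FACTOR = 0.75     # multiplicative decrease applied when aborts are too frequent


@dataclass
class StartPointState:
    """Profile kept for one bytecode at which transactions may begin."""
    length: int = INITIAL_LENGTH
    started: int = 0
    aborted: int = 0


class TransactionLengthTuner:
    """Hypothetical per-bytecode tuner: long transactions amortize the
    begin/end cost, short transactions abort less often."""

    def __init__(self) -> None:
        self._states: dict[str, StartPointState] = {}

    def _state(self, start_point: str) -> StartPointState:
        return self._states.setdefault(start_point, StartPointState())

    def length_for(self, start_point: str) -> int:
        """Current transaction length for transactions starting here."""
        return self._state(start_point).length

    def record(self, start_point: str, aborted: bool) -> None:
        """Record one finished transaction and shrink the length if needed."""
        state = self._state(start_point)
        state.started += 1
        if aborted:
            state.aborted += 1
        if state.started >= WINDOW:
            if state.aborted / state.started > MAX_ABORT_RATIO:
                # Never go below one yield point per transaction.
                state.length = max(1, int(state.length * SHRINK_FACTOR))
            state.started = 0
            state.aborted = 0


if __name__ == "__main__":
    tuner = TransactionLengthTuner()
    for _ in range(WINDOW):
        tuner.record("contended_bytecode", aborted=True)
        tuner.record("quiet_bytecode", aborted=False)
    print(tuner.length_for("contended_bytecode"), tuner.length_for("quiet_bytecode"))
```

In this sketch a start point that keeps aborting converges toward a length of 1, while a start point that never conflicts keeps its initial length, which mirrors the balance the abstract describes between abort likelihood and begin/end overhead.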
Eliminating global interpreter locks in Ruby through hardware transactional memory. PPoPP '14.