ABSTRACT
With the advent of Chip-Multiprocessors, Transactional Memory (TM) emerged as a powerful paradigm to simplify parallel programming. Unfortunately, as more cores become available in commodity systems, the scalability limits of a wide class of TM applications become more evident. Hence, online parallelism tuning techniques were proposed to adjust the number of threads of TM applications toward the optimal level. However, state-of-the-art solutions are exclusively tailored to single-process systems with relatively static workloads, and they exhibit pathological behaviors in scenarios where multiple multi-threaded TM processes contend for the shared hardware resources.
This paper proposes RUBIC, a novel parallelism tuning method for TM applications in both single- and multi-process scenarios that overcomes the shortcomings of the previously proposed solutions. RUBIC helps the co-running processes adapt their parallelism level so that they can efficiently space-share the hardware.
When compared to previous online parallelism tuning solutions, RUBIC achieves unprecedented system-wide fairness and efficiency, both in single- and multi-process scenarios. Our evaluation with different workloads and scenarios shows that, on average, RUBIC enhances the overall performance by 26% with respect to the best-performing state-of-the-art online parallelism tuning techniques in multi-process scenarios, while incurring negligible overhead in single-process cases. RUBIC also exhibits unique features in converging to a fair and efficient state.
Index Terms
- RUBIC: Online Parallelism Tuning for Co-located Transactional Memory Applications