Research article
DOI: 10.1145/2935764.2935770

RUBIC: Online Parallelism Tuning for Co-located Transactional Memory Applications

Published: 11 July 2016

ABSTRACT

With the advent of Chip-Multiprocessors, Transactional Memory (TM) emerged as a powerful paradigm to simplify parallel programming. Unfortunately, as more cores become available in commodity systems, the scalability limits of a wide class of TM applications become more evident. Hence, online parallelism tuning techniques have been proposed to dynamically adjust the number of threads of a TM application toward its optimal level. However, state-of-the-art solutions are exclusively tailored to single-process systems with relatively static workloads, and they exhibit pathological behaviors in scenarios where multiple multi-threaded TM processes contend for the shared hardware resources.

This paper proposes RUBIC, a novel parallelism tuning method for TM applications in both single- and multi-process scenarios that overcomes the shortcomings of previously proposed solutions. RUBIC helps co-running processes adapt their parallelism level so that they can efficiently space-share the hardware.

When compared to previous online parallelism tuning solutions, RUBIC achieves unprecedented system-wide fairness and efficiency, both in single- and multi-process scenarios. Our evaluation with different workloads and scenarios shows that, on average, RUBIC improves overall performance by 26% with respect to the best-performing state-of-the-art online parallelism tuning techniques in multi-process scenarios, while incurring negligible overhead in single-process cases. RUBIC also exhibits unique properties in its convergence to a fair and efficient state.
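For intuition only, the following is a minimal C++ sketch of a generic hill-climbing thread-count controller, the kind of online parallelism tuning the abstract refers to. It is not RUBIC's actual algorithm, and every name in it (ParallelismTuner, adjust, the synthetic throughput curve) is hypothetical: the controller samples throughput once per epoch and keeps moving the thread count in the same direction while throughput improves, reversing direction when it degrades. RUBIC's contribution, per the abstract, is making this kind of tuning fair and efficient when several multi-threaded TM processes space-share the same machine, which a single-process controller like this one does not address.

    // Illustrative sketch only, NOT RUBIC's algorithm: a hypothetical
    // hill-climbing controller that adjusts a TM application's thread count
    // from periodic throughput samples (e.g., transaction commits per epoch).
    #include <algorithm>
    #include <cstdio>

    class ParallelismTuner {
    public:
        explicit ParallelismTuner(int max_threads) : max_threads_(max_threads) {}

        // Called once per sampling epoch with the throughput observed since the
        // previous call. Keeps moving the thread count in the current direction
        // while throughput improves, and reverses direction when it degrades.
        int adjust(double throughput) {
            if (throughput < last_throughput_)
                step_ = -step_;                            // got worse: reverse direction
            threads_ = std::clamp(threads_ + step_, 1, max_threads_);
            last_throughput_ = throughput;
            return threads_;                               // worker pool parks/unparks to match
        }

    private:
        int max_threads_;
        int threads_ = 1;
        double last_throughput_ = 0.0;
        int step_ = +1;
    };

    int main() {
        ParallelismTuner tuner(16);
        // Synthetic workload whose throughput peaks around 6 threads; a real
        // runtime would feed measured commit rates instead.
        auto synthetic_throughput = [](int t) { return 100.0 * t - 8.0 * t * t; };
        int threads = 1;
        for (int epoch = 0; epoch < 12; ++epoch) {
            threads = tuner.adjust(synthetic_throughput(threads));
            std::printf("epoch %2d -> %2d threads\n", epoch, threads);
        }
        return 0;
    }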

Published in

SPAA '16: Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures
July 2016, 492 pages
ISBN: 9781450342100
DOI: 10.1145/2935764

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall acceptance rate: 447 of 1,461 submissions, 31%
