DOI: 10.1145/2967938.2967939
research-article

Characterizing and Optimizing the Performance of Multithreaded Programs Under Interference

Published: 11 September 2016

ABSTRACT

As virtualization becomes ubiquitous in datacenters, there is growing interest in characterizing application performance in multi-tenant environments to improve datacenter resource management. The performance of parallel programs is notoriously difficult to reason about in virtualized environments. Although performance degradation caused by virtualization and interference has been extensively studied, a comprehensive understanding of why parallel programs suffer unpredictable slowdowns when co-located with different types of workloads is still lacking.

This paper presents a systematic and quantitative study of multithreaded performance under interference. We design synthetic workloads to emulate different types of interference and study the behavior of parallel programs under each type. We find that unpredictable performance results from a complex interplay between the design of the program, the memory hierarchy of the host system, and CPU scheduling at the hypervisor. To understand how these factors interact, we decompose parallel runtime into compute, synchronization, and steal time, and use this runtime breakdown to measure program progress and identify execution inefficiency under interference. Based on these findings, we develop an online approach that predicts performance slowdown without requiring parallel programs to run to completion, and we devise two scheduling optimizations at the hypervisor to reduce slowdowns. Experimental results with Xen and representative parallel workloads show that the online performance prediction achieves less than 4.5% error on average and that the optimizations reduce runtime slowdown by as much as 38% compared to stock Xen.
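
The runtime breakdown described above can be illustrated with a minimal sketch. It is not the paper's predictor: the class `RuntimeBreakdown`, the function `naive_slowdown`, and the assumption that both measured intervals cover the same amount of work are all hypothetical and chosen only to show how a compute / synchronization / steal decomposition can be turned into fractions and a rough slowdown ratio.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RuntimeBreakdown:
    """Time (seconds) for one measured interval, split into three parts.

    Hypothetical container for illustration; the paper's own
    instrumentation is not shown in the abstract.
    """
    compute: float  # time spent executing useful work
    sync: float     # time spent waiting on locks, barriers, etc.
    steal: float    # time the vCPU was runnable but not scheduled

    @property
    def total(self) -> float:
        return self.compute + self.sync + self.steal

    def fractions(self) -> Dict[str, float]:
        """Share of the interval attributed to each component."""
        if self.total <= 0:
            raise ValueError("interval must have positive total time")
        return {
            "compute": self.compute / self.total,
            "sync": self.sync / self.total,
            "steal": self.steal / self.total,
        }


def naive_slowdown(solo: RuntimeBreakdown, shared: RuntimeBreakdown) -> float:
    """Ratio of interfered to solo wall time.

    Assumption (not from the paper): both intervals cover the same
    amount of program work, so the ratio of totals is the slowdown.
    """
    if solo.total <= 0:
        raise ValueError("solo interval must have positive total time")
    return shared.total / solo.total
```

For example, with `solo = RuntimeBreakdown(8.0, 2.0, 0.0)` and `shared = RuntimeBreakdown(9.0, 4.0, 2.0)`, the interfered interval spends 60% of its time in compute and `naive_slowdown(solo, shared)` evaluates to 1.5. Comparing the per-component fractions of the two runs shows whether the extra time went to synchronization or to steal.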


    • Published in

      PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation
      September 2016
      474 pages
      ISBN:9781450341219
      DOI:10.1145/2967938

      Copyright © 2016 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      PACT '16 paper acceptance rate: 31 of 119 submissions, 26%. Overall acceptance rate: 121 of 471 submissions, 26%.
