DOI: 10.1145/1985793.1985822
research-article

LIME: a framework for debugging load imbalance in multi-threaded execution

Published: 21 May 2011

Abstract

With the ubiquity of multi-core processors, software must make effective use of multiple cores to obtain good performance on modern hardware. One of the biggest roadblocks to this is load imbalance, or the uneven distribution of work across cores. We propose LIME, a framework for analyzing parallel programs and reporting the cause of load imbalance in application source code. This framework uses statistical techniques to pinpoint load imbalance problems stemming from both control flow issues (e.g., unequal iteration counts) and interactions between the application and hardware (e.g., unequal cache miss counts). We evaluate LIME on applications from widely used parallel benchmark suites, and show that LIME accurately reports the causes of load imbalance, their nature and origin in the code, and their relative importance.
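To make the idea in the abstract concrete, here is a minimal sketch of relating per-thread execution time to per-thread event counts. It is not LIME's actual analysis: the imbalance metric, the use of plain Pearson correlation, the function names (imbalance, pearson, rank_causes), and all measurement values are assumptions chosen for illustration only.

```python
from statistics import mean, pstdev


def imbalance(times):
    """Share of the slowest thread's time that the average thread spends waiting."""
    slowest = max(times)
    return (slowest - mean(times)) / slowest


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series has no variation."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def rank_causes(times, events):
    """Rank candidate events by how strongly their counts track thread time."""
    scores = {name: pearson(counts, times) for name, counts in events.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical measurements for one parallel section run by four threads.
times = [10.0, 12.0, 20.0, 11.0]
events = {
    "loop_iterations": [100, 100, 100, 100],  # control flow: equal work counts
    "cache_misses": [50, 70, 150, 60],        # hardware interaction: unequal
}

print(imbalance(times))
print(rank_causes(times, events))
```

In this made-up data every thread runs the same number of iterations, so the iteration count cannot explain the slow thread, while the cache miss counts rise and fall with execution time and are ranked first.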




Published In

ICSE '11: Proceedings of the 33rd International Conference on Software Engineering
May 2011
1258 pages
ISBN:9781450304450
DOI:10.1145/1985793


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. load imbalance
  2. parallel section
  3. performance debugging

Qualifiers

  • Research-article

Conference

ICSE11: International Conference on Software Engineering
May 21 - 28, 2011
Waikiki, Honolulu, HI, USA

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%


Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 0
Reflects downloads up to 14 Feb 2025.


Cited By

  • (2023) A Large-Scale Empirical Study of Real-Life Performance Issues in Open Source Projects. IEEE Transactions on Software Engineering 49:2 (924-946). DOI: 10.1109/TSE.2022.3167628. Online publication date: 1-Feb-2023
  • (2019) Parallelism-centric what-if and differential analyses. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (485-501). DOI: 10.1145/3314221.3314621. Online publication date: 8-Jun-2019
  • (2019) EraseMe. Proceedings of the 2019 Great Lakes Symposium on VLSI (319-322). DOI: 10.1145/3299874.3318027. Online publication date: 13-May-2019
  • (2019) TS-BatPro: Improving Energy Efficiency in Data Centers by Leveraging Temporal–Spatial Batching. IEEE Transactions on Green Communications and Networking 3:1 (236-249). DOI: 10.1109/TGCN.2018.2871025. Online publication date: Mar-2019
  • (2019) Machine Learning-Based Analysis of Program Binaries: A Comprehensive Study. IEEE Access 7 (65889-65912). DOI: 10.1109/ACCESS.2019.2917668. Online publication date: 2019
  • (2019) Hecate: Automated Customization of Program and Communication Features to Reduce Attack Surfaces. Security and Privacy in Communication Networks (305-319). DOI: 10.1007/978-3-030-37231-6_17. Online publication date: 11-Dec-2019
  • (2018) MORPH. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (2315-2317). DOI: 10.1145/3243734.3278518. Online publication date: 15-Oct-2018
  • (2017) A fast causal profiler for task parallel programs. Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (15-26). DOI: 10.1145/3106237.3106254. Online publication date: 21-Aug-2017
  • (2015) Towards classification of concurrency bugs based on observable properties. Proceedings of the First International Workshop on Complex faUlts and Failures in LargE Software Systems (41-47). DOI: 10.5555/2819419.2819428. Online publication date: 16-May-2015
  • (2015) Caramel. Proceedings of the 37th International Conference on Software Engineering - Volume 1 (902-912). DOI: 10.5555/2818754.2818863. Online publication date: 16-May-2015
