Exposing speculative thread parallelism in SPEC2000

Published: 15 June 2005
DOI: 10.1145/1065944.1065964

Abstract

As improving the performance of single-threaded processors becomes increasingly difficult, consumer desktop processors are moving toward multi-core designs. One approach to enhancing the performance of chip multiprocessors that has received considerable attention is thread-level speculation (TLS). As a case study, we manually parallelized several of the SPEC CPU2000 floating point and integer applications using TLS. Manual parallelization enabled us to apply techniques and programmer expertise that are beyond the current capabilities of automated parallelizers. Drawing on this experience, we provide insight into ways to aggressively apply TLS to parallelize applications for high performance. This information can help guide future advanced TLS compiler design.

For each application, we discuss how and where parallelism was located within the application, the impediments to extracting this parallelism using TLS, and the code transformations that were required to overcome these impediments. We also generalize these experiences to a discussion of common hindrances to TLS parallelization, and describe methods of programming that help expose application parallelism to TLS systems. These guidelines can assist developers of uniprocessor programs in creating applications that port easily to TLS systems and yield good performance. By using manual parallelization on SPEC2000, we provide guidance on where thread-level parallelism exists in these well-known benchmarks, what limits its extraction, how to reduce these limitations, and what performance can be expected on these applications from a chip multiprocessor system with TLS.
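To make the kind of impediment mentioned in the abstract concrete, the following C sketch is offered as an illustration only. It is not taken from the paper, and the function names (`work`, `filter_serial`, `filter_restructured`, `versions_agree`) and the `MAX_N` bound are hypothetical. It assumes a loop whose iterations are independent except for a shared output index, a dependence that could plausibly force a TLS system to squash or serialize speculative threads, and shows one way such a loop might be restructured so that each iteration touches only its own slots. The code is plain sequential C; the TLS hardware or runtime that would run the first loop of the restructured version speculatively is assumed, not shown.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 64  /* hypothetical upper bound on the input size */

/* Stand-in for expensive per-iteration work that is independent across iterations. */
static int work(int x) { return x * x + 1; }

/* Original form: every iteration that keeps a value reads and writes `count`,
   so iteration i+1 depends on iteration i through that shared index. */
size_t filter_serial(const int *in, size_t n, int *out) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int v = work(in[i]);
        if (v % 2 == 0) out[count++] = v;
    }
    return count;
}

/* Restructured form: the first loop writes only to slot i, so its iterations
   carry no dependence; a short sequential pass then compacts the results. */
size_t filter_restructured(const int *in, size_t n, int *out) {
    int value[MAX_N];
    int keep[MAX_N];
    assert(n <= MAX_N);
    for (size_t i = 0; i < n; i++) {   /* candidate loop for speculative threads */
        value[i] = work(in[i]);
        keep[i] = (value[i] % 2 == 0);
    }
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {   /* cheap sequential compaction */
        if (keep[i]) out[count++] = value[i];
    }
    return count;
}

/* Returns 1 if both versions produce identical output on a small sample input. */
int versions_agree(void) {
    const int in[] = {1, 2, 3, 4, 5, 6, 7};
    int a[7], b[7];
    size_t na = filter_serial(in, 7, a);
    size_t nb = filter_restructured(in, 7, b);
    if (na != nb) return 0;
    for (size_t i = 0; i < na; i++) {
        if (a[i] != b[i]) return 0;
    }
    return 1;
}
```

Both versions produce the same output. The restructured one trades extra temporary storage and a short sequential pass for iterations that no longer communicate through `count`.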

Published In

PPoPP '05: Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
June 2005
310 pages
ISBN:1595930809
DOI:10.1145/1065944

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. SPEC CPU2000
  2. chip multiprocessors
  3. feedback-driven optimization
  4. manual parallel programming
  5. multithreading
  6. thread-level speculation

Qualifiers

  • Article

Conference

PPoPP05

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 0
Reflects downloads up to 22 Feb 2025

Cited By

  • (2024) IDaTPA: importance degree based thread partitioning approach in thread level speculation. Discover Computing 27:1. DOI: 10.1007/s10791-024-09440-x. Online publication date: 19-Jun-2024
  • (2022) Improving Latency Using Manual Parallel Programming. Chip Multiprocessor Architecture, pp. 103-139. DOI: 10.1007/978-3-031-01720-9_4. Online publication date: 5-Mar-2022
  • (2020) An Adaptive Thread Partitioning Approach in Speculative Multithreading. Algorithms and Architectures for Parallel Processing, pp. 78-91. DOI: 10.1007/978-3-030-60245-1_6. Online publication date: 29-Sep-2020
  • (2018) Parallel Precomputation with Input Value Prediction for Model Predictive Control Systems. IEICE Transactions on Information and Systems E101.D:12, pp. 2864-2877. DOI: 10.1587/transinf.2018PAP0003. Online publication date: 1-Dec-2018
  • (2018) SpecRPC. Proceedings of the 19th International Middleware Conference, pp. 266-278. DOI: 10.1145/3274808.3274829. Online publication date: 26-Nov-2018
  • (2016) A Survey on Thread-Level Speculation Techniques. ACM Computing Surveys 49:2, pp. 1-39. DOI: 10.1145/2938369. Online publication date: 30-Jun-2016
  • (2016) A Flexible Chip Multiprocessor Simulator Dedicated for Thread Level Speculation. 2016 IEEE Trustcom/BigDataSE/ISPA, pp. 2127-2132. DOI: 10.1109/TrustCom.2016.0327. Online publication date: Aug-2016
  • (2015) Compiler-Driven Software Speculation for Thread-Level Parallelism. ACM Transactions on Programming Languages and Systems 38:2, pp. 1-45. DOI: 10.1145/2821505. Online publication date: 22-Dec-2015
  • (2015) The Effects of Parameter Tuning in Software Thread-Level Speculation in JavaScript Engines. ACM Transactions on Architecture and Code Optimization 11:4, pp. 1-25. DOI: 10.1145/2686036. Online publication date: 9-Jan-2015
  • (2015) Parallelizing Block Cryptography Algorithms on Speculative Multicores. Algorithms and Architectures for Parallel Processing, pp. 3-15. DOI: 10.1007/978-3-319-27119-4_1. Online publication date: 16-Dec-2015
