Abstract
Thread-level speculation (TLS) allows potentially dependent threads to speculatively execute in parallel, thus making it easier for the compiler to extract parallel threads. However, the high cost associated with unbalanced load, failed speculation, and inter-thread value communication makes it difficult to obtain the desired performance unless the speculative threads are carefully chosen.
In this paper, we focus on extracting parallel threads from loops in general-purpose applications because loops, with their regular structure and significant coverage of execution time, are ideal candidates for extracting parallel threads. General-purpose applications, however, usually contain a large number of nested loops whose parallel performance is unpredictable and whose behavior is dynamic, which makes it difficult to decide which set of loops should be parallelized to improve overall program performance. Our proposed loop selection algorithm addresses these difficulties. We have found that (i) with the aid of profiling information, compiler analyses can achieve a reasonably accurate estimate of the performance of parallel execution, and that (ii) different invocations of a loop may behave differently, and exploiting this dynamic behavior can further improve performance. With a judicious choice of loops, we can improve the overall program performance of SPEC2000 integer benchmarks by as much as 20%.
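To make the selection problem concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that the loops form a simple nesting tree, that at most one loop on any nesting path is parallelized at a time, and that each loop carries a single profile-based estimate of the cycles saved by parallelizing it. The names `Loop`, `est_saved_cycles`, and `select_loops` are hypothetical, and the sketch ignores invocation-dependent behavior.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Loop:
    """A node in a hypothetical loop-nesting tree."""
    name: str
    # Assumed input: estimated cycles saved if this loop alone is parallelized
    # (negative when speculation is expected to hurt).
    est_saved_cycles: float
    children: List["Loop"] = field(default_factory=list)


def select_loops(loop: Loop) -> Tuple[float, List[str]]:
    """Return (total estimated savings, chosen loop names) for this subtree.

    Assumption: choosing a loop excludes every loop nested inside it, so the
    choice at each node is "this loop" versus "the best picks among its
    inner loops".
    """
    inner_total = 0.0
    inner_names: List[str] = []
    for child in loop.children:
        saved, names = select_loops(child)
        inner_total += saved
        inner_names.extend(names)
    # inner_total is never negative, so this test also rejects loops whose
    # estimated savings are zero or negative.
    if loop.est_saved_cycles > inner_total:
        return loop.est_saved_cycles, [loop.name]
    return inner_total, inner_names


if __name__ == "__main__":
    nest = Loop("outer", 50, [Loop("a", 30), Loop("b", 40)])
    print(select_loops(nest))
```

In this made-up example the two inner loops together are estimated to save more than the outer loop, so the sketch picks them instead of the outer loop. A real program would need a loop graph that also accounts for loops reached through function calls, which this tree-based sketch does not model.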
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, S., Dai, X., Yellajyosula, K.S., Zhai, A., Yew, PC. (2006). Loop Selection for Thread-Level Speculation. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2005. Lecture Notes in Computer Science, vol 4339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69330-7_20
DOI: https://doi.org/10.1007/978-3-540-69330-7_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69329-1
Online ISBN: 978-3-540-69330-7
eBook Packages: Computer Science, Computer Science (R0)