Abstract
Although hardware support for Thread-Level Speculation (TLS) can ease the compiler’s task of creating parallel programs by allowing it to create potentially dependent parallel threads, advanced compiler optimization techniques must still be developed and judiciously applied to achieve the desired performance. In this paper, we closely examine two data compression benchmarks, gzip and bzip2, and propose, implement, and evaluate new compiler optimization techniques that eliminate performance bottlenecks in their parallel execution. The proposed techniques (i) remove the critical forwarding path created by synchronizing memory-resident values; (ii) identify and categorize reduction-like variables whose intermediate results are used within loops, and apply code transformations that remove the inter-thread data dependences caused by these variables; and (iii) transform the program to eliminate stalls caused by variations in thread size. While no previous work has reported significant performance improvement from parallelizing these two benchmarks, we achieve performance improvements of up to 36% for gzip and 37% for bzip2.
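To make technique (ii) concrete, the following is a minimal sketch of what a reduction-like variable with an intermediate use can look like, and of one generic way to break the resulting cross-iteration dependence. It is not taken from the paper or from the gzip or bzip2 sources: the function names, the `len` and `offset` arrays, and the chunk-then-fix-up scheme are all assumptions chosen for illustration, and each chunk merely stands in for one speculative thread.

```c
#include <stddef.h>

/* Sequential form: `total` behaves like a reduction variable, but every
 * iteration also reads its intermediate value, so iteration i depends on
 * iteration i-1. */
void offsets_sequential(const int *len, long *offset, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        offset[i] = total;   /* intermediate use of the running sum */
        total += len[i];
    }
}

/* Chunked form (hypothetical illustration): each chunk accumulates from a
 * private zero, so its main loop reads nothing produced by earlier chunks.
 * Only the chunk's final sum is carried forward as `base`. */
void offsets_chunked(const int *len, long *offset, size_t n, size_t chunk) {
    long base = 0;
    if (chunk == 0) {
        chunk = 1;           /* avoid an infinite loop on a zero chunk size */
    }
    for (size_t start = 0; start < n; start += chunk) {
        size_t end = (n - start < chunk) ? n : start + chunk;
        long local = 0;

        /* Independent part: uses only the private accumulator. */
        for (size_t i = start; i < end; i++) {
            offset[i] = local;
            local += len[i];
        }

        /* Fix-up part: apply the value carried in from earlier chunks. */
        for (size_t i = start; i < end; i++) {
            offset[i] += base;
        }
        base += local;
    }
}
```

In this sketch the only value that crosses a chunk boundary is `base`, which is updated once per chunk rather than once per iteration. Both functions produce the same `offset` array for any chunk size.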
© 2007 Springer Berlin Heidelberg
Wang, S., Zhai, A., Yew, PC. (2007). Exploiting Speculative Thread-Level Parallelism in Data Compression Applications. In: Almási, G., Caşcaval, C., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2006. Lecture Notes in Computer Science, vol 4382. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72521-3_10
DOI: https://doi.org/10.1007/978-3-540-72521-3_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72520-6
Online ISBN: 978-3-540-72521-3
eBook Packages: Computer Science (R0)