Abstract
As the number of processing cores integrated on chip multiprocessors grows, researchers are working to increase the parallelism available in software so that it can efficiently harness the added computing power. One notable direction is speculative multi-threading (SpMT), also known as thread-level speculation, which extracts thread-level parallelism by splitting a sequential thread of execution into several finer-grained threads and executing them in parallel. An SpMT thread remains speculative until it knows that all of its input data are correct. A speculative thread needs to write to the L1 cache, but its output must be discarded if the speculation eventually fails. Meanwhile, another speculative thread may already have read that speculative output, so some mechanism is needed to support speculative reads and writes. Moreover, because SpMT threads are extracted from a single thread, they usually share a large amount of data, which can cause intense coherence traffic among the L1 caches. Supporting data coherence and speculation together would be very complicated. This paper proposes a shared write buffer (SWB) among the SpMT cores. Confining speculative reads and writes to the SWB keeps speculation from interfering with coherence and allows the L1 cache design to be drastically simplified. Experiments show that the SWB captures a large portion of inter-core data sharing, reduces cache coherence traffic, and drastically improves the data access performance of SpMT threads.
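The idea of confining speculative reads and writes to a buffer shared by the cores can be illustrated with a small software model. The sketch below is not the hardware design evaluated in the paper; it is a toy under stated assumptions: threads carry integer ids that follow sequential program order (a lower id is less speculative), a thread may read values buffered by itself or by earlier threads, and squashing a thread also squashes every later thread. The class name `SharedWriteBuffer` and all of its methods are hypothetical.

```python
class SharedWriteBuffer:
    """Toy model of a write buffer shared by speculative threads.

    Assumption: thread ids follow sequential program order, so a lower
    id means an earlier (less speculative) thread.
    """

    def __init__(self, memory):
        self.memory = memory   # committed, non-speculative state
        self.pending = {}      # thread id -> {address: speculative value}

    def write(self, tid, addr, value):
        # Speculative stores stay in the buffer and never touch memory.
        self.pending.setdefault(tid, {})[addr] = value

    def read(self, tid, addr):
        # Prefer the newest buffered value from this thread or an earlier
        # one; fall back to committed memory if none exists.
        for t in sorted(self.pending, reverse=True):
            if t <= tid and addr in self.pending[t]:
                return self.pending[t][addr]
        return self.memory[addr]

    def commit(self, tid):
        # Speculation succeeded: drain this thread's stores to memory.
        self.memory.update(self.pending.pop(tid, {}))

    def squash(self, tid):
        # Speculation failed: discard this thread's stores and those of
        # all later threads, which may have consumed its output.
        for t in [t for t in self.pending if t >= tid]:
            del self.pending[t]
```

In this model a failed speculation is handled entirely inside the buffer: committed memory is never modified by a speculative store, so nothing outside the buffer has to be rolled back.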












Acknowledgments
This research is supported by the National Natural Science Foundation of China under Grant No. 61070001, the National Natural Science Foundation of Zhejiang Province under Grant No. LQ12F02017, the Special Funds for Key Program of China under Grant Nos. 2011ZX0302-004-002 and 2012ZX01031001-003, the Key Science Foundation of Zhejiang Province under Grant No. 2010C11048, the Open Fund of the Mobile Network Application Technology Key Laboratory of Zhejiang Province, and the Innovation Group of New Generation of Mobile Internet Software and Services of Zhejiang Province.
Cite this article
Chen, M., Ye, J., Chen, T. et al. Shared write buffer to boost applications on SpMT architecture. J Supercomput 73, 3508–3525 (2017). https://doi.org/10.1007/s11227-016-1710-2
DOI: https://doi.org/10.1007/s11227-016-1710-2