Shared write buffer to boost applications on SpMT architecture

Chen, Ming; Ye, John; Chen, Tianzhou; Dai, Hongjun

doi:10.1007/s11227-016-1710-2

Published: 08 April 2016

Volume 73, pages 3508–3525, (2017)

The Journal of Supercomputing

Ming Chen¹,
John Ye¹,
Tianzhou Chen¹ &
…
Hongjun Dai²

Abstract

With the growing number of processing cores integrated on chip multiprocessors, researchers are working to increase the available parallelism of software so as to harness the growing computing power efficiently. One notable direction is speculative multi-threading (SpMT), also known as thread-level speculation, which aims to extract thread-level parallelism by splitting a sequential execution thread into several finer-grained threads and executing them in parallel. An SpMT thread remains speculative until it “knows” that all its input data are correct. A speculative thread needs to write to the L1 cache, but its output may be discarded if the speculation eventually fails. However, another speculative thread may already have read that speculative output. Therefore, some mechanism is needed to support speculative reads and writes. Because SpMT threads are extracted from a single thread, they usually share a lot of data, so there may be intense data coherence activity among the L1 caches. Supporting data coherence and speculation together would be very complicated. This paper proposes a shared write buffer (SWB) among the SpMT cores. Speculative reads and writes are confined to the SWB, so speculation does not interfere with coherence and the L1 cache design can be drastically simplified. Experiments show that the SWB captures a large portion of inter-core data sharing, reduces cache coherence activity, and drastically improves the data access performance of SpMT threads.
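The idea of confining speculative reads and writes to one shared buffer can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the hardware design evaluated in the paper: the class name `SharedWriteBuffer`, its methods, the use of integer thread ids as program order (lower id means earlier thread), and the squash policy are all hypothetical choices made for illustration.

```python
class SharedWriteBuffer:
    """Toy model of a write buffer shared by speculative threads (hypothetical)."""

    def __init__(self, memory):
        self.memory = memory   # committed, non-speculative state: address -> value
        self.entries = {}      # thread id -> {address: speculative value}
        self.readers = {}      # thread id -> ids of threads whose output it consumed

    def write(self, tid, addr, value):
        # Speculative writes stay in the buffer and never reach memory directly.
        self.entries.setdefault(tid, {})[addr] = value

    def read(self, tid, addr):
        # Assumption: a thread may see its own writes and those of earlier
        # threads; the nearest earlier producer wins.
        for producer in sorted((t for t in self.entries if t <= tid), reverse=True):
            if addr in self.entries[producer]:
                if producer != tid:
                    # Remember the dependence so a squash can be propagated.
                    self.readers.setdefault(tid, set()).add(producer)
                return self.entries[producer][addr]
        return self.memory.get(addr, 0)

    def commit(self, tid):
        # Speculation succeeded: drain the thread's writes into memory.
        self.memory.update(self.entries.pop(tid, {}))
        self.readers.pop(tid, None)
        for deps in self.readers.values():
            deps.discard(tid)

    def squash(self, tid):
        # Speculation failed: discard the thread's writes, and transitively
        # squash every thread that consumed discarded data.
        squashed, pending = [], [tid]
        while pending:
            t = pending.pop()
            if t in squashed:
                continue
            squashed.append(t)
            self.entries.pop(t, None)
            self.readers.pop(t, None)
            pending.extend(r for r, deps in self.readers.items() if t in deps)
        return sorted(squashed)
```

In this model, memory only ever holds committed values, which mirrors the abstract's point that keeping speculative state in the buffer leaves the rest of the memory hierarchy free of speculation concerns.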

Acknowledgments

This research is supported by the National Natural Science Foundation of China under Grant No. 61070001, the National Natural Science Foundation of Zhejiang Province No. LQ12F02017, the Special Funds for Key Program of China No. 2011ZX0302-004-002 and 2012ZX01031001-003, the Key Science Foundation of Zhejiang Province under Grant No. 2010C11048, the Open Fund of the Mobile Network Application Technology Key Laboratory of Zhejiang Province, and the Innovation Group of New Generation of Mobile Internet Software and Services of Zhejiang Province.

Author information

Authors and Affiliations

Zhejiang University, Hangzhou, People’s Republic of China
Ming Chen, John Ye & Tianzhou Chen
Shandong University, Jinan, People’s Republic of China
Hongjun Dai

Corresponding author

Correspondence to Ming Chen.

About this article

Cite this article

Chen, M., Ye, J., Chen, T. et al. Shared write buffer to boost applications on SpMT architecture. J Supercomput 73, 3508–3525 (2017). https://doi.org/10.1007/s11227-016-1710-2

Published: 08 April 2016
Issue Date: August 2017
DOI: https://doi.org/10.1007/s11227-016-1710-2
