REPLICA MBTAC: multithreaded dual-mode processor

Forsell, Martti; Roivainen, Jussi; Leppänen, Ville

doi:10.1007/s11227-017-2199-z

Published: 16 December 2017

Volume 74, pages 1911–1933, (2018)

The Journal of Supercomputing

Abstract

The prevailing trend in the design of chip multiprocessors (CMPs) has been to replicate single-core processors. As a result, CMPs typically define an asynchronous computational model, require heavily locality-aware memory allocation, and incur high intercommunication overheads. These properties make parallel programming challenging and error-prone. We introduce our new dual-mode MultiBunched/Threaded Architecture with Chaining (MBTAC) processor core, the main building block of the REPLICA CMP. It offers a way to write general-purpose parallel programs that is backed by native execution support for key concepts. These include cost-efficient synchronization at the machine-instruction level and a uniform shared global memory that simplifies the allocation of data structures and data movement. MBTAC uses a low-overhead thread-context switching solution, and its functional units are organized for parallel computing so as to exploit inter-thread instruction-level parallelism and to execute multioperations highly efficiently. To evaluate our proposal, we implemented three MBTAC constellations featuring up to 2048 parallel threads on FPGA and compared them with DLX and Intel’s Core i7 processors. The results point toward high performance in communication-intensive problems, simplified parallel programmability, and a regular, implementation-friendly structure.
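The abstract does not define multioperations. As a rough illustration of the general PRAM-style idea (not of MBTAC's hardware or instruction set), the sketch below sequentially simulates one synchronous step in which several threads issue an add-multiprefix to shared memory: each thread receives the running sum that preceded its own contribution, and the target location ends up holding the total. The function name `multiprefix_add`, the request format, and the thread-id ordering are assumptions made for this example only.

```python
from typing import Dict, List, Tuple

# Hypothetical request format: (thread_id, address, value).
Request = Tuple[int, int, int]


def multiprefix_add(memory: Dict[int, int], requests: List[Request]) -> Dict[int, int]:
    """Sequentially simulate one step of add-multiprefix requests.

    Requests are applied in ascending thread-id order (an assumption of this
    sketch). Each thread gets back the value its target address held just
    before its own contribution was added; memory is updated in place.
    """
    results: Dict[int, int] = {}
    for thread_id, address, value in sorted(requests):
        before = memory.get(address, 0)    # running sum seen by this thread
        results[thread_id] = before
        memory[address] = before + value   # accumulate the contribution
    return results


if __name__ == "__main__":
    shared = {0: 10}
    # Threads 0..2 target address 0; thread 3 targets address 1.
    step = [(0, 0, 1), (1, 0, 2), (2, 0, 3), (3, 1, 5)]
    print(multiprefix_add(shared, step))  # {0: 10, 1: 11, 2: 13, 3: 0}
    print(shared)                         # {0: 16, 1: 5}
```

In this model all threads complete the operation within the same step, which is why such operations pair naturally with instruction-level synchronization; the loop here only imitates that behaviour one request at a time.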

Notes

This paper is an extended version of the paper [9], with a more detailed description of the MBTAC processor, inclusion of MBTAC for 16-core constellations in the FPGA prototype description and evaluation section, and measurements and results. It also extends the theoretical work [7], which used a weakly implementable development version of the ECLIPSE architecture [5] for the tests.

References

[1] Ahmad M, Hijaz F, Shi Q, Khan O (2015) Crono: a benchmark suite for multithreaded graph algorithms executing on futuristic multicores. In: Workload Characterization (IISWC), 2015 IEEE International Symposium on, pp 44–55
[2] Dietzfelbinger M, Karlin A, Mehlhorn K, Meyer auf der Heide F, Rohnert H, Tarjan RE (1994) Dynamic perfect hashing: upper and lower bounds. SIAM J Comput 23(4):738–761
[3] Engelmann C (1992) Simulationen von PRAM’s, Master’s thesis. Universität des Saarlandes, FB Informatik
[4] Forsell M (1994) Are multiport memories physically feasible? SIGARCH Comput Archit News 22(4):47–54
[5] Forsell M (2002) A scalable high-performance computing solution for networks on chips. IEEE Micro 22(5):46–55
[6] Forsell M (2004) E—a language for thread-level parallel programming on synchronous shared memory NOCs. WSEAS Trans Comput 3(3):807–812
[7] Forsell M (2011) A PRAM-NUMA model of computation for addressing low-TLP workloads. Int J Netw Comput 1(1):21–35
[8] Forsell M (2011) Performance comparison of some shared memory organizations for 2D mesh-like NOCs. Microprocess Microsyst 35(2):274–284
[9] Forsell M, Roivainen J, Leppänen V (2014) Prototyping the MBTAC processor for the REPLICA CMP. In: Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, IPDPSW ’14. IEEE Computer Society, Washington, pp 709–716
[10] Hennessy J, Patterson D (1990) Computer architecture: a quantitative approach. Morgan Kaufmann Publishers Inc., Palo Alto
[11] HiPEAC (2013) The HiPEAC vision for advance computing in Horizon 2020. http://www.hipeac.net/system/files/hp-roadmap-2013.pdf
[12] Intel (2006) Research at Intel From a Few Cores to Many: A Tera-scale Computing Research Overview. White Paper
[13] Jaja J (1992) Introduction to parallel algorithms. Addison-Wesley, Reading
[14] Keller J, Kessler C, Traff J (2001) Practical PRAM programming. Wiley, New York
[15] Krommydas K, Scogland TRW, Feng W-C (2013) On the programmability and performance of heterogeneous platforms. In: Proceedings of the 2013 International Conference on Parallel and Distributed Systems, ICPADS ’13. IEEE Computer Society, Washington, pp 224–231
[16] Lenoski D, Laudon J, Gharachorloo K, Weber W-D, Gupta A, Hennessy J, Horowitz M, Lam MS (1992) The Stanford Dash multiprocessor. Computer 25(3):63–79
[17] Leppänen V (1996) Studies on the realization of PRAM. Turku Centre for Computer Science, University of Turku, Turku, Finland
[18] Merritt R (2011) Panel: Wall ahead in multicore programming (Multicore Expo). EE Times
[19] Park JJK, Park Y, Mahlke S (2015) Chimera: collaborative preemption for multitasking on a shared GPU. In: Proceedings of ASPLOS
[20] Patterson D (2010) The trouble with multi-core. IEEE Spectr 47(7):28–32
[21] Ranade AG (1991) How to emulate shared memory. J Comput Syst Sci 42(3):307–326
[22] Semiconductor Industry Association (2015) International Technology Roadmap for Semiconductors. http://www.semiconductors.org/main/2015_international_technology_roadmap_for_semiconductors_itrs/
[23] Sun Microsystems (2005) Throughput computing: changing the economics and ecology of the data center with innovative SPARC Technology. White paper
[24] Vishkin U (2008) Toward realizing a PRAM-on-a-chip vision. In: Proceedings of the 2007 Conference on Parallel Processing, Euro-Par’07. Springer, Berlin, pp 5–6
[25] Vishkin U (2011) Using simple abstraction to reinvent computing for parallelism. Commun ACM 54(1):75–85

Acknowledgements

This work was funded by VTT, Grant 289773 of the Academy of Finland, and the Celtic-Plus Project CONVINcE.

Author information

Authors and Affiliations

Computing Platforms Team, VTT, Box 1100, 90571, Oulu, Finland
Martti Forsell & Jussi Roivainen
Department of Information Technology, University of Turku, 20014, Turku, Finland
Ville Leppänen

Corresponding author

Correspondence to Martti Forsell.

About this article

Cite this article

Forsell, M., Roivainen, J. & Leppänen, V. REPLICA MBTAC: multithreaded dual-mode processor. J Supercomput 74, 1911–1933 (2018). https://doi.org/10.1007/s11227-017-2199-z

Issue Date: May 2018
DOI: https://doi.org/10.1007/s11227-017-2199-z
