REPLICA MBTAC: multithreaded dual-mode processor

The Journal of Supercomputing

Abstract

The prevailing trend in the design of chip multiprocessors (CMPs) has been to replicate single-core processors. As a result, they typically define an asynchronous computational model, require heavily locality-aware memory allocation, and incur high intercommunication overheads. These properties make parallel programming very challenging and error-prone. We introduce our new dual-mode MultiBunched/Threaded Architecture with Chaining (MBTAC) processor core, the main building block of the REPLICA CMP. It provides a modern, sophisticated way of writing general-purpose parallel programs, backed by native execution support for key concepts. These include cost-efficient machine-instruction-level synchronization and a uniform shared global memory that enables easy-to-program allocation of data structures and data movement. MBTAC uses a low-overhead thread-context switching solution and has a functional unit organization designed for parallel computing, which exploits inter-thread instruction-level parallelism and supports highly efficient multioperations. To evaluate our proposal, we implemented three MBTAC constellations featuring up to 2048 parallel threads on FPGA and compared them with DLX and Intel’s Core i7 processors. The results point toward high performance in communication-intensive problems, simplified parallel programmability, and a regular, implementation-friendly structure.

Notes

  1. This paper is an extended version of the paper [9], with a more detailed description of the MBTAC processor, the inclusion of MBTAC 16-core constellations in the FPGA prototype description and evaluation sections, and additional measurements and results. It also extends the theoretical work [7], which used a weakly implementable development version of the ECLIPSE architecture [5] for the tests.

References

  1. Ahmad M, Hijaz F, Shi Q, Khan O (2015) Crono: a benchmark suite for multithreaded graph algorithms executing on futuristic multicores. In: Workload Characterization (IISWC), 2015 IEEE International Symposium on, pp 44–55

  2. Dietzfelbinger M, Karlin A, Mehlhorn K, Meyer auf der Heide F, Rohnert H, Tarjan RE (1994) Dynamic perfect hashing: upper and lower bounds. SIAM J Comput 23(4):738–761

  3. Engelmann C (1992) Simulationen von PRAM’s, Master’s thesis. Universität des Saarlandes, FB Informatik

  4. Forsell M (1994) Are multiport memories physically feasible? SIGARCH Comput Archit News 22(4):47–54

  5. Forsell M (2002) A scalable high-performance computing solution for networks on chips. IEEE Micro 22(5):46–55

  6. Forsell M (2004) E—a language for thread-level parallel programming on synchronous shared memory NOCs. WSEAS Trans Comput 3(3):807–812

  7. Forsell M (2011) A PRAM-NUMA model of computation for addressing low-TLP workloads. Int J Netw Comput 1(1):21–35

  8. Forsell M (2011) Performance comparison of some shared memory organizations for 2D mesh-like NOCs. Microprocess Microsyst 35(2):274–284

  9. Forsell M, Roivainen J, Leppänen V (2014) Prototyping the MBTAC processor for the REPLICA CMP. In: Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, IPDPSW ’14. IEEE Computer Society, Washington, pp 709–716

  10. Hennessy J, Patterson D (1990) Computer architecture: a quantitative approach. Morgan Kaufmann Publishers Inc., Palo Alto

  11. HiPEAC (2013) The HiPEAC vision for advanced computing in Horizon 2020. http://www.hipeac.net/system/files/hp-roadmap-2013.pdf

  12. Intel (2006) Research at Intel From a Few Cores to Many: A Tera-scale Computing Research Overview. White Paper

  13. Jaja J (1992) Introduction to parallel algorithms. Addison-Wesley, Reading

  14. Keller J, Kessler C, Träff J (2001) Practical PRAM programming. Wiley, New York

  15. Krommydas K, Scogland TRW, Feng W-C (2013) On the programmability and performance of heterogeneous platforms. In: Proceedings of the 2013 International Conference on Parallel and Distributed Systems, ICPADS ’13. IEEE Computer Society, Washington, pp 224–231

  16. Lenoski D, Laudon J, Gharachorloo K, Weber W-D, Gupta A, Hennessy J, Horowitz M, Lam MS (1992) The Stanford Dash multiprocessor. Computer 25(3):63–79

  17. Leppänen V (1996) Studies on the realization of PRAM. Turku Centre for Computer Science, University of Turku, Turku, Finland

  18. Merritt R (2011) Panel: Wall ahead in multicore programming (Multicore Expo). EE Times

  19. Park JJK, Park Y, Mahlke S (2015) Chimera: collaborative preemption for multitasking on a shared GPU. In: Proceedings of ASPLOS

  20. Patterson D (2010) The trouble with multi-core. IEEE Spectr 47(7):28–32

  21. Ranade AG (1991) How to emulate shared memory. J Comput Syst Sci 42(3):307–326

  22. Semiconductor Industry Association (2015) International Technology Roadmap for Semiconductors. http://www.semiconductors.org/main/2015_international_technology_roadmap_for_semiconductors_itrs/

  23. Sun Microsystems (2005) Throughput computing: changing the economics and ecology of the data center with innovative SPARC Technology. White paper

  24. Vishkin U (2008) Toward realizing a PRAM-on-a-chip vision. In: Proceedings of the 2007 Conference on Parallel Processing, Euro-Par’07. Springer, Berlin, pp 5–6

  25. Vishkin U (2011) Using simple abstraction to reinvent computing for parallelism. Commun ACM 54(1):75–85


Acknowledgements

This work was funded by VTT, Grant 289773 of the Academy of Finland, and the Celtic-Plus project CONVINcE.

Author information

Corresponding author

Correspondence to Martti Forsell.

About this article

Cite this article

Forsell, M., Roivainen, J. & Leppänen, V. REPLICA MBTAC: multithreaded dual-mode processor. J Supercomput 74, 1911–1933 (2018). https://doi.org/10.1007/s11227-017-2199-z

  • DOI: https://doi.org/10.1007/s11227-017-2199-z
