Data-Driven Thread Execution on Heterogeneous Processors

International Journal of Parallel Programming

Abstract

In this paper we report our experience in implementing and evaluating the Data-Driven Multithreading (DDM) model on a heterogeneous multi-core processor. DDM is a non-blocking multithreading model that decouples the synchronization portion of a program from its computation portion, allowing the two to execute asynchronously in a dataflow manner. Thread dependencies are determined by the compiler or the programmer, while thread scheduling is done dynamically at runtime based on data availability. The target processor for this implementation is the Cell processor. We call this implementation the Data-Driven Multithreading Virtual Machine for the Cell processor (DDM-\(\hbox{VM}_c\)). Thread scheduling is handled in software by the Power Processing Element core of the Cell, while the Synergistic Processing Element cores execute the program threads. DDM-\(\hbox{VM}_c\) virtualizes the parallel resources of the Cell, handles the heterogeneity of the cores, manages the Cell memory hierarchy efficiently, and supports distributed execution across a cluster of Cell nodes. DDM-\(\hbox{VM}_c\) has been implemented on a single Cell processor with six computation cores, as well as on a cluster of four Cell processors with 24 computation cores. We present an in-depth performance analysis of DDM-\(\hbox{VM}_c\) using a suite of standard computational benchmarks. The evaluation shows that DDM-\(\hbox{VM}_c\) scales well and effectively tolerates scheduling overheads as well as memory and communication latencies. Furthermore, DDM-\(\hbox{VM}_c\) compares favorably with other platforms targeting the Cell processor, such as CellSs and Sequoia.
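To make the scheduling idea concrete, the following is a minimal single-process sketch of data-driven thread scheduling. It is not the DDM-\(\hbox{VM}_c\) implementation or its API. It assumes that each thread carries a counter of producers it is still waiting for, and that a thread becomes runnable once that counter reaches zero. The names `run_data_driven`, `threads`, `consumers`, and `ready_count` are hypothetical and chosen for illustration only. The loop plays the role the abstract assigns to the scheduling core, and each callable stands in for a thread that a computation core would execute.

```python
from collections import deque


def run_data_driven(threads, consumers):
    """Run threads in an order dictated only by data availability.

    threads:   dict mapping a thread name to a zero-argument callable (hypothetical layout)
    consumers: dict mapping a producer name to the names of threads that consume its output
    """
    # Assumption: the dependency graph is fixed before execution starts,
    # mirroring dependencies that are determined by the compiler or programmer.
    ready_count = {name: 0 for name in threads}
    for dependents in consumers.values():
        for consumer in dependents:
            ready_count[consumer] += 1

    # Threads with no pending producers are ready immediately.
    ready = deque(name for name, count in ready_count.items() if count == 0)
    order = []

    while ready:
        name = ready.popleft()
        threads[name]()  # runs to completion; all of its inputs already exist
        order.append(name)
        # Completing a producer may make its consumers ready.
        for consumer in consumers.get(name, []):
            ready_count[consumer] -= 1
            if ready_count[consumer] == 0:
                ready.append(consumer)
    return order


# Usage: two independent producers feed one consumer.
results = {}
threads = {
    "load_a": lambda: results.update(a=2),
    "load_b": lambda: results.update(b=3),
    "add": lambda: results.update(total=results["a"] + results["b"]),
}
consumers = {"load_a": ["add"], "load_b": ["add"]}
order = run_data_driven(threads, consumers)
```

In this sketch `add` never blocks waiting for its inputs: it is simply not scheduled until both producers have finished, which is the sense in which synchronization is separated from computation. A real runtime would dispatch ready threads to several cores concurrently and would also move data between memories; both aspects are omitted here.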

Author information

Corresponding author

Correspondence to George Matheou.

Additional information

This work is partially supported by the Cyprus Research Promotion Foundation under Grant PENEK/ENISX/0308/44 and the EU FP7 TeraFlux project.

About this article

Cite this article

Arandi, S., Matheou, G., Kyriacou, C. et al. Data-Driven Thread Execution on Heterogeneous Processors. Int J Parallel Prog 46, 198–224 (2018). https://doi.org/10.1007/s10766-016-0486-6

  • DOI: https://doi.org/10.1007/s10766-016-0486-6