Abstract
In this paper, we report our experience implementing and evaluating the Data-Driven Multithreading (DDM) model on a heterogeneous multi-core processor. DDM is a non-blocking multithreading model that decouples the synchronization portions of a program from its computation portions, allowing them to execute asynchronously in a dataflow manner. Thread dependencies are determined by the compiler or the programmer, while thread scheduling is done dynamically at runtime based on data availability. The target processor for this implementation is the Cell processor. We call this implementation the Data-Driven Multithreading Virtual Machine for the Cell processor (DDM-\(\hbox {VM}_c\)). Thread scheduling is handled in software by the Power Processing Element core of the Cell, while the Synergistic Processing Element cores execute the program threads. DDM-\(\hbox {VM}_c\) virtualizes the parallel resources of the Cell, handles the heterogeneity of the cores, manages the Cell memory hierarchy efficiently, and supports distributed execution across a cluster of Cell nodes. DDM-\(\hbox {VM}_c\) has been implemented on a single Cell processor with six computation cores, as well as on a cluster of four Cell processors with 24 computation cores. We present an in-depth performance analysis of DDM-\(\hbox {VM}_c\) using a suite of standard computational benchmarks. The evaluation shows that DDM-\(\hbox {VM}_c\) scales well and effectively tolerates scheduling overheads as well as memory and communication latencies. Furthermore, DDM-\(\hbox {VM}_c\) compares favorably with other platforms targeting the Cell processor, such as CellSs and Sequoia.
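The scheduling idea described in the abstract, where dependencies are fixed ahead of time and threads are fired at runtime once their input data is available, can be illustrated with a minimal sketch. This is not the DDM-VMc implementation: the function `run_data_driven`, the `threads` and `consumers` tables, and the use of a per-thread ready count are all assumptions made for illustration, and the sketch runs sequentially rather than on separate scheduling and computation cores.

```python
from collections import deque


def run_data_driven(threads, consumers):
    """Run threads in an order driven purely by data availability.

    threads:   hypothetical table mapping thread name -> callable(results)
    consumers: hypothetical table mapping producer name -> list of consumer names
    """
    # Ready count = number of producers a thread is still waiting for.
    ready_count = {name: 0 for name in threads}
    for targets in consumers.values():
        for target in targets:
            ready_count[target] += 1

    # Threads with no pending producers can be scheduled immediately.
    ready = deque(name for name, count in ready_count.items() if count == 0)
    results = {}
    order = []

    while ready:
        name = ready.popleft()
        # Non-blocking execution: all inputs exist before the thread starts.
        results[name] = threads[name](results)
        order.append(name)
        # Completion of a producer updates the ready count of its consumers.
        for target in consumers.get(name, []):
            ready_count[target] -= 1
            if ready_count[target] == 0:
                ready.append(target)

    if len(order) != len(threads):
        raise ValueError("dependency cycle: some threads never became ready")
    return results, order


# Illustrative dependency graph: two loads feed a multiply, which feeds a report.
threads = {
    "load_a": lambda r: 2,
    "load_b": lambda r: 3,
    "mul": lambda r: r["load_a"] * r["load_b"],
    "report": lambda r: f"product={r['mul']}",
}
consumers = {"load_a": ["mul"], "load_b": ["mul"], "mul": ["report"]}

results, order = run_data_driven(threads, consumers)
print(order)
print(results["report"])
```

In this sketch the synchronization logic (ready counts and the ready queue) is kept entirely separate from the thread bodies, which mirrors the decoupling of synchronization from computation that the abstract attributes to DDM.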
Additional information
This work is partially supported by the Cyprus Research Promotion Foundation under Grant PENEK/ENISX/0308/44 and the EU FP7 TeraFlux project.
About this article
Cite this article
Arandi, S., Matheou, G., Kyriacou, C. et al. Data-Driven Thread Execution on Heterogeneous Processors. Int J Parallel Prog 46, 198–224 (2018). https://doi.org/10.1007/s10766-016-0486-6
DOI: https://doi.org/10.1007/s10766-016-0486-6