ABSTRACT
The current trend in the development of parallel programming models is to combine different well-established models into a single programming model in order to support efficient implementation of a wide range of real-world applications. The dataflow model in particular has recaptured the interest of the research community because of its ability to express parallelism efficiently. As a result, a number of recently proposed hybrid parallel programming models combine dataflow with traditional shared memory. Findings from these models have influenced the introduction of task dependencies in the recently published OpenMP 4.0 standard.
In this paper, we present DaSH, the first comprehensive benchmark suite for hybrid dataflow and shared memory programming models. DaSH features 11 benchmarks, each representing one of the Berkeley dwarfs, which capture patterns of communication and computation common to a wide range of emerging applications. We also include sequential and shared-memory implementations based on OpenMP and TBB to facilitate comparison between hybrid dataflow implementations and traditional shared memory implementations based on work-sharing and/or tasks. Finally, we use DaSH to evaluate three different hybrid dataflow models, identify their advantages and shortcomings, and motivate further research on their characteristics.
Index Terms
DaSH: a benchmark suite for hybrid dataflow and shared memory programming models: with comparative evaluation of three hybrid dataflow models