Providing high-level tools for parallel programming while sustaining a high level of performance is a challenge that techniques such as Domain Specific Embedded Languages (DSELs) aim to address. In previous work, we investigated the design of such a DSEL, NT\(^2\), which provides a Matlab-like syntax for parallel numerical computations inside a C++ library. In this paper, we show how NT\(^2\) has been redesigned for shared memory systems in an extensible and portable way. The new NT\(^2\) design relies on a tiered Parallel Skeleton system built using asynchronous task management and automatic compile-time taskification of user-level code. We describe how this system can operate various shared memory runtimes and evaluate the design using two benchmarks that implement linear algebra algorithms.
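The abstract describes skeletons layered on top of asynchronous task management, with the underlying shared memory runtime kept interchangeable. The C++ sketch below illustrates only that runtime layer, under stated assumptions; it is not NT\(^2\)'s API and does not show the compile-time taskification step. The names `async_runtime`, `spawn`, `parallel_transform` and the `grain` parameter are hypothetical, and `std::async` merely stands in for whichever runtime (for example OpenMP tasks or TBB) a real back end would wrap.

```cpp
#include <algorithm>  // std::min
#include <cassert>    // assert
#include <cstddef>    // std::size_t
#include <future>     // std::async, std::future
#include <utility>    // std::move
#include <vector>

// Hypothetical runtime back end: the skeleton below only calls Runtime::spawn,
// so another shared memory runtime could be plugged in by providing a struct
// with the same static member returning a future-like handle.
struct async_runtime {
    template <class F>
    static auto spawn(F f) -> std::future<decltype(f())> {
        return std::async(std::launch::async, std::move(f));
    }
};

// Hypothetical element-wise "transform" skeleton: out[i] = op(in[i]).
// The index range is cut into chunks of `grain` elements and each chunk
// becomes one asynchronous task; calling get() on every future is a barrier.
template <class Runtime, class T, class Op>
void parallel_transform(const std::vector<T>& in, std::vector<T>& out,
                        Op op, std::size_t grain) {
    assert(grain > 0 && "grain must be positive");
    out.resize(in.size());  // size once so that no task triggers a reallocation
    std::vector<std::future<void>> tasks;
    for (std::size_t begin = 0; begin < in.size(); begin += grain) {
        const std::size_t end = std::min(begin + grain, in.size());
        // Each task writes a disjoint slice of `out`, so no locking is needed.
        tasks.push_back(Runtime::spawn([&in, &out, op, begin, end] {
            for (std::size_t i = begin; i < end; ++i) out[i] = op(in[i]);
        }));
    }
    for (auto& t : tasks) t.get();  // wait for every chunk (rethrows task errors)
}
```

A call such as `parallel_transform<async_runtime>(x, y, [](double v) { return 2.0 * v + 1.0; }, 128);` fills `y` chunk by chunk. Keeping the runtime behind a template parameter means a different back end only has to supply its own `spawn`; the skeleton code itself stays unchanged, which is one plausible way to read the "tiered" and "portable" goals stated above.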

As defined by Czarnecki et al. [20].
The Parallel Linear Algebra for Multicore Architectures (PLASMA) [2, 13] is a software framework that reimplements a large part of the LAPACK subroutines to take advantage of multicore architectures. PLASMA implements tile algorithms and combines a tile data layout with dynamic task scheduling to achieve good performance.
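To make the terms "tile data layout" and "tile algorithm" concrete, here is a simplified C++ sketch. It is not PLASMA's code or API: `tiled_matrix` and `tiled_gemm` are hypothetical names, the sketch assumes square matrices whose order is a multiple of the tile size, and it launches one independent `std::async` task per output tile. PLASMA's dynamic scheduler additionally tracks dependencies between tasks, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical tile data layout: an n x n matrix (n a multiple of nb) is stored
// as (n/nb)^2 tiles, and each nb x nb tile is a contiguous row-major block.
struct tiled_matrix {
    std::size_t n, nb;
    std::vector<double> data;

    tiled_matrix(std::size_t n_, std::size_t nb_)
        : n(n_), nb(nb_), data(n_ * n_, 0.0) {
        assert(nb > 0 && n % nb == 0);
    }
    // Offset of the first element of tile (ti, tj) inside `data`.
    std::size_t offset(std::size_t ti, std::size_t tj) const {
        return (ti * (n / nb) + tj) * nb * nb;
    }
    // Element (i, j) addressed through the tile layout.
    double& at(std::size_t i, std::size_t j) {
        return data[offset(i / nb, j / nb) + (i % nb) * nb + (j % nb)];
    }
};

// C += A * B computed tile by tile, with one asynchronous task per output tile
// C(ti, tj). Tasks write disjoint tiles, so they need no synchronization.
void tiled_gemm(const tiled_matrix& A, const tiled_matrix& B, tiled_matrix& C) {
    assert(B.n == A.n && C.n == A.n && B.nb == A.nb && C.nb == A.nb);
    const std::size_t nb = A.nb, nt = A.n / A.nb;
    std::vector<std::future<void>> tasks;
    for (std::size_t ti = 0; ti < nt; ++ti)
        for (std::size_t tj = 0; tj < nt; ++tj)
            tasks.push_back(std::async(std::launch::async, [&A, &B, &C, nb, nt, ti, tj] {
                double* c = C.data.data() + C.offset(ti, tj);
                for (std::size_t tk = 0; tk < nt; ++tk) {
                    const double* a = A.data.data() + A.offset(ti, tk);
                    const double* b = B.data.data() + B.offset(tk, tj);
                    // Plain triple loop on three contiguous nb x nb tiles.
                    for (std::size_t i = 0; i < nb; ++i)
                        for (std::size_t k = 0; k < nb; ++k)
                            for (std::size_t j = 0; j < nb; ++j)
                                c[i * nb + j] += a[i * nb + k] * b[k * nb + j];
                }
            }));
    for (auto& t : tasks) t.get();  // wait for all tile tasks
}
```

Because every tile is stored contiguously, each inner kernel works on three small blocks that are likely to stay in cache, which is the usual motivation for a tile layout; a dependency-aware scheduler is what allows the same idea to extend to factorizations, where tile tasks are not independent.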
Abrahams, D., Gurtovoy, A.: C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond. Pearson Education, Boston (2004)
Agullo, E., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Langou, J., Ltaief, H., Luszczek, P., YarKhan, A.: PLASMA users' guide. Technical report, Electrical Engineering and Computer Science Department, University of Tennessee. http://icl.cs.utk.edu/projectsfiles/plasma/pdf/usersguide.pdf (2009)
Aldinucci, M., Danelutto, M., Dazzi, P.: Muskel: an expandable skeleton environment. Scal. Comput. Pract. Exp. 8(4), 325–341 (2001)
Aldinucci, M., Danelutto, M., Dünnweber, J.: Optimization techniques for implementing parallel skeletons in grid environments. In: Gorlatch, S. (ed.) Proceedings of CMPP: International Workshop on Constructive Methods for Parallel Programming, pp. 35–47. Universität Münster, Stirling (2004)
Aldinucci, M., Danelutto, M., Kilpatrick, P., Torquati, M.: FastFlow: high-level and efficient streaming on multi-core. In: Pllana, S., Xhafa, F. (eds.) Programming Multi-core and Many-core Computing Systems, chap. 13. Parallel and Distributed Computing. Wiley (2014)
An, P., Jula, A., Rus, S., Saunders, S., Smith, T., Tanase, G., Thomas, N., Amato, N., Rauchwerger, L.: STAPL: an adaptive, generic parallel C++ library. In: Dietz, H.G. (ed.) Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science, vol. 2624, pp. 193–208. Springer, Berlin, Heidelberg (2003)
Ayguadé, E., Copty, N., Duran, A., Hoeflinger, J., Lin, Y., Massaioli, F., Teruel, X., Unnikrishnan, P., Zhang, G.: The design of OpenMP tasks. IEEE Trans. Parallel Distrib. Syst. 20(3), 404–418 (2009)
Baker Jr, H.C., Hewitt, C.: The incremental garbage collection of processes. ACM SIGART Bull. 12, 55–59 (1977)
Benoit, A., Cole, M., Gilmore, S., Hillston, J.: Flexible skeletal programming with eSkel. In: Proceedings of the 11th International Euro-Par Conference on Parallel Processing, Lisbon, Portugal. Euro-Par’05, pp. 761–770. Springer-Verlag, Berlin, Heidelberg (2005)
Black, F., Scholes, M.: The pricing of options and corporate liabilities. J. Polit. Econ. 81(3), 637–654 (1973)
Bosilca, G., Bouteiller, A., Danalis, A., Herault, T., Dongarra, J.: From serial loops to parallel execution on distributed systems. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds.) Euro-Par 2012 Parallel Processing. Lecture Notes in Computer Science, vol. 7484, pp. 246–257. Springer, Berlin, Heidelberg (2012)
Buss, A., Harshvardhan, Papadopoulos, I., Pearce, O., Smith, T., Tanase, G., Thomas, N., Xu, X., Bianco, M., Amato, N.M., Rauchwerger, L.: STAPL: standard template adaptive parallel library. In: Proceedings of the 3rd Annual Haifa Experimental Systems Conference, SYSTOR ’10, pp. 14:1–14:10. ACM, New York (2010)
Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Comput. 35(1), 38–53 (2009)
Chamberlain, B.L., Callahan, D., Zima, H.P.: Parallel programmability and the Chapel language. Int. J. High Perform. Comput. Appl. 21(3), 291–312 (2007)
Ching, W.-M., Zheng, D.: Automatic parallelization of array-oriented programs for a multi-core machine. Int. J. Parallel Progr. 40(5), 514–531 (2012)
Mysen, C., Gustafsson, N., Austern, M., Yasskin, J.: N3785: executors and schedulers, revision 3. Technical report. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3785.pdf (2013)
Ciechanowicz, P., Kuchen, H.: Enhancing Muesli’s data parallel skeletons for multi-core computer architectures. In: High Performance Computing and Communications (HPCC), 12th IEEE International Conference on, pp. 108–113. IEEE (2010)
Cole, M.: Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming. Parallel Comput. 30(3), 389–406 (2004)
Cole, M.I.: Algorithmic skeletons: structured management of parallel computation. Pitman, London (1989)
Czarnecki, K., Eisenecker, U.W., Glück, R., Vandevoorde, D., Veldhuizen, T.L.: Generative programming and active libraries. In: Generic Programming, pp. 25–39 (1998)
Dawes, B., Abrahams, D., Rivera, R.: Boost C++ Libraries. http://www.boost.org (2009)
Emoto, K., Matsuzaki, K., Hu, Z., Takeichi, M.: Domain-specific optimization strategy for skeleton programs. In: Kermarrec, A.-M., Bougé, L., Priol, T. (eds.) Euro-Par 2007 Parallel Processing. Lecture Notes in Computer Science, vol. 4641, pp. 705–714. Springer, Berlin (2007)
Estérie, P., Gaunard, M., Falcou, J., Lapresté, J.-T., Rozoy, B.: Boost.SIMD: generic programming for portable SIMDization. In: Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, pp. 431–432. ACM (2012)
Falcou, J., Gaunard, M., Lapresté, J.-T.: The Numerical Template Toolbox. http://www.github.com/MetaScale/nt2 (2013)
Falcou, J., Sérot, J., Pech, L., Lapresté, J.-T.: Meta-programming applied to automatic SMP parallelization of linear algebra code. In: Euro-Par 2008 Parallel Processing, pp. 729–738. Springer, Berlin (2008)
Friedman, D.P., Wise, D.S.: The impact of applicative programming on multiprocessing. Indiana University, Computer Science Department (1976)
Grelck, C., Scholz, S.-B.: SAC: a functional array language for efficient multi-threaded execution. Int. J. Parallel Progr. 34(4), 383–427 (2006)
Hudak, P.: Building domain-specific embedded languages. ACM Comput. Surv. 28(4es), 196 (1996)
Kaiser, H., Brodowicz, M., Sterling, T.: ParalleX: an advanced parallel execution model for scaling-impaired applications. In: Parallel Processing Workshops, 2009. ICPPW’09. International Conference on, pp. 394–401. IEEE (2009)
Kale, L.V., Krishnan, S.: CHARM++: a portable concurrent object oriented system based on C++. ACM SIGPLAN Notices 28(10). ACM (1993)
Kuchen, H.: A Skeleton Library. Springer, Berlin (2002)
Niebler, E.: Proto: a compiler construction toolkit for DSELs. In: Proceedings of ACM SIGPLAN Symposium on Library-Centric Software Design (2007)
Gustafsson, N., Laksberg, A., Sutter, H., Mithani, S.: N3857: Improvements to std::future<T> and related APIs. Technical report. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3857.pdf (2014)
OpenMP Architecture Review Board: OpenMP Application Program Interface, version 4 (2013)
Reinders, J.: Intel Threading Building Blocks: outfitting C++ for Multi-Core Processor Parallelism. O’Reilly Media, California (2010)
Spinellis, D.: Notable design patterns for domain-specific languages. J. Syst. Softw. 56(1), 91–99 (2001)
The C++ Standards Committee. ISO/IEC 14882:2011, Standard for Programming Language C++. Technical report. http://www.open-std.org/jtc1/sc22/wg21 (2011)
The C++ Standards Committee. N3797: Working Draft, Standard for Programming Language C++. Technical report. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3797.pdf (2013)
Tratt, L.: Model transformations and tool integration. Softw. Syst. Model. 4(2), 112–122 (2005)
Vandevoorde, D., Josuttis, N.M.: C++ Templates. Addison-Wesley Longman Publishing Co, Boston (2002)
Veldhuizen, T.: Expression templates. C++ Report 7, 26–31 (1995)
Escriba, V.B.J.: N3865: More Improvements to std::future<T>. Technical report. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3865.pdf (2014)
YarKhan, A., Kurzak, J., Dongarra, J.: QUARK users' guide. Technical report, Electrical Engineering and Computer Science, Innovative Computing Laboratory, University of Tennessee (April 2011)
Cite this article
Tran Tan, A., Falcou, J., Etiemble, D. et al. Automatic Task-Based Code Generation for High Performance Domain Specific Embedded Language. Int J Parallel Prog 44, 449–465 (2016). https://doi.org/10.1007/s10766-015-0354-9
DOI: https://doi.org/10.1007/s10766-015-0354-9