Abstract
Programming correct parallel software in a cost-effective way is a challenging task requiring a high degree of expertise. In an attempt to overcome the pitfalls that undermine parallel programming, this paper proposes a pattern-based, formally grounded tool that eases writing parallel code by automatically generating platform-dependent programs from high-level, platform-independent specifications. The tool builds on three pillars: (1) a platform-agnostic parallel programming pattern, called PCR; (2) a formal translation of PCRs into a parallel execution model, namely Concurrent Collections (CnC); and (3) a program rewriting engine that generates code for a concrete runtime implementing CnC. The experimental evaluation gives evidence that code produced from PCRs can deliver performance comparable with that of handwritten code, but with assured correctness. The technical contribution of this paper is threefold. First, it discusses a parallel programming pattern, called PCR, consisting of producers, consumers, and reducers that operate concurrently on data sets. To favor correctness, the semantics of PCRs is mathematically defined in terms of the formalism FXML. PCRs are shown to be composable and to seamlessly subsume other well-known parallel programming patterns, thus providing a framework for heterogeneous designs. Second, it formally shows how the PCR pattern can be correctly implemented in terms of a more concrete parallel execution model. Third, it proposes a platform-agnostic C++ template library to express PCRs, and it presents a prototype source-to-source compilation tool, based on C++ template rewriting, which automatically generates parallel implementations relying on the Intel CnC C++ library.
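To make the producer/consumer/reducer idea concrete, the following is a minimal sketch in standard C++. It is not the paper's template library, it does not use the CnC API, and it does not reproduce the FXML semantics: `run_pcr`, `is_prime`, and `count_primes_below` are hypothetical names introduced only for illustration. The sketch assumes a one-shot PCR over the indices 0..n-1, in which each index is produced and consumed in its own task and a single reducer folds the consumer outputs in index order.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper (an assumption for illustration; not the paper's
// library and not the CnC API). For each index i in [0, n) it launches a
// task that produces the i-th value and consumes it, then folds all
// consumer outputs into one accumulator with the reducer.
template <typename Produce, typename Consume, typename Reduce, typename Acc>
Acc run_pcr(std::size_t n, Produce produce, Consume consume, Reduce reduce,
            Acc init) {
    using Item = decltype(consume(produce(std::size_t{0})));
    std::vector<std::future<Item>> pending;
    pending.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        // The per-index tasks are independent, so they may run concurrently.
        pending.push_back(std::async(
            std::launch::async,
            [&produce, &consume, i] { return consume(produce(i)); }));
    }
    Acc acc = init;
    for (auto& f : pending) {
        acc = reduce(acc, f.get());  // reducer: fold results in index order
    }
    return acc;
}

// Trial-division primality test, used as the consumer below.
bool is_prime(std::size_t k) {
    if (k < 2) return false;
    for (std::size_t d = 2; d * d <= k; ++d) {
        if (k % d == 0) return false;
    }
    return true;
}

// Example instance: count the primes among 0, 1, ..., n-1.
std::size_t count_primes_below(std::size_t n) {
    return run_pcr(
        n,
        [](std::size_t i) { return i; },                 // producer
        [](std::size_t v) { return is_prime(v); },       // consumer
        [](std::size_t acc, bool p) { return acc + (p ? 1 : 0); },  // reducer
        std::size_t{0});
}
```

Calling `count_primes_below(30)` counts the primes 2, 3, 5, 7, 11, 13, 17, 19, 23, and 29. The sketch spawns one task per index for clarity, which is only reasonable for small inputs; a generated implementation would instead leave scheduling to the underlying runtime.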
Notes
An in-depth discussion of parallel programming languages is out of the scope of this paper.
Cyclic composition through recursion is discussed in Sect. 3.
Acknowledgements
Partially funded by LIA INFINIS (CNRS, Université Paris Diderot, CONICET, Universidad de Buenos Aires), PEDECIBA and SNI. Thanks to CSC-CONICET for granting use of cluster TUPAC.
Cite this article
Pérez, G., Yovine, S. Formal specification and implementation of an automated pattern-based parallel-code generation framework. Int J Softw Tools Technol Transfer 21, 183–202 (2019). https://doi.org/10.1007/s10009-017-0465-2
DOI: https://doi.org/10.1007/s10009-017-0465-2