Abstract
Bulk Synchronous Parallel (BSP) is a model for parallel computing with predictable scalability. BSP has a cost model: each program can be assigned a cost that describes its resource usage on any parallel machine. However, the programmer must derive this cost manually. This paper describes an automatic method for deriving BSP program costs, based on classic cost analysis and approximation of polyhedral integer volumes. Our method applies to programs with textually aligned synchronization and textually aligned, polyhedral communication. We have implemented the analysis, and our prototype obtains cost formulas that are parametric in the input parameters of the program and in the parameters of the BSP computer; they thus bound the cost of running the program with any input on any number of cores. We evaluate the cost formulas and find that they are indeed upper bounds, and tight for data-oblivious programs. Additionally, we evaluate their capacity to predict concrete run times in two parallel settings: a multi-core computer and a cluster. We find that when exact upper bounds can be found, they accurately predict run times. In networks with full bisection bandwidth, as the BSP model supposes, results are promising, with errors below 50%.
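To make the notion of a BSP cost concrete, the following sketch evaluates the standard BSP cost formula from the literature: a superstep costs the maximum local work, plus g times the largest h-relation, plus the barrier latency l. It is an illustration only, not the analysis described in this paper. The names `BspMachine`, `superstep_cost` and `program_cost` are hypothetical, and taking h as the maximum over processes of max(sent, received) is an assumption, since some BSP variants define h differently.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class BspMachine:
    """Hypothetical container for the usual BSP machine parameters."""
    p: int    # number of processes
    g: float  # gap: cost per word of an h-relation
    l: float  # latency: cost of one barrier synchronization


def superstep_cost(work: Sequence[float],
                   sent: Sequence[int],
                   received: Sequence[int],
                   machine: BspMachine) -> float:
    """Cost of one superstep: max work + h * g + l."""
    assert len(work) == len(sent) == len(received) == machine.p
    w = max(work)
    # Assumed convention: h is the largest fan-in or fan-out of any process.
    h = max(max(s, r) for s, r in zip(sent, received))
    return w + h * machine.g + machine.l


# One superstep is described by (work, sent, received), each indexed by process.
Superstep = Tuple[Sequence[float], Sequence[int], Sequence[int]]


def program_cost(supersteps: Sequence[Superstep], machine: BspMachine) -> float:
    """Total cost: the sum of the costs of all supersteps."""
    return sum(superstep_cost(w, s, r, machine) for w, s, r in supersteps)
```

For example, with `BspMachine(p=2, g=2.0, l=10.0)`, a superstep with local work `[5, 7]`, sent words `[1, 3]` and received words `[3, 1]` costs 7 + 3 * 2.0 + 10.0 = 23.0. A parametric cost formula of the kind the paper derives would bound such values symbolically rather than compute them from measured traces.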
Abbreviations
- \(n \in \mathbb {N}\) : Non-negative integer
- \(S{} \in \mathbb {N}\) : Number of supersteps
- \(\textsc {aexp}\) : Syntactic group of arithmetic expressions
- \(\textsc {bexp}\) : Syntactic group of Boolean expressions
- \(\textsc {cexp}\) : Syntactic group of cost expressions
- \(\textsc {seq}\) : Syntactic group of sequential programs
- \(\textsc {par}\) : Syntactic group of parallel programs
- \(\mathcal {A}{{\llbracket {\cdot }\rrbracket }} : \textsc {aexp}\times {\varSigma }\rightarrow \mathbb {N}\) : Semantics of arithmetic expressions
- \(\mathcal {B}{{\llbracket {\cdot }\rrbracket }} : \textsc {bexp}\times {\varSigma }\rightarrow \{{\texttt {tt}}, {\texttt {ff}}\}\) : Semantics of Boolean expressions
- \(\mathcal {C}{{\llbracket {\cdot }\rrbracket }} : \textsc {cexp}\times {\varSigma }\rightarrow \mathbb {N}_\omega \) : Semantics of cost expressions
- \(\mathbb {T}\) : Termination states
- \(u\in \mathbb {U}= \{ \mathbf a , \mathbf b , ... \}\) : Cost unit
- \(w\in \mathbb {W}= (\mathbb {N}\times \mathbb {U})^{*}\) : Work trace
- \(\epsilon \) : Empty sequence
- \(\mathbb {R}\) : Communication requests
- \(Cost_{\textsc {seq}}: \mathbb {W}\rightarrow (\mathbb {U}\rightarrow \mathbb {N})\) : The cost of a sequential execution
- \(Cost_{\textsc {par}}: \mathbb {W}^{p \times S{}} \times \mathbb {R}^{p \times S{}} \rightarrow (\mathbb {U}\rightarrow \mathbb {N})\) : The cost of a parallel execution
- \(\textsc {sca}: \textsc {seq}\rightarrow (\mathbb {U}\rightarrow \textsc {cexp})\) : Sequential cost analysis
- \(\omega \) : The expression of unbounded cost
- : Concatenation of sequences
- \((:) : A^p \times A^{p \times S{}} \rightarrow A^{p \times (S{}+1)}\) : Left concatenation of vector to matrix
- \({\textit{Comm}}: {\varSigma }^p \times \mathbb {R}\times {\varSigma }^p\) : Implementation specific communication relation
- \(\sigma \in {\varSigma }= \mathbb {X}\rightarrow \mathbb {N}\) : Environment
- \(\sigma ^{i,p} \in {\varSigma }\) : Process local environment
- : Sequential composition of a set of statements
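As an illustration of how the types in this list fit together, the sketch below represents a work trace \(w \in \mathbb{W}\) as a sequence of (count, unit) pairs and computes a map from cost units to totals, matching the signature \(Cost_{\textsc{seq}}: \mathbb{W} \rightarrow (\mathbb{U} \rightarrow \mathbb{N})\). The list gives only the types, so summing the counts per unit is an assumption about the definition, and the names `WorkTrace` and `cost_seq` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# A work trace: a finite sequence of (amount, cost unit) pairs.
# Cost units are modelled as strings such as "a" and "b".
WorkTrace = Iterable[Tuple[int, str]]


def cost_seq(trace: WorkTrace) -> Counter:
    """Assumed reading of Cost_seq: total amount of work per cost unit.

    A Counter yields 0 for units that never occur in the trace, so the
    result behaves as a total function from units to naturals.
    """
    total: Counter = Counter()
    for amount, unit in trace:
        if amount < 0:
            raise ValueError("work amounts must be non-negative")
        total[unit] += amount
    return total
```

Under this reading, the empty trace \(\epsilon\) has cost 0 for every unit, and the cost of a concatenation of two traces is the unit-wise sum of their costs.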
Acknowledgements
I thank Wijnand Suijlen for his help with the evaluation and his comments on the article, as well as Frédéric Loulergue for his insightful comments that greatly improved this article. I also thank the anonymous reviewers for their remarks on earlier drafts.
Cite this article
Jakobsson, A. Automatic Cost Analysis for Imperative BSP Programs. Int J Parallel Prog 47, 184–212 (2019). https://doi.org/10.1007/s10766-018-0562-1
DOI: https://doi.org/10.1007/s10766-018-0562-1