Abstract
This paper presents a new approach to autotuning data-parallel programs. Autotuning is the search for the program settings that maximize a program's performance. The novelty of the approach lies in using model checking to find the optimal tuning parameters by the counterexample method. We abstract from specific programs and specific processors by defining representative abstract patterns for them. Our counterexample method consists of four steps. In the first step, an execution model of an abstract program on an abstract processor is described in the language of a model checking tool. In the second step, we formulate, in the same language, an optimality property that depends on the constructed model. In the third step, we find the optimal values of the tuning parameters by using counterexamples constructed during the verification of the optimality property. In the fourth step, we extract the values of the tuning parameters from the counterexample for the optimal parameters. We apply this approach to autotuning parallel programs written in OpenCL, a popular modern language that extends C for programming both standard multi-core processors (CPUs) and massively parallel graphics processing units (GPUs). As the verification tool we use the SPIN verifier and its modeling language Promela, whose formal semantics is well suited to modeling the execution of parallel programs on processors with different architectures.
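The four steps can be illustrated with a minimal Python sketch. It is not the SPIN/Promela implementation: a plain exhaustive loop stands in for the model checker, and the cost model, the constants (`COMPUTE_UNITS`, `LANES`, `WORK_ITEMS`, `OVERHEAD`, `CANDIDATES`) and all function names are hypothetical, chosen only to make the idea concrete. The property being "verified" is "no execution finishes in fewer than `bound` time units"; every counterexample to it is a faster configuration, so the bound is tightened until the property holds.

```python
from typing import Optional, Tuple

# Hypothetical abstract processor and program (illustrative values only).
COMPUTE_UNITS = 4   # work-groups that can run at the same time
LANES = 4           # work-items one compute unit executes per step
WORK_ITEMS = 64     # total work-items of the abstract program
OVERHEAD = 3        # assumed fixed cost of scheduling one round of groups
CANDIDATES = [1, 2, 4, 8, 16, 32, 64]  # tuning parameter: work-group size


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def run_time(wg_size: int) -> int:
    """Step 1 (toy execution model): time to run the program for a group size."""
    groups = ceil_div(WORK_ITEMS, wg_size)
    rounds = ceil_div(groups, COMPUTE_UNITS)   # groups are scheduled in rounds
    steps = ceil_div(wg_size, LANES)           # steps needed inside one group
    return rounds * (OVERHEAD + steps)


def find_counterexample(bound: int) -> Optional[Tuple[int, int]]:
    """Steps 2-3: check 'no run finishes in fewer than `bound` time units'.

    Returns a violating (wg_size, time) pair, or None if the property holds.
    A model checker would explore the model's state space instead of this loop.
    """
    for wg_size in CANDIDATES:
        t = run_time(wg_size)
        if t < bound:
            return wg_size, t
    return None


def autotune() -> Tuple[int, int]:
    """Tighten the bound with each counterexample until none is left."""
    bound = run_time(CANDIDATES[0]) + 1  # initial bound is certainly violated
    best = None
    while True:
        cex = find_counterexample(bound)
        if cex is None:
            return best          # Step 4: parameters from the last counterexample
        best = cex
        bound = cex[1]           # require strictly better than this run


if __name__ == "__main__":
    print(autotune())
```

In this toy model the search stops at the configuration for which no faster run exists, and the last counterexample carries the tuned work-group size together with its execution time.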
Notes
A multiprocessor with the Fermi architecture includes two dispatchers, and a warp on it can have 16 calculators or load/store units (LSUs), or 4 special function units (SFUs).
Conflict of interest
The authors declare that they have no conflicts of interest.
Funding
This work was supported by the DAAD research scholarship of Dr. Natalia Garanina no. 91735805, by State assignment no. AAAA-A19-119120290056-0, and by the DFG project PPP-DL at the University of Muenster, Germany.
Additional information
Translated by F. Baron
About this article
Cite this article
Garanina, N.O., Gorlatch, S.P. Autotuning Parallel Programs by Model Checking. Aut. Control Comp. Sci. 56, 634–648 (2022). https://doi.org/10.3103/S0146411622070045
DOI: https://doi.org/10.3103/S0146411622070045