Abstract
This study proposes a novel method, based on evolutionary and fuzzy approaches, for uniformizing two-level perfect nested loops. The method uses the Shuffled Frog Leaping Algorithm (SFLA) to search for optimal solutions and simultaneously applies three critical factors as inputs in determining the basic dependence vectors. Using fuzzy logic rather than fixed coefficients for these three factors yields optimal results with high variability and resolves the problem regarding the existence of the main vectors. In addition, the algorithm is designed to handle a large amount of input data, so that it can be used in parallel compilers automatically and with low complexity. Implementation and evaluation showed that, compared with existing methods, the proposed method produces results very close to optimal in the least time, with the lowest Dependence Cone Size (DCS) and the highest number of input vectors.
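To make the terms "basic dependence vectors" and "Dependence Cone Size" more concrete, the following is a minimal sketch and not the method proposed in the article. It assumes the usual uniformization criterion for two-level loops, namely that every dependence vector must be a non-negative integer combination of two candidate basic dependence vectors, and it uses the angle between the two candidates as a simple stand-in for the cone size. The function names `cone_coefficients`, `covers`, and `cone_angle`, the sample vectors, and the angle-based proxy are all illustrative assumptions; the article's own DCS definition, its three fuzzy factors, and the SFLA search are not reproduced here.

```python
import math
from fractions import Fraction


def cone_coefficients(b1, b2, d):
    """Solve d = a*b1 + b*b2 exactly (Cramer's rule); None if b1 and b2 are parallel."""
    det = b1[0] * b2[1] - b1[1] * b2[0]
    if det == 0:
        return None
    a = Fraction(d[0] * b2[1] - d[1] * b2[0], det)
    b = Fraction(b1[0] * d[1] - b1[1] * d[0], det)
    return a, b


def covers(b1, b2, deps):
    """True if every vector in deps is a non-negative integer combination of b1 and b2."""
    for d in deps:
        coeffs = cone_coefficients(b1, b2, d)
        if coeffs is None:
            return False
        if any(c < 0 or c.denominator != 1 for c in coeffs):
            return False
    return True


def cone_angle(b1, b2):
    """Angle in radians between b1 and b2 (assumed proxy for cone size, not the article's DCS)."""
    cross = b1[0] * b2[1] - b1[1] * b2[0]
    dot = b1[0] * b2[0] + b1[1] * b2[1]
    return abs(math.atan2(cross, dot))


# Hypothetical dependence vectors of a two-level loop nest.
deps = [(1, 2), (2, 1), (3, 3)]

# Two candidate pairs of basic dependence vectors that both cover deps.
wide = ((1, 0), (0, 1))
narrow = ((1, 2), (2, 1))

for b1, b2 in (wide, narrow):
    print(b1, b2, covers(b1, b2, deps), round(cone_angle(b1, b2), 4))
```

Both candidate pairs cover the sample dependences, but the second pair spans a narrower cone. A search procedure such as SFLA would compare many such candidates under a fitness function; the choice of that function is where the article's fuzzy factors come in.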
Availability of data and materials
Not applicable.
Change history
25 February 2024
References 43 and 44 have been updated to the correct format.
Funding
No funding.
Contributions
SM, MG, and SSSA wrote the main manuscript text. SM prepared the figures. MG is the corresponding author of this manuscript. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
We declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Ethical Approval
Not applicable.
About this article
Cite this article
Mahjoub, S., Golsorkhtabaramiri, M. & Amiri, S.S.S. Optimal uniformization for non-uniform two-level loops using a hybrid method. J Supercomput 79, 12791–12814 (2023). https://doi.org/10.1007/s11227-023-05194-3
DOI: https://doi.org/10.1007/s11227-023-05194-3