Abstract:
Tools that provide optimization hints for program developers face severe obstacles and are often unable to provide meaningful guidance on how to parallelize real-life applications. The main reason is the high complexity and large size of commercially valuable code. Such code is often rich with pointers, heavily nested conditional statements, nested while-based loops, function calls, etc. These constructs prevent existing compiler analyses from extracting the full parallelization potential. We propose a new paradigm to overcome this issue by automatically transforming the code into a much simpler skeleton-like form that is more conducive to auto-parallelization. We then apply existing tools for source-level automatic parallelization to the skeletonized code in order to expose possible parallelization patterns. The skeleton code, along with its parallelized version, is then provided to the programmer in the form of an Integrated Development Environment (IDE) recommendation. The proposed skeletonization algorithm replaces pointers by integer indexes and C-struct references by references to multi-dimensional arrays. For example, the loop while(p != NULL){ p->val++; p = p->next; } is skeletonized to for(Ip = 0; Ip < N; Ip++){ Aval[Ip]++; }, where Aval[] holds the embedding of the original list. Consequently, the main goal of the skeletonization process is to embed pointer-based data structures into arrays. Though the skeletonized code is not semantically equivalent to the original code, it suggests a possible parallelization pattern for the selected code segment and can be used as an effective parallelization hint to the programmer. We applied the method to the SPEC CPU benchmarks, and the skeletonization process detected 27 percent additional loops that can be parallelized/vectorized on top of the compiler auto-parallelizer/vectorizer. A performance gain of up to 45 percent was measured.
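
For illustration only (not taken from the paper), a minimal C sketch of the transformation the abstract describes, assuming a singly linked list whose val fields have been embedded into an auxiliary array; the names node, Aval, and N are hypothetical:

    #include <stddef.h>

    #define N 1000                 /* assumed length of the embedded list */

    struct node { int val; struct node *next; };

    /* Original form: pointer chasing hides the loop's trip count and
       data layout from the compiler's auto-parallelizer/vectorizer. */
    void original(struct node *p) {
        while (p != NULL) { p->val++; p = p->next; }
    }

    /* Skeletonized form: the list is embedded into Aval[], turning the
       traversal into a countable, array-indexed loop that source-level
       parallelization tools can analyze. */
    int Aval[N];

    void skeletonized(void) {
        for (int Ip = 0; Ip < N; Ip++) {
            Aval[Ip]++;
        }
    }

As the abstract notes, the skeletonized loop is not semantically equivalent to the original; it serves only to expose a candidate parallelization pattern that is then presented to the programmer as a hint.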
Published in: IEEE Transactions on Parallel and Distributed Systems (Volume 26, Issue 11, November 2015)