Abstract:
Multimedia and DSP applications are widely present in embedded devices. Due to their intrinsic nature, such application domains benefit from Data Level Parallelism (DLP) exploitation, which current embedded platforms mostly achieve through vectorization techniques that extend the underlying ISA. However, this strategy relies either on specific libraries, which hurts software productivity, or on compiler support such as the ARM auto-vectorization approach, which breaks binary compatibility. This work proposes a transparent Dynamic SIMD Assembler (DSA) that is capable of detecting vectorizable code regions at runtime without requiring specific libraries or compilers. As a case study, we coupled the DSA to a 128-bit wide ARM NEON engine. Results show that the proposed approach improves performance by 31% over the original execution (without DLP exploitation). In addition, besides preserving binary compatibility, the DSA outperforms the ARM auto-vectorization technique by 6%.
Date of Conference: 27-31 August 2018
Date Added to IEEE Xplore: 15 November 2018