Current hardware trends place increasing pressure on programmers and tools to optimize scientific code. Numerous tools and techniques exist, but no single tool is a panacea; instead, an assortment of performance tuning utilities is necessary to make the best use of scarce resources (e.g., bandwidth, functional units, cache). This paper describes an optimization strategy that combines static assembly analysis using the MAQAO tool with dynamic information from hardware performance monitoring (HPM) and memory traces. A new technique, decremental analysis (DECAN), is introduced to iteratively identify the individual instructions that cause performance bottlenecks. We present a case study on an industrial application from Dassault-Aviation on a Xeon Core 2 platform. Our strategy helps discover and fix problems related to memory access locality and loop unrolling, yielding sequential and parallel speedups of up to 2.5x.