Code vectorization using Intel Array Notation

Authors:
Olaf Krzikalla, ZIH, TU Dresden
Georg Zitzlsberger, Intel Deutschland GmbH

WPMVP '16: Proceedings of the 3rd Workshop on Programming Models for SIMD/Vector Processing, March 2016, Article No. 6, Pages 1–8
https://doi.org/10.1145/2870650.2870655

Published: 13 March 2016

ABSTRACT

In this paper, we explain the steps we have taken to port a large, industry-grade computational fluid dynamics application to the Intel® Xeon Phi™ coprocessor using the C/C++ Array Notation extensions of Intel® Cilk™ Plus. An essential part of the performance refactoring process for the Xeon Phi coprocessor is achieving high-quality SIMD vectorization. Even though there are other ways to vectorize code, the Array Notation extensions have proven to work best for our application. We have encapsulated the Array Notation extension syntax in a C++ wrapper class to drastically reduce the refactoring effort. In addition, the architecture independence of the Array Notation extensions further minimizes porting and tuning effort. We study how our approach helps the compiler generate vectorized code and, derived from that study, summarize our key findings as well as current limitations. Finally, we present a performance evaluation of the ported computational fluid dynamics application using the introduced C++ wrapper class and differentiate our solution from related solutions.
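The wrapper-class idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not show the authors' class, so everything here is an assumption made for illustration: the name `Pack`, the fixed width `N`, and the `axpy` kernel are hypothetical. The sketch uses plain loops so that it compiles with any standard C++ compiler; the comment shows how the same operator body might look with Array Notation enabled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-width pack of N values. The name and layout are
// illustrative assumptions, not the wrapper class described in the paper.
template <typename T, std::size_t N>
struct Pack {
    T v[N];

    T& operator[](std::size_t i) { assert(i < N); return v[i]; }
    const T& operator[](std::size_t i) const { assert(i < N); return v[i]; }

    // With a compiler that supports Cilk Plus Array Notation, the loop body
    // could instead be written as an array section expression:
    //   r.v[:] = a.v[:] + b.v[:];
    // The plain loop below is the portable equivalent.
    friend Pack operator+(const Pack& a, const Pack& b) {
        Pack r;
        for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] + b.v[i];
        return r;
    }

    friend Pack operator*(const Pack& a, const Pack& b) {
        Pack r;
        for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] * b.v[i];
        return r;
    }
};

// A kernel written once against the pack type, element-wise a * x + y.
// Application code keeps its scalar-looking form; only Pack knows how the
// element-wise work is expressed.
template <typename T, std::size_t N>
Pack<T, N> axpy(const Pack<T, N>& a, const Pack<T, N>& x, const Pack<T, N>& y) {
    return a * x + y;
}
```

Keeping the element-wise syntax inside the operators is what would limit the refactoring effort in such a design: a kernel like `axpy` reads like scalar code, and switching between a loop-based and an Array Notation implementation touches only the wrapper.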

Published in
WPMVP '16: Proceedings of the 3rd Workshop on Programming Models for SIMD/Vector Processing
March 2016
52 pages
ISBN: 9781450340601
DOI: 10.1145/2870650
Editors:
Jan Eitzinger, University of Erlangen-Nuremberg, Germany
Joel Falcou, LRI, Université Paris-Sud
Illie Gabriel Tanase, IBM Research
James Brodman, Intel
Copyright © 2016 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Intel Xeon Phi coprocessor
Intel Cilk Plus Array Notation extension
SIMD
vectorization
Acceptance Rates
Overall Acceptance Rate: 20 of 30 submissions, 67%