DOI: 10.1145/3303117.3306166
research-article

Compiling Efficiently with Arithmetic Emulation for the Custom-Width Connex Vector Processor

Published: 16 February 2019

ABSTRACT

Compiling sequential C programs with LLVM for the wide Connex vector accelerator, a competitive, customizable architecture for embedded applications with 32 to 4096 16-bit integer lanes, is challenging.

Our compiler targets Opincaa, a JIT assembler and coordination C++ library for Connex, which can run programs that are portable across vector widths. For this to work, our back end needs to handle symbolic C/C++ expressions, represented as adjacent inline assembly strings, that are used as scalar immediate operands in the vector code.
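
To illustrate the idea of an immediate operand that stays symbolic until the vector width is known, here is a minimal C++ sketch. It is not the Opincaa API: the `Instr` record, the `resolve` function, and the `vload` mnemonic are hypothetical names introduced only for this example.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical instruction record (not an Opincaa type): a mnemonic plus an
// immediate operand expressed as a function of the vector width, so its value
// is only fixed once the width of the target processor is known.
struct Instr {
    std::string mnemonic;
    std::function<int(int)> imm;  // maps vector width -> immediate value
};

// Evaluate every symbolic immediate for a concrete vector width and return
// the resulting assembly text, one string per instruction.
std::vector<std::string> resolve(const std::vector<Instr>& prog, int width) {
    std::vector<std::string> out;
    for (const Instr& in : prog) {
        out.push_back(in.mnemonic + " " + std::to_string(in.imm(width)));
    }
    return out;
}
```

The same `prog` can then be resolved for 32, 128, or 4096 lanes without recompiling the kernel, which is the kind of portability the abstract describes.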

Our back end for Connex must also lower code to efficiently emulate arithmetic operations on non-native types such as 32-bit integers and 16-bit floating point. To simplify the compiler writer's work, we devise a method that automatically generates the code that lowers these operations inside LLVM's instruction selection pass.
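
As a rough illustration of what emulating a non-native type involves, the following C++ sketch adds 32-bit integers lane by lane using only 16-bit operations, with the carry propagated from the low half to the high half. It shows the general technique, not the lowering the paper generates; the `Wide32` layout and the `add32_emulated` name are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: each 32-bit lane is stored as two 16-bit halves,
// kept in separate vectors (low halves and high halves).
struct Wide32 {
    std::vector<uint16_t> lo;
    std::vector<uint16_t> hi;
};

// Lane-wise 32-bit addition built only from 16-bit additions and a compare.
Wide32 add32_emulated(const Wide32& a, const Wide32& b) {
    assert(a.lo.size() == b.lo.size() && a.hi.size() == b.hi.size());
    const std::size_t lanes = a.lo.size();
    Wide32 r{std::vector<uint16_t>(lanes), std::vector<uint16_t>(lanes)};
    for (std::size_t i = 0; i < lanes; ++i) {
        // The low-half sum wraps modulo 2^16 when it overflows.
        const uint16_t lo = static_cast<uint16_t>(a.lo[i] + b.lo[i]);
        // A wrapped sum is smaller than either operand, which signals a carry.
        const uint16_t carry = (lo < a.lo[i]) ? 1 : 0;
        r.lo[i] = lo;
        r.hi[i] = static_cast<uint16_t>(a.hi[i] + b.hi[i] + carry);
    }
    return r;
}
```

One 32-bit addition therefore costs at least two 16-bit additions plus a carry computation, which is why efficient lowering of such operations matters on a 16-bit lane architecture.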

We report speedup factors of up to 12.24 when running on a Connex processor with 128 lanes relative to a dual-core ARM Cortex A9 clocked at a frequency 6.67 times higher, and an average energy-efficiency improvement of 1.07 times. Note, however, that a Connex IC can achieve an order of magnitude more energy efficiency than our FPGA implementation.

    • Published in

      WPMVP'19: Proceedings of the 5th Workshop on Programming Models for SIMD/Vector Processing
      February 2019
      35 pages
      ISBN:9781450362917
      DOI:10.1145/3303117

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall acceptance rate: 20 of 30 submissions, 67%
