Optimizations for Very Long and Sparse Vector Operations on a RISC-V VPU: A Work-in-Progress

  • Conference paper
High Performance Computing (ISC High Performance 2023)

Abstract

The substantial scope for vectorizing present-day workloads in scientific computing and machine learning has highlighted the Vector Processing Unit (VPU) as a target accelerator in high-performance computing systems. The performance of sparse vector operations in these systems is generally limited by memory throughput, because only a small fraction of the elements in the operand vectors are non-zero. Beyond the conventional methods used to improve memory throughput, this work considers supporting very long sparse vector operations to improve memory-level parallelism. This in turn requires the vector engines to handle these long sparse vector operations efficiently, both to improve performance and to save energy. This paper presents enhancements to a RISC-V VPU that achieve this, together with a supporting infrastructure around the VPU in a manycore system. As a work in progress, the paper discusses the current results on the enhanced VPU and points to the planned modifications.
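As a rough illustration of why a small fraction of non-zeros limits the useful work done per element fetched, the following Python sketch multiplies two sparse operand vectors element-wise and counts how many positions actually contribute a non-zero product. The function names (`sparse_multiply`, `density`) and the toy 16-element vectors are assumptions made for this example; they do not describe the VPU enhancements presented in the paper.

```python
def density(v):
    """Fraction of non-zero elements in a vector (assumes len(v) > 0)."""
    return sum(1 for x in v if x != 0) / len(v)


def sparse_multiply(a, b):
    """Element-wise product of two equal-length vectors.

    Returns (result, useful_ops), where useful_ops counts the positions
    at which both operands are non-zero. Every other position yields a
    zero product regardless of what was fetched for it.
    """
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    result = [0.0] * len(a)
    useful_ops = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x != 0 and y != 0:
            result[i] = x * y
            useful_ops += 1
    return result, useful_ops


# Hypothetical toy operands: 16 elements, two non-zeros each.
a = [0.0] * 16
b = [0.0] * 16
a[3], a[10] = 2.0, 4.0
b[3], b[7] = 5.0, 1.0

result, useful_ops = sparse_multiply(a, b)
```

In this toy case each operand has a density of 0.125, and only one of the 16 element positions produces a non-zero product, so a dense traversal would spend most of its memory traffic on zeros.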

The work has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No. 946002. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Spain, Croatia, Turkey.

Author information

Correspondence to Gopinath Mahale.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mahale, G. et al. (2023). Optimizations for Very Long and Sparse Vector Operations on a RISC-V VPU: A Work-in-Progress. In: Bienz, A., Weiland, M., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13999. Springer, Cham. https://doi.org/10.1007/978-3-031-40843-4_35

  • DOI: https://doi.org/10.1007/978-3-031-40843-4_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-40842-7

  • Online ISBN: 978-3-031-40843-4

  • eBook Packages: Computer Science, Computer Science (R0)
