Abstract
With Moore’s Law ending and general-purpose processor performance plateauing, there is increasing interest in, and adoption of, accelerators designed and built with FPGAs or SoCs. Programming the local memory of an accelerator is challenging. Past solutions are based either on scratchpad memory, which is entirely compiler managed, or on cache, which admits no direct program control.
This paper proposes a new approach, similar to memory allocation, in which a program treats local memory as a heap and controls its allocation, while the hardware manages the remaining operations, e.g., data fetch and placement. This position paper describes the collaborative solution, discusses open research questions, and presents preliminary results.
Notes
- 1.
The exact algorithm of CARL, its optimality in assigning leases, and a set of results (more on this later) were described in a different document under submission. Here we address the remaining problems of CARL.
- 2.
Due to the page limit, we show only 6 representative benchmarks (adi, cholesky, covariance, gemm, jacobi_1d, symm) instead of all 30. These 6 are selected based on the shape of their curves; the other 24 have curves similar to one of the benchmarks shown. The calculation of the cache space cost of CARL assumes that the assigned lease for each reference applies only to its data blocks that have reuses.
Acknowledgments
The authors wish to thank Dr. Sreepathi Pai and Shawn Maag for their initial participation, and the anonymous reviewers of LCPC and the workshop participants for their feedback. Financial support was provided in part by the National Science Foundation (Contract No. CNS-1909099, CCF-1717877).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Chen, D., Ding, C., Patru, D. (2021). CLAM: Compiler Leasing of Accelerator Memory. In: Pande, S., Sarkar, V. (eds) Languages and Compilers for Parallel Computing. LCPC 2019. Lecture Notes in Computer Science(), vol 11998. Springer, Cham. https://doi.org/10.1007/978-3-030-72789-5_7
DOI: https://doi.org/10.1007/978-3-030-72789-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72788-8
Online ISBN: 978-3-030-72789-5
eBook Packages: Computer Science, Computer Science (R0)