Abstract
With Moore’s Law ending and general-purpose processor performance plateauing, there is increasing interest in, and adoption of, accelerators designed and built with FPGAs or SoCs. Programming the local memory of an accelerator is challenging. Past solutions are based either on scratchpad memory, which is entirely compiler managed, or on cache, which admits no direct program control.
This paper proposes a new approach, similar to memory allocation, in which a program treats local memory as a heap and controls its allocation, while the hardware manages the remaining operations, e.g., data fetch and placement. This position paper describes the collaborative solution, discusses open research questions, and presents preliminary results.
Notes
- 1.
The exact algorithm of CARL, its optimality in assigning leases, and a set of results (more on this later) were described in a different document under submission. Here we address the remaining problems of CARL.
- 2.
Due to the page limit, we show only 6 representative benchmarks (adi, cholesky, covariance, gemm, jacobi_1d, symm) instead of all 30. These 6 are selected based on the shape of their curves; the other 24 have curves similar to one of the benchmarks shown. The calculation of the cache space cost of CARL assumes that the assigned lease for each reference applies only to its data blocks that have reuses.
Acknowledgments
The authors wish to thank Dr. Sreepathi Pai and Shawn Maag for their initial participation, and the anonymous reviewers of LCPC and the workshop participants for their feedback. Financial support was provided in part by the National Science Foundation (Contract No. CNS-1909099, CCF-1717877).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Chen, D., Ding, C., Patru, D. (2021). CLAM: Compiler Leasing of Accelerator Memory. In: Pande, S., Sarkar, V. (eds) Languages and Compilers for Parallel Computing. LCPC 2019. Lecture Notes in Computer Science(), vol 11998. Springer, Cham. https://doi.org/10.1007/978-3-030-72789-5_7
DOI: https://doi.org/10.1007/978-3-030-72789-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72788-8
Online ISBN: 978-3-030-72789-5
eBook Packages: Computer Science, Computer Science (R0)