Generating efficient data movement code for heterogeneous architectures with distributed-memory | IEEE Conference Publication | IEEE Xplore