Improving GPGPU Performance via Cache Locality Aware Thread Block Scheduling | IEEE Journals & Magazine | IEEE Xplore