Taming Single-Thread Program Performance on Many Distributed On-Chip L2 Caches | IEEE Conference Publication | IEEE Xplore