Optimizing loop performance for clustered VLIW architectures | IEEE Conference Publication | IEEE Xplore