Hardware thread reordering to boost OpenCL throughput on FPGAs | IEEE Conference Publication