Improving system latency of AI accelerator with on-chip pipelined activation preprocessing and multi-mode batch inference | IEEE Conference Publication | IEEE Xplore