Towards Efficient Deep Neural Network Training by FPGA-Based Batch-Level Parallelism | IEEE Conference Publication | IEEE Xplore