SWPU: A 126.04 TFLOPS/W Edge-Device Sparse DNN Training Processor With Dynamic Sub-Structured Weight Pruning | IEEE Journals & Magazine | IEEE Xplore