Scalable Low-Latency Persistent Neural Machine Translation on CPU Server with Multiple FPGAs

IEEE Conference Publication, IEEE Xplore