Floating-point fused multiply-add with reduced latency | IEEE Conference Publication | IEEE Xplore