Abstract:
Long Short-Term Memory (LSTM) networks, among the most widely used Recurrent Neural Network (RNN) structures, offer high accuracy for sequence learning tasks. However, it is challenging to achieve low latency and high throughput for computationally expensive LSTM operations while simultaneously satisfying low-power constraints. This work offers a two-pronged approach to accelerating RNN inference. First, a linear quantization technique is applied to reduce the complexity of operations, power consumption, and required memory resources. Second, a new activation implementation method, called lookupx, is proposed to accelerate sigmoid computation during inference. It is shown that lowering input precision to 4-bit integers causes only a 2% accuracy loss, and that the lookupx activation methodology achieves 1.9x better performance and 50x lower power consumption while reducing the required chip area by 1.2x compared to integer-domain activation functions with the same accuracy.
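The abstract does not specify the quantization scheme or the internals of lookupx; as a rough illustration of the two ideas only, the sketch below shows symmetric linear quantization to 4-bit signed integers and a precomputed sigmoid lookup table indexed by the quantized code. All function names and the choice of a max-abs scale are assumptions for illustration, not the paper's method.

```python
import numpy as np

def linear_quantize(x, num_bits=4):
    """Symmetric linear quantization to signed num_bits integers.
    Assumed scheme: scale derived from the max absolute value;
    the paper's exact quantizer is not given in the abstract."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 7 for 4-bit signed
    qmin = -(qmax + 1)                        # e.g. -8
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), qmin, qmax).astype(np.int8)
    return q, scale

def build_sigmoid_lut(num_bits=4, scale=1.0):
    """Precompute sigmoid for every representable quantized input,
    so inference replaces the exponential with a single table read."""
    qmax = 2 ** (num_bits - 1) - 1
    codes = np.arange(-(qmax + 1), qmax + 1)  # all 2**num_bits codes
    return 1.0 / (1.0 + np.exp(-codes * scale))

# Usage: quantize a gate pre-activation, then look up sigmoid values.
x = np.array([-2.0, -0.5, 0.0, 0.75, 1.5])
q, scale = linear_quantize(x)
lut = build_sigmoid_lut(num_bits=4, scale=scale)
y = lut[q.astype(int) + 8]  # shift 4-bit code into table index 0..15
```

With 4-bit inputs the table has only 16 entries, which is why a lookup-based activation can be both faster and far cheaper in power and area than evaluating the sigmoid in the integer domain.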
Date of Conference: 04-07 December 2023
Date Added to IEEE Xplore: 10 January 2024