In recurrent neural networks such as long short-term memory (LSTM) networks, the sigmoid and hyperbolic tangent functions are commonly used as activation functions in the network units. Other activation functions developed for neural networks have not been thoroughly analyzed in LSTMs. Although many researchers have adopted LSTM networks for classification tasks, no comprehensive study is available on the choice of activation functions for the gates in these networks. In this paper, we compare 23 different activation functions in a basic LSTM network with a single hidden layer. We analyze the performance of the different activation functions, and of different numbers of LSTM blocks in the hidden layer, for classification of records in the IMDB, Movie Review, and MNIST data sets. The quantitative results on all data sets demonstrate that the lowest average error is achieved with the Elliott activation function and its modifications. Specifically, this family of functions yields better results than the sigmoid activation function, which is the popular choice in LSTM networks.
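
For orientation, the following minimal sketch contrasts the logistic sigmoid with one common form of the Elliott function, rescaled to the range (0, 1) so that it could stand in for a gate activation. The exact form, 0.5x / (1 + |x|) + 0.5, and the function names are assumptions made for illustration; the definitions used in the experiments are those given in the body of the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def elliott(x: float) -> float:
    """Elliott function rescaled to (0, 1) (assumed form).

    It needs no exponential, and it approaches its limits
    polynomially rather than exponentially.
    """
    return 0.5 * x / (1.0 + abs(x)) + 0.5


if __name__ == "__main__":
    # Both functions pass through 0.5 at x = 0 and are bounded by (0, 1).
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  elliott={elliott(x):.4f}")
```

Both functions are monotonic and bounded, so either can serve as a gate nonlinearity; they differ mainly in how quickly they saturate.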

