1 Introduction

When deep learning in the financial sector is mentioned, many people may immediately think of predicting stock price movements. That is indeed a high-profile research direction, but the practical applications of deep learning in finance are far more extensive.

The capacity for sustainable development and profit is important to investors. Tam and Kiang [1] applied artificial neural networks (ANNs) to predict bank bankruptcy. They used a back propagation (BP) neural network with a single hidden layer to predict bankruptcy for Texas banks between 1985 and 1987. The results showed that even such a simple neural network is more accurate and stable than the multivariate discriminant analysis, logistic regression, k-nearest neighbors (KNN), and decision tree models.

Hutchinson et al. [2] also used ANNs to analyze option prices and hedge S&P 500 futures options from 1987 to 1991, and verified that the results are better than those of the Black-Scholes-Merton model, the classic option pricing model in financial studies.

Due to the limitations of linear regression, the exchange rate was long considered unpredictable, but the ability of neural networks to describe nonlinear relationships opened a new way to forecast it. Lee and Chen [3] used recurrent neural networks (RNNs) to predict the trends of five currencies against the US dollar, and demonstrated significant forecasting ability for the yen and the British pound against the US dollar. This reveals the potential of neural networks to describe nonlinear relationships.

In 2006, Hinton [4] published a paper in Science that proposed the Deep Belief Network (DBN), breaking through the vanishing-gradient bottleneck in multilayer network training and bringing the concept of deep learning into public view. Improvements in computer hardware have also made the training of multilayer networks more efficient. Therefore, more and more researchers use deep learning methods to solve complex problems where shallow neural networks fail, including problems of financial analysis.

Heaton et al. [5] gave a brief overview of the development of deep learning in finance, summarizing three deep learning models: the autoencoder, rectified neural networks, and Long Short-Term Memory (LSTM). They also introduced two ways to avoid over-fitting, namely regularization and dropout. Taking the autoencoder model as an example, they selected a small number of stocks to build two training sets (one consisting of the 10 stocks most similar to the S&P 500 trend, the other adding to it the 10 stocks least similar to the S&P 500 trend) and trained the model on the S&P 500 index between 2014 and 2015. The results showed that as long as the samples are diverse enough, deep learning can approximate the real values to almost any precision from a small number of samples.

Many researchers have devoted great effort to the study of deep neural networks and have produced excellent results. These studies let us stand on the shoulders of giants, giving us a strong basis for financial analysis using deep learning.

2 Related Research

Deep learning has developed into a sizable body of knowledge containing many models and optimization methods. The Deep Belief Network (DBN) is a classic deep learning model proposed by Hinton [4]. It is based on the Restricted Boltzmann Machine (RBM), and the entire network can be regarded as a stack of several RBMs. In a DBN, a layer-by-layer unsupervised training procedure gradually optimizes the parameters of each RBM, which is called “pre-training”. Pre-training yields parameters close to the optimal solution and, to some extent, solves the vanishing-gradient problem in training multilayer neural networks. Then the whole neural network is trained with the BP algorithm, which is called “fine-tuning”. The Deep Boltzmann Machine (DBM) is another RBM-based model, proposed by Salakhutdinov and Hinton [6], which has been shown to perform well in handwritten digit recognition and object recognition. A DBM differs from an RBM in having multiple hidden layers (an RBM has only one), and it differs from a DBN in that the connections between all adjacent layers are undirected (in a DBN, only the top two layers are connected by undirected edges).
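
For concreteness, the following minimal NumPy sketch shows one contrastive-divergence (CD-1) update for a Bernoulli RBM, the building block that a DBN pre-trains layer by layer. CD is the common approximation used to train RBMs; the function and variable names here are illustrative, not taken from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_v, b_h, v0, lr=0.1):
    """One contrastive-divergence (CD-1) step for a Bernoulli RBM.
    v0: batch of visible vectors, shape (batch, n_visible)."""
    # positive phase: hidden activation probabilities given the data
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # negative phase: one Gibbs step to reconstruct visible and hidden units
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # move parameters toward the data correlations, away from the model's
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
    b_v += lr * (v0 - p_v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_v, b_h

# stacking: after one RBM is trained, its hidden probabilities p_h0
# serve as the "data" for training the next RBM in the stack
```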

The Convolutional Neural Network (CNN) is a feedforward artificial neural network model proposed by Yann LeCun et al. in 1998. Its typical structure includes convolutional layers, pooling layers, and fully connected layers, and it has been widely used in computer vision. The operation process is as follows: given a picture (a training sample) as input, multiple convolution kernels scan across the input pixels, and the scan results are passed through an activation function to obtain feature maps. A pooling operator then downsamples each feature map, and the result is fed as input to the next layer. After all the convolutional and pooling layers, a fully connected network performs further processing, and the output layer gives the final result.
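
The following NumPy sketch illustrates the two core operations; the kernel size, image size, and names are illustrative assumptions.

```python
import numpy as np

def conv2d(image, kernel):
    """Scan one convolution kernel over a single-channel image ("valid" mode)."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Downsample a feature map by taking the max over non-overlapping blocks."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.rand(8, 8)                                  # a toy "picture"
fmap = np.maximum(conv2d(image, np.random.randn(3, 3)), 0.0)  # ReLU activation
pooled = max_pool(fmap)           # 3x3 downsampled map fed to the next layer
```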

The Recurrent Neural Network (RNN) is a type of neural network that is efficient for processing time series data. In a traditional neural network, the signals of neurons in each layer propagate only to the next layer, and data processing at each moment is independent. An RNN, however, memorizes previous information and applies it to the current computation: the hidden layer receives not only the output of the layer below, but also its own output from the previous time step. Saad et al. [7] applied Time-Delay Neural Network, Probabilistic Neural Network, and RNN models to predict stock closing prices, and their experiments showed that all three models have predictive ability. However, RNNs suffer from vanishing gradients: the gradient generated at a certain moment fades away after several propagations along the time axis, so long-range information cannot be memorized effectively. Long Short-Term Memory (LSTM) was developed to overcome this weakness. LSTM introduces the concept of a cell state, which allows information to be added or forgotten through gates, thereby enabling the network to memorize long-range information. Chen et al. [8] used LSTM to predict stock returns in China’s stock market; their experiments show that, compared with a random prediction method, the LSTM model increased the prediction accuracy for stock returns from 14.3 to 27.2%.
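
The recurrence that gives an RNN its memory can be written in a few lines of NumPy; this is an illustrative sketch, and the weight names are our own.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Simple (Elman) RNN: at each step the hidden layer combines the
    current input with its own output from the previous time step."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:                               # xs: sequence of input vectors
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.array(states)                    # hidden state at every step
```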

The autoencoder is a neural network that reproduces its input signal as closely as possible. Its learning process first encodes the input and then decodes the code to reconstruct the input as output. This approach is often used for feature extraction, denoising, and so on. “Autoencoder” generally refers to a network model with only one hidden layer, while a deep autoencoder has multiple hidden layers. The training of a deep autoencoder is similar to that of a DBN: pre-training is performed between every two layers using RBMs, and the parameters are then fine-tuned by the BP algorithm.
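
A single-hidden-layer autoencoder can be sketched in Keras as follows; the 20-dimensional input and 5-unit code are illustrative assumptions, and the cited works do not specify a framework.

```python
from tensorflow import keras

# encode a 20-dimensional input into a 5-unit code, then decode it back
inputs = keras.Input(shape=(20,))
code = keras.layers.Dense(5, activation="sigmoid", name="encoder")(inputs)
recon = keras.layers.Dense(20, name="decoder")(code)

autoencoder = keras.Model(inputs, recon)
autoencoder.compile(optimizer="adam", loss="mse")  # loss: reproduce the input
```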

Because of the large number of layers in deep learning, the generated models are usually very complex, and over-fitting is prone to occur when the training sample set is not large enough. Therefore, some techniques are needed to control the training process and to reduce or even prevent serious over-fitting. Commonly used techniques include early stopping, regularization, and dropout.

Early stopping prevents over-fitting by controlling the number of training epochs: it stops the iteration at the right time. Prechelt [9] studied this method in depth. Regularization adds a regular term on the weights to the loss function, preventing overly complex weights from producing an over-fitted model. The main idea of dropout is to randomly remove certain connections between the hidden layers during training to reduce the complexity of the model. The paper by Heaton et al. [5] introduces practical applications of regularization and dropout.
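
All three techniques are available as one-liners in modern frameworks; the Keras sketch below shows an L2 regular term, a dropout layer, and an early-stopping callback (layer sizes and thresholds are illustrative).

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(1e-4)),  # L2 term
    keras.layers.Dropout(0.5),  # randomly zero half the activations in training
    keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")
# stop when the validation loss has not improved for 5 consecutive epochs
stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)
```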

3 Case Study: Short-Term Prediction of the Shanghai Composite Index Using RNN

In this example, an RNN is applied to make short-term predictions of the Shanghai Composite Index. The prediction method uses the opening price, highest price, lowest price, closing price, and trading volume of the previous 10 days to predict the highest price of the next day. The input feature vector is

$$ X = \left( \text{opening price},\ \text{highest price},\ \text{lowest price},\ \text{closing price},\ \text{trading volume} \right) \tag{1} $$

Then the time series inputted is

$$ \left( X_{1}, X_{2}, X_{3}, X_{4}, X_{5}, X_{6}, X_{7}, X_{8}, X_{9}, X_{10} \right) \tag{2} $$

The expected output is

$$ Y_{11}^{*} = \left( \text{highest price} \right)_{t = 11} \tag{3} $$

3.1 Data Collection and Processing

First, we collected transaction data of the Shanghai Composite Index from January 4, 2005 to June 30, 2017, a total of 3,034 trading days, including date, opening price, highest price, lowest price, closing price, and trading volume. The data from January 4, 2005 to December 31, 2015, a total of 2,671 trading days, were used as the training set. The data from January 4, 2016 to June 30, 2017, a total of 363 trading days, were used as the test set.
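
In code, the split might look like the following pandas sketch; the file name and column names are hypothetical.

```python
import pandas as pd

# hypothetical CSV with columns: date, open, high, low, close, volume
df = pd.read_csv("shanghai_index.csv", parse_dates=["date"])
train = df[(df["date"] >= "2005-01-04") & (df["date"] <= "2015-12-31")]
test = df[(df["date"] >= "2016-01-04") & (df["date"] <= "2017-06-30")]
```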

As input, the transaction data of the previous 10 days are processed as a moving time series, and the highest price of the 11th day is taken as the expected output, namely:

$$ \left( X_{1}, X_{2}, X_{3}, X_{4}, X_{5}, X_{6}, X_{7}, X_{8}, X_{9}, X_{10} \right) \Rightarrow Y_{11}^{*} \tag{4} $$
$$ \left( X_{2}, X_{3}, X_{4}, X_{5}, X_{6}, X_{7}, X_{8}, X_{9}, X_{10}, X_{11} \right) \Rightarrow Y_{12}^{*} \tag{5} $$

After building the moving time series, the input data were normalized into [0, 1]. The data were then ready to be fed into the model.
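
A NumPy sketch of the windowing of Eqs. (4)-(5) and the normalization is given below; the array layout and function names are our own.

```python
import numpy as np

def make_windows(data, window=10, target_col=1):
    """Build the moving series of Eqs. (4)-(5): 10-day input sequences
    and the 11th day's highest price as the expected output.
    `data` has one row per day with columns
    [open, high, low, close, volume]; column 1 is the highest price."""
    X, y = [], []
    for t in range(len(data) - window):
        X.append(data[t:t + window])            # days t .. t+9
        y.append(data[t + window, target_col])  # highest price of day t+10
    return np.array(X), np.array(y)

def min_max_scale(a, lo, hi):
    """Normalize features into [0, 1]; lo/hi are taken from the training data."""
    return (a - lo) / (hi - lo)

# illustrative use on synthetic data shaped like the real table (days x 5)
raw = np.random.rand(2671, 5)
X_train, y_train = make_windows(raw)            # yields 2,661 training samples
lo, hi = raw.min(axis=0), raw.max(axis=0)
X_train = min_max_scale(X_train, lo, hi)
```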

3.2 Model Structure and Parameters

The neural network has an RNN layer as its hidden layer. This hidden layer contains 32 neurons and is unrolled over 10 time steps. The hidden layer activation function is tanh. Because this is a regression problem, the output layer performs no nonlinear transformation; the linear unit y = x is used directly. The mean square error (MSE) is chosen as the loss function, namely:

$$ MSE = \frac{1}{n}\sum\limits_{i = 1}^{n} \left( y_{i}^{*} - y_{i} \right)^{2} \tag{6} $$

where n is the total number of training samples, y* is the expected output, namely the real highest price of the 11th day, and y is the output of the model.

The optimization strategy was gradient descent with a learning rate of 0.01. The early stopping technique was used to prevent over-fitting: when the value of the loss function did not decrease for 5 consecutive epochs, training stopped. The preset maximum was 500 epochs, and training actually stopped after 407 epochs.
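
The paper does not name a framework; a minimal Keras sketch consistent with the stated structure and hyperparameters is:

```python
import numpy as np
from tensorflow import keras

# placeholder data in the shapes produced in Sect. 3.1:
# (samples, 10 days, 5 features) and one target price per sample
X_train = np.random.rand(2661, 10, 5).astype("float32")
y_train = np.random.rand(2661).astype("float32")

model = keras.Sequential([
    keras.layers.SimpleRNN(32, activation="tanh", input_shape=(10, 5)),
    keras.layers.Dense(1),  # linear output unit y = x, no nonlinear transform
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01), loss="mse")

# stop when the loss has not decreased for 5 consecutive epochs (max 500)
stop = keras.callbacks.EarlyStopping(monitor="loss", patience=5)
model.fit(X_train, y_train, epochs=500, callbacks=[stop], verbose=0)
```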

3.3 Analysis of Results

The comparison of the predicted curve of the model on the test set with the real curve is shown in Fig. 1.

Fig. 1 The predicted curve of the highest price of the Shanghai Composite Index generated by RNN

It can be seen that the predicted curve basically follows the trend of the actual curve, though with some local deviations. The daily error rate (DER) is used to further measure the accuracy of the prediction. The calculation formula is

$$ DER = \frac{\left| y^{*} - y \right|}{y^{*}} \times 100\% \tag{7} $$

where |·| denotes the absolute value. The daily error rates of the 363 trading-day predictions on the test set are shown in Table 1.

Table 1 Statistics on the daily error rate of the Shanghai Composite Index prediction
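
The statistics in Table 1 can be reproduced from the model's outputs with a few lines; the array names and toy values below are illustrative.

```python
import numpy as np

def daily_error_rate(y_star, y):
    """Eq. (7): DER = |y* - y| / y* x 100%, with y* the real highest price."""
    return np.abs(y_star - y) / y_star * 100.0

# illustrative use on toy values for three trading days
y_star = np.array([3100.0, 3150.0, 3080.0])   # real highest prices
y_pred = np.array([3125.0, 3140.0, 3010.0])   # model outputs
der = daily_error_rate(y_star, y_pred)
print((der <= 1).sum(), "days with DER <= 1%")
print(((der > 1) & (der <= 2)).sum(), "days with DER in (1%, 2%]")
```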

In the 363 trading days of the test set, the number of days with a predicted daily error rate less than or equal to 1% was 224, accounting for 61.71%. Daily error rates between 1 and 2% occurred on 95 days, accounting for 26.17%. In general, the generalization performance on the test set was good, and the model can be considered to have practical value. We analyzed the 4 days with the highest prediction error rates and found that they were January 5, 12, and 27, and February 26, 2016. On these 4 days the Shanghai Composite Index fell sharply compared with other days. According to securities news, China’s stock market experienced a serious crash in 2015, so a circuit breaker was officially implemented on January 4, 2016. However, the circuit breaker was repeatedly triggered by continuously falling stock prices, and the mechanism was in effect for only 4 days. Even so, January 2016 saw the biggest single-month drop in the previous 8 years (the last comparable drop was during the global financial crisis of 2008). This explains the large deviations and shows that RNN prediction of a stock index, as a technical analysis method, has the disadvantage of being insensitive to bad news in the market (Fig. 2).

Fig. 2 Daily error rate histogram of predicted Shanghai Composite Index

4 Conclusion

This paper studies the applications of deep learning in finance and focuses on applying a recurrent neural network to stock price prediction. The experimental results suggest that, in the context of big data, deep learning is well suited to financial analysis and can achieve high performance. It is therefore very promising to apply deep learning to more financial problems.