
A meta extreme learning machine method for forecasting financial time series


Abstract

In the last decade, the problem of forecasting time series in very different fields has received increasing attention due to its many real-world applications. In particular, in the very challenging case of financial time series, the underlying phenomenon of stock time series exhibits complex behaviors, including non-stationarity, non-linearity and non-trivial scaling properties. In the literature, a widely used strategy to improve forecasting capability is the combination of several models. However, most published studies in the field of financial time series use machine learning models in which only one type of predictor, either linear or nonlinear, is considered. In this paper we first measure relevant features of the underlying process in order to propose a forecasting method. We select the Sample Entropy and the Hurst Exponent to characterize the behavior of stock time series. The characterization reveals the presence of moderate randomness, long-term memory and scaling properties. Thus, based on the measured properties, this paper proposes a novel one-step-ahead off-line meta-learning model, called μ-XNW, for the prediction of the next value \(x_{t+1}\) of a financial time series \(x_{t}\), t = 1, 2, 3, … , that integrates a naive or linear predictor (LP), for which the predicted value of \(x_{t + 1}\) is simply the last value \(x_{t}\), an extreme learning machine (ELM) and a discrete wavelet transform (DWT), both based on the n previous values of \(x_{t + 1}\). LP, ELM and DWT are the constituents of the proposed model μ-XNW. We evaluate the proposed model using four well-known performance measures and validate its usefulness using six high-frequency stock time series belonging to the technology sector. The experimental results confirm that including internal estimators able to capture the measured features (randomness, long-term memory and scaling properties) improves the forecasting accuracy over methods that do not include them.


References

  1. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X (2015) TensorFlow: Large-scale machine learning on heterogeneous systems. https://www.tensorflow.org/. Software available from tensorflow.org

  2. Adhikari R (2015) A neural network based linear ensemble framework for time series forecasting. Neurocomputing 157:231–242. https://doi.org/10.1016/j.neucom.2015.01.012


  3. Adhikari R, Agrawal RK (2014) A combination of artificial neural network and random walk models for financial time series forecasting. Neural Comput Applic 24(6):1441–1449. https://doi.org/10.1007/s00521-013-1386-y


  4. Aldridge I (2013) High-Frequency Trading: a practical guide to algorithmic strategies and trading systems. Wiley, Hoboken, NJ


  5. Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: SODA ’07: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pp 1027–1035. Society for Industrial and Applied Mathematics, Philadelphia

  6. Atsalakis GS, Valavanis KP (2009) Surveying stock market forecasting techniques - part ii: Soft computing methods. Expert Syst Appl 36(3):5932–5941


  7. Bahrammirzaee A (2010) A comparative survey of artificial intelligence applications in finance: Artificial neural networks, expert system and hybrid intelligent systems. Neural Comput Appl 19(8):1165–1195


  8. Bishop CM (1996) Neural networks for pattern recognition. Oxford University Press, USA


  9. Blatter C (2013) Wavelets: Eine Einführung (Advanced Lectures in Mathematics, German Edition). Vieweg+Teubner Verlag

  10. Bollerslev T (1986) Generalized autoregressive conditional heteroskedasticity. J Econ 31(3):307–327


  11. Box GEP, Jenkins G (1970) Time series analysis, forecasting and control. Holden-Day, Incorporated

  12. Box GEP, Jenkins G (1970) Time series analysis, forecasting and control. Holden-Day, Incorporated

  13. Broomhead D, Lowe D (1988) Multivariable functional interpolation and adaptive networks. Complex Systems 2:321–355


  14. Cavalcante RC, Brasileiro RC, Souza VL, Nobrega JP, Oliveira AL (2016) Computational intelligence and financial markets: a survey and future directions. Expert Syst Appl 55:194–211. https://doi.org/10.1016/j.eswa.2016.02.006


  15. Chollet F et al (2015) Keras. https://github.com/fchollet/keras

  16. Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Mathematics of Control Signals, and Systems 2:303–314


  17. Dacorogna MM, Gencay R, Muller U, Olsen RB, Olsen OV (2001) An introduction to high frequency finance. Academic Press, New York


  18. Daubechies I (1992) Ten lectures on wavelets. Society for industrial and applied mathematics, Philadelphia


  19. Doucoure B, Agbossou K, Cardenas A (2016) Time series prediction using artificial wavelet neural network and multi-resolution analysis: Application to wind speed data. Renew Energy 92:202–211. https://doi.org/10.1016/j.renene.2016.02.003


  20. Durbin M (2010) All About High-Frequency Trading (All About Series). McGraw-Hill, New York


  21. Elman JL (1990) Finding structure in time. Cogn Sci 14(2):179–211


  22. Engle RF (1982) Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50(4):987–1007


  23. Fan J, Yao Q (2005) Nonlinear time series: Nonparametric and parametric methods (springer series in statistics). Springer

  24. Gooijer JGD (2017) Elements of nonlinear time series analysis and forecasting (springer series in statistics). Springer

  25. Gooijer JGD, Hyndman RJ (2006) 25 years of time series forecasting. Int J Forecast 22(3):443–473. https://doi.org/10.1016/j.ijforecast.2006.01.001


  26. Granger C, Andersen A (1978) An introduction to bilinear time series models. Vandenhoeck & Ruprecht, Göttingen

  27. Guillaume DM, Dacorogna MM, Davé RR, Muller UA, Olsen RB, Pictet OV (1997) From the bird’s eye to the microscope: A survey of new stylized facts of the intra-daily foreign exchange markets. Finance Stochast 1:95–129


  28. Hamilton JD (1989) A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle. Econometrica 57(2):357–384


  29. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780


  30. Hornik K (1991) Approximation capabilities of multilayer feedforward networks. Neural Netw 4(2):251–257


  31. Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366


  32. Hornik K, Stinchcombe MB, White H (1990) Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw 3(5):551–560


  33. Huang GB, Wang D, Lan Y (2011) Extreme learning machines: a survey. Int J Machine Learning & Cybernetics 2(2):107–122


  34. Huang GB, Zhu QY, Siew CK (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In: Proceedings of the 2004 IEEE international joint conference on Neural networks, 2004, vol 2, pp 985–990

  35. Hurst H (1956) Methods of using long-term storage in reservoirs. ICE Proceedings 5:519–543


  36. Hyndman RJ, Khandakar Y (2008) Automatic time series forecasting: The forecast package for r. J Stat Softw 27(3):1–22


  37. Hyndman RJ, Koehler AB, Snyder RD, Grose S (2002) A state space framework for automatic forecasting using exponential smoothing methods. Int J Forecast 18(3):439–454. https://doi.org/10.1016/S0169-2070(01)00110-8


  38. In F, Kim S (2006) Multiscale hedge ratio between the australian stock and futures markets: Evidence from wavelet analysis. Journal of Multinational Financial Management 16(4):411–423. https://doi.org/10.1016/j.mulfin.2005.09.002


  39. Javed K, Gouriveau R, Zerhouni N (2014) Sw-elm: A summation wavelet extreme learning machine algorithm with a priori parameter initialization. Neurocomputing 123:299–307. https://doi.org/10.1016/j.neucom.2013.07.021. http://www.sciencedirect.com/science/article/pii/S0925231213007649. Contains Special issue articles: Advances in Pattern Recognition Applications and Methods


  40. Richman JS, Moorman JR (2000) Physiological time-series analysis using approximate entropy and sample entropy. American Physiological Society 278(6):H2039–H2049


  41. Richman JS, Lake DE, Moorman JR (2004) Sample entropy. Methods Enzymol 384:172–184. https://doi.org/10.1016/S0076-6879(04)84011-4. Numerical Computer Methods, Part E


  42. Kantz H, Schreiber T (2004) Nonlinear time series analysis. Cambridge University Press, Cambridge


  43. Karuppiah J, Los CA (2005) Wavelet multiresolution analysis of high-frequency asian fx rates, summer 1997. International Review of Financial Analysis 14(2):211–246


  44. Lahmiri S (2014) Wavelet low- and high-frequency components as features for predicting stock prices with backpropagation neural networks. Journal of King Saud University - Computer and Information Sciences 26 (2):218–227. https://doi.org/10.1016/j.jksuci.2013.12.001


  45. Lai TL, Xing H (2008) Statistical models and methods for financial markets (springer texts in statistics). Springer

  46. Li S, Goel L, Wang P (2016) An ensemble approach for short-term load forecasting by extreme learning machine. Appl Energy 170:22–29. https://doi.org/10.1016/j.apenergy.2016.02.114


  47. Liao S, Feng C (2014) Meta-elm: {ELM} with {ELM} hidden nodes. Neurocomputing 128:81–87


  48. Ma J, Li Y (2017) Gauss-jordan elimination method for computing all types of generalized inverses related to the 1-inverse. J Comput Appl Math 321:26–43. https://doi.org/10.1016/j.cam.2017.02.010


  49. Makridakis S, Spiliotis E, Assimakopoulos V (2018) Statistical and machine learning forecasting methods: Concerns and ways forward. PLOS ONE 13(3):1–26. https://doi.org/10.1371/journal.pone.0194889


  50. Mallat S (1989) A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11:674–693


  51. Mallat S (1999) A Wavelet Tour of Signal Processing. Academic Press, San Diego


  52. Mandelbrot BB, Wallis JR (1969) Robustness of the rescaled range r/s in the measurement of noncyclic long run statistical dependence. Water Resour Res 5(5):967–988. https://doi.org/10.1029/WR005i005p00967


  53. Mariano RS, Tse YK (2008) Econometric forecasting and high-frequency data analysis (Lecture Notes Series, Institute for Mathematical Sciences, National University of Singapore). World Scientific Publishing Company

  54. Montavon G, Orr G B, Müller KR (eds.) (2012) Neural Networks: Tricks of the Trade - Second Edition, Lecture Notes in Computer Science, vol. 7700 Springer

  55. Müller UA, Dacorogna MM, Olsen RB, Pictet OV, Schwarz M, Morgenegg C (1990) Statistical study of foreign exchange rates, empirical evidence of a price change scaling law, and intraday analysis. J Bank Financ 14 (6):1189–1208


  56. Palit AK, Popovic D (2010) Computational intelligence in time series forecasting: Theory and engineering applications (advances in industrial control). Springer, Berlin


  57. Percival DB, Walden AT (2006) Wavelet methods for time series analysis (cambridge series in statistical and probabilistic mathematics). Cambridge University Press, Cambridge


  58. Pincus SM (1991) Approximate entropy as a measure of system complexity. Proc Natl Acad Sci 88(6):2297–2301. https://doi.org/10.1073/pnas.88.6.2297


  59. Priestley MB (1980) State-Dependent Models: A general approach to non-linear time series analysis. J Time Series Anal 1(1):47–71


  60. PyWavelets Developers (2017) PyWavelets: wavelet transforms in Python. https://pywavelets.readthedocs.io/en/latest/

  61. Qiu T, Guo L, Chen G (2008) Scaling and memory effect in volatility return interval of the chinese stock market. Physica A: Statistical Mechanics and its Applications 387(27):6812– 6818


  62. Rao CR, Mitra SK (1972) Generalized inverse of matrices and its applications (probability & mathematical statistics). Wiley, New York


  63. Sauer T (2011) Numerical analysis, 2nd edn. Addison-Wesley Publishing Company, USA


  64. Shin Y, Ghosh J (1991) The pi-sigma network : an efficient higher-order neural network for pattern classification and function approximation. In: Proceedings of the international joint conference on neural networks, pp 13–18

  65. Shrivastava NA, Panigrahi BK (2014) A hybrid wavelet-elm based short term price forecasting for electricity markets. Int J Electr Power Energy Syst 55:41–50


  66. Shumway RH, Stoffer DS (2006) Time Series Analysis and Its Applications With R Examples. Springer, Berlin. ISBN 978-0-387-29317-2


  67. Strang G, Nguyen T (1997) Wavelets and filter banks. Wellesley-Cambridge Press, Cambridge


  68. Sun ZL, Choi TM, Au KF, Yu Y (2008) Sales forecasting using extreme learning machine with applications in fashion retailing. Decis Support Syst 46(1):411–419


  69. Tong H (1983) Threshold models in nonlinear time series analysis. Springer, Berlin


  70. Trefethen LN, Bau D (1997) Numerical linear algebra. SIAM

  71. Tsay RS (2012) An introduction to analysis of financial data with R. Wiley, New York


  72. Zhang Q, Benveniste A (1992) Wavelet networks. IEEE Trans Neural Networks 3(6):889–898


  73. Zubulake P, Lee S (2011) The high frequency game changer: how automated trading strategies have revolutionized the markets (Wiley Trading). Wiley, New York


Acknowledgments

This work has been partially funded by the Centro Científico Tecnológico de Valparaíso – CCTVal, CONICYT PIA/Basal Funding FB0821, FONDECYT 1150810, FONDECYT 11160744 and UTFSM PIIC2015. The authors gratefully thank Alejandro Cañete from IFITEC S.A. – Financial Technology for providing the stock time series used in this study.

Author information

Corresponding author

Correspondence to César Fernández.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Appendices

Appendix A: Sample entropy

Pincus [58] adapted the notion of “entropy” for real-world use. In this context, entropy means order, regularity, or complexity. Fix \(m,N \in \mathbf {N}\) with \(m\leq N\), and \(r \in \mathbf {R}^{+}\). Given a time series of data \(u(1), u(2),\dots , u(N)\) from measurements equally spaced in time, form the sequence of vectors \(\mathbf {x}(1)\), \(\mathbf {x}(2)\), …, \(\mathbf {x}(N-m + 1) \in \mathbf {R}^{m}\) defined by:

$$\begin{array}{@{}rcl@{}} \mathbf{x}(i)&:=&[ u(i), u(i + 1),\dots, u(i+m-1) ]\in \mathbf{R}^{m}\\ i&=&1,2,\dots,N-m + 1 . \end{array} $$
(34)

Let d be some norm in \(\mathbf {R}^{m}\), for instance:

$$\begin{array}{@{}rcl@{}} d(\mathbf{x}(i),\mathbf{x}(j)) \!&:=&\! \| \mathbf{x}(i) - \mathbf{x}(j) \|_{\infty} := \max_{1\leq k\leq m} | u(i+k-1)\\ &&\!-u(j+k-1)| \\ \!&=&\! \max\left\{ |u(i)-u(j)|, |u(i + 1)-u(j + 1)|\right.\\ &&\left., \dots , |u(i+m-1)-u(j+m-1)| \right\} . \\ \end{array} $$
(35)

Then, Pincus defines

$$ {C_{i}^{m}}(r)\!:=\! \frac{\# \left\{ j\!\in\! \mathbf{N} : j\!\leq\! N - m + 1\ \text{and}\ d(\mathbf{x}(i),\mathbf{x}(j))\!\leq\! r \right\} }{ N - m + 1 } . $$
(36)

The denominator \(N-m + 1\) in (36) is the total number of segments of length m available in the signal \(u(1),\dots , u(N)\), i.e. \(1\leq \text {numerator}\leq N-m + 1\). Note that in the numerator of (36) the distance from the pattern (segment, template, or vector) \(\mathbf {x}(i)\) must be measured w.r.t. all \(N-m + 1\) patterns of length m available in the sequence \(u(1),\dots , u(N)\). Only the patterns with distance \(\leq r\) are counted. Thus:

$$ \frac{1}{N-m + 1}\leq {C_{i}^{m}}(r)\leq 1 . $$
(37)

We observe that, for \(i\in \mathbf {N}\) fixed with \(i\leq N-m + 1\):

$$\begin{array}{@{}rcl@{}} &&\# \left\{ j\in \mathbf{N} : j\leq N-m + 1\ \text{and}\ d(\mathbf{x}(i),\mathbf{x}(j))\leq r \right\} \\ &=& \sum\limits_{j = 1}^{N-m + 1} H\left( r-d\left( \mathbf{x}(i),\mathbf{x}(j)\right)\right) , \end{array} $$

where H is the Heaviside step function. Thus,

$${C_{i}^{m}}(r) = \frac{\displaystyle\sum\limits_{j = 1}^{N-m + 1} H\left( r-d\left( \mathbf{x}(i),\mathbf{x}(j)\right)\right)}{N-m + 1} . $$

Pincus further defines

$$\begin{array}{@{}rcl@{}} C^{m}(r) &:=& (N-m + 1) {\sum}_{i = 1}^{N-m + 1} {C_{i}^{m}}(r) \\ &=& {\sum}_{i = 1}^{N-m + 1} {\sum}_{j = 1}^{N-m + 1} H\left( r-d\left( \mathbf{x}(i),\mathbf{x}(j)\right)\right) , \end{array} $$
(38)

and

$$ \beta_{m} := \lim_{r\to0} \lim_{N\to\infty}\frac{\log C^{m}(r)}{\log r} . $$
(39)
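For illustration, the following minimal NumPy sketch evaluates \({C_{i}^{m}}(r)\) of (36) through the equivalent Heaviside-sum form above; it is not the authors' implementation, and the 0-based index i is an assumption of the sketch.

```python
import numpy as np

def C_i_m(u, i, m, r):
    """Pincus's C_i^m(r) of Eq. (36): fraction of the N - m + 1 length-m templates of u
    whose Chebyshev (sup-norm) distance to template i is <= r (the self-match is included)."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    templates = np.array([u[j:j + m] for j in range(N - m + 1)])   # x(1), ..., x(N-m+1)
    d = np.max(np.abs(templates - templates[i]), axis=1)           # ||x(i) - x(j)||_inf for every j
    return np.mean(d <= r)                                         # Heaviside sum divided by N - m + 1
```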

In order to explain the sample entropy measure in more depth, we introduce the following notation:

$$\begin{array}{@{}rcl@{}} u[1:N] &:& \text{the signal or segment}\ u(1),u(2),\dots,u(N) , \\ u[j:j+m-1] &:& \text{the template}\ u(j),u(j + 1),\dots,u(j+m-1)\ \text{of the signal}\ u[1:N] , \\ \mathbf{x}(j) &:& \text{the template}\ u(j),u(j + 1),\dots,u(j+m-1)\ \text{viewed as the \textit{vector}}\ [u(j),u(j + 1),\dots,u(j+m-1)] \in \mathbf{R}^{m} , \\ u[j:j+m-1] \sqsubset u[1:N] &:& u[j:j+m-1]\ \text{is a \textit{template} of}\ u[1:N] . \end{array} $$

Recall that \({C_{i}^{m}}(r)\) counts those templates \(\mathbf {x}(j)\equiv u[j:j+m-1]\) of the signal \(u[1:N]\), whose \(\|\cdot \|_{\infty }\)-distance to the fixed template \(\mathbf {x}(i)\equiv u[i:i+m-1]\) is \(\leq r\).

Richman and Moorman [40] define:

$$ {B_{i}^{m}}(r) := \frac{\# \left\{ \mathbf{x}_{m}(j) \sqsubset u[1:N] : 1\leq j\leq N-m ,\ j\neq i ,\ d(\mathbf{x}_{m}(j),\mathbf{x}_{m}(i))\leq r \right\} }{N-m-1} , $$
(40)
$$ {A_{i}^{m}}(r) := \frac{\# \left\{ \mathbf{x}_{m + 1}(j) \sqsubset u[1:N] : 1\leq j\leq N-m ,\ j\neq i ,\ d(\mathbf{x}_{m + 1}(j),\mathbf{x}_{m + 1}(i))\leq r \right\} }{N-m-1} . $$
(41)

Note that there are \(N-m + 1\) templates \(\mathbf {x}_{m}(j)\) of length m in the signal \(u[1:N]\). However, the convention of Richman-Moorman is to consider for \({B_{i}^{m}}(r)\) only \(N-m\) of them, for instance, the first \(N-m\) of them, thus disregarding the template \(\mathbf {x}_{m}(N-m + 1)\equiv u[N-m + 1:N]\sqsubset u[1:N]\). In the case of \({A_{i}^{m}}(r)\), there are exactly \(N-m\) templates of length \(m + 1\) in \(u[1:N]\). Then Richman and Moorman define the corresponding integral coefficients:

$$ B^{m}(r) := \frac{\displaystyle\sum\limits_{i = 1}^{N-m} {B_{i}^{m}}(r)}{N-m} , \qquad A^{m}(r) := \frac{\displaystyle\sum\limits_{i = 1}^{N-m} {A_{i}^{m}}(r)}{N-m} . $$
(42)

Thus,

$$\begin{array}{@{}rcl@{}} B^{m}(r) &=& \text{the probability that \textit{two different} templates of length}\ m, \\ && \text{both}\ \sqsubset u[1:N],\ \text{will}\ \mathit{match\ within}\ r \\ &=& \mathscr{P}(\{ u[i,i+m-1],u[j,j+m-1]\sqsubset u[1:N] : i\neq j ,\\ && \quad d(u[i,i+m-1],u[j,j+m-1])\leq r \} ) , \end{array} $$
(43)
$$\begin{array}{@{}rcl@{}} A^{m}(r) &=& \text{the probability that \textit{two different} templates of length}\ m+1, \\ && \text{both}\ \sqsubset u[1:N],\ \text{will}\ \mathit{match\ within}\ r \\ &=& \mathscr{P}(\{ u[i,i+m],u[j,j+m]\sqsubset u[1:N] : i\neq j ,\\ && \quad d(u[i,i+m],u[j,j+m])\leq r \} ) . \end{array} $$
(44)

To “match within r” means here that the distance

$$ d(\text{template 1},\ \text{template 2}) = \| \text{template 1}-\text{template 2} \|_{\infty} \quad\text{is}\quad \leq r. $$

We observe that:

$$\begin{array}{@{}rcl@{}} A^{m}(r) \!&=&\! \mathscr{P} (\{ u[i,i+m],u[j,j+m]\sqsubset u[1:N] : i\neq j ,\\ &&\quad\quad d(u[i,i+m],u[j,j+m])\leq r \} ) . \end{array} $$

Consider now the set in the argument of \(\mathscr{P}\). Let us call it S:

$$\begin{array}{@{}rcl@{}} S&=& \{ u[i,i+m],u[j,j+m] \sqsubset u[1:N]\\ &&: i\neq j ,\ d(u[i,i+m],u[j,j+m])\leq r \} \\ &=& \{ u[i,i+m],u[j,j+m] \sqsubset u[1:N] \\ &&: i\neq j ,\ d(u[i,i+m-1],u[j,j+m-1])\leq r \\ && \text{and}\quad |u(i+m)-u(j+m)|\leq r \} \\ &=& \{ u[i,i+m],u[j,j+m] \sqsubset u[1:N] \\ && : i\neq j ,\ d(u[i,i+m-1],u[j,j+m-1])\leq r \} \\ &&\bigcap \{ u[i,i+m],u[j,j+m] \sqsubset u[1:N] \\ &&: i\neq j ,\ |u(i+m)-u(j+m)|\leq r \}. \end{array} $$

The measure of this set is, of course, \(A^{m}(r)\):

$$\begin{array}{@{}rcl@{}} A^{m}(r) &=& \mathscr{P}(S) \\ &=& \mathscr{P}\left[\, |u(j+m)-u(i+m)|\leq r \ \middle|\ i\neq j ,\ |u(j+k)-u(i+k)|\leq r ,\ 0\leq k\leq m-1 \,\right] \\ && \times\ \mathscr{P}\left[\, i\neq j ,\ |u(j+k)-u(i+k)|\leq r ,\ 0\leq k\leq m-1 \,\right] \\ &=& \mathscr{P}\left[\, |u(j+m)-u(i+m)|\leq r \ \middle|\ i\neq j ,\ |u(j+k)-u(i+k)|\leq r ,\ 0\leq k\leq m-1 \,\right] \times B^{m}(r). \end{array} $$

Thus,

$$\begin{array}{@{}rcl@{}} \frac{A^{m}(r)}{B^{m}(r)} \!&=&\! \mathscr{P} [ |u(j + m) - u(i+m)|\!\leq\! r \ |\ i\neq j ,\ |u(j+k)\\ &&-u(i+k)|\leq r ,\ 0\leq k\leq m-1 ]. \end{array} $$
(45)

Loosely speaking, we can say that this is the conditional probability that two different templates of length \(m + 1\), whose first m points are within a tolerance r of each other, remain within the tolerance r of each other at the next point (the \((m + 1)\)-th point of the template).

Richman and Moorman define then the statistic:

$$ \text{SampEn}(m,r,N) := -\log\frac{A^{m}(r)}{B^{m}(r)} , $$
(46)

and the corresponding parameter estimated by this statistic as:

$$\begin{array}{@{}rcl@{}} \text{SampEn}(m,r) &:=& \lim_{N\to\infty} \text{SampEn}(m,r,N) \\ &=& \lim_{N\to\infty}\left( -\log\frac{A^{m}(r)}{B^{m}(r)}\right) . \end{array} $$
(47)
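As a concrete illustration of (40)–(46), a minimal NumPy sketch of the SampEn statistic follows. It is not the implementation used in this paper; the template length m = 2 and the tolerance r = 0.2 times the standard deviation of the signal are common default choices assumed here only for the example. Because the same normalization appears in both \(A^{m}(r)\) and \(B^{m}(r)\), the ratio of raw pair counts can be used directly.

```python
import numpy as np

def sample_entropy(u, m=2, r=0.2, scale_r_by_std=True):
    """SampEn(m, r, N) = -log(A^m(r) / B^m(r)), following Richman and Moorman [40]."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    if scale_r_by_std:
        r = r * np.std(u)                       # tolerance relative to the signal's variability

    def match_pairs(length):
        # The first N - m templates of the given length, as in Eqs. (40)-(41).
        templates = np.array([u[i:i + length] for i in range(N - m)])
        pairs = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance between template i and every template j > i.
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            pairs += int(np.sum(d <= r))
        return pairs                            # unordered pairs (i, j), i != j, matching within r

    B = match_pairs(m)                          # length-m matches
    A = match_pairs(m + 1)                      # length-(m+1) matches
    if A == 0 or B == 0:
        return np.inf                           # SampEn is undefined when no matches occur
    return -np.log(A / B)

# A noisy sine is more regular (lower SampEn) than pure white noise.
rng = np.random.default_rng(0)
t = np.arange(1000)
print(sample_entropy(np.sin(0.05 * t) + 0.1 * rng.standard_normal(1000)))
print(sample_entropy(rng.standard_normal(1000)))
```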

Appendix B: Wavelet transform background

Wavelet analysis is today a common tool in signal and spectral analysis, in particular because of its multiresolution and localization capabilities in both the time and frequency domains. In the time series context, wavelet analysis allows the decomposition and localization, at different time and frequency scales, of relevant features present in the underlying processes. The literature on wavelet theory is extensive; we mention just a few standard references: [9, 18, 51, 57]. In this section we collect the bare minimum of wavelet theory necessary for understanding the algorithms developed below.

B.1 Continuous wavelet transform

A function \(\psi :\mathbb {R}\to \mathbb {C}\), such that \(\psi \in L^{2}\) and \(\|\psi \|^{2} = {\int }_{-\infty }^{\infty }|\psi (t)|^{2}\, dt = 1\), is called a wavelet or mother wavelet whenever \(\psi \) satisfies the admissibility condition:

$$ C_{\psi}:= 2\pi{\int}_{-\infty}^{\infty} \frac{\left| \widehat{\psi}(a) \right|^{2}}{|a|} da <\infty , $$
(48)

where \(\widehat {\psi }\) denotes the Fourier transform of \(\psi \). Note that (48) implies that \(\psi \) must have zero average, i.e., \(\widehat {\psi }(0)=\frac {1}{\sqrt {2\pi }} {\int }_{-\infty }^{\infty } \psi (t) dt= 0\). When a fixed wavelet \(\psi \in L^{2}\) has been selected, then the continuous wavelet-transform of a time signal \(f\in L^{2}\) is defined by:

$$\begin{array}{@{}rcl@{}} \mathcal{W}\!f(a,b) &=& {\int}_{-\infty}^{\infty} f(t) \frac{1}{|a|^{1/2}} \overline{\psi\left( \frac{t-b}{a} \right)} dt,\\ && a,b\in\mathbb{R} ,\ a\neq0 , \end{array} $$
(49)

where \(\overline {\psi ((t-b)/a)}\) denotes the complex conjugate of \(\psi ((t-b)/a)\). Then f can be synthesized from its wavelet transform \(\mathcal {W}\!f\) by means of the inversion formula [9, (3.7), p. 67], [51, Th. 4.3, p. 81]:

$$\begin{array}{@{}rcl@{}} f(t) \!&=&\! \frac{1}{C_{\psi}} \int\limits_{0\neq a\in\mathbb{R}} \int\limits_{b\in\mathbb{R}} \mathcal{W}\!f(a,b) \frac{1}{\sqrt{|a|}} \psi\!\left( \frac{t-b}{a}\right)\! \frac{da db}{|a|^{2}} ,\\ &&t\!\in\!\mathbb{R} . \end{array} $$
(50)
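To make (49) concrete, the short NumPy sketch below approximates a single coefficient \(\mathcal{W}\!f(a,b)\) by a Riemann sum on a sampling grid, using the real-valued, \(L^{2}\)-normalized Mexican-hat wavelet; the signal, the grid and the values of a and b are illustrative assumptions, not choices made in this paper.

```python
import numpy as np

def ricker(x):
    """L2-normalized Mexican-hat (Ricker) mother wavelet."""
    return (2.0 / (np.sqrt(3.0) * np.pi ** 0.25)) * (1.0 - x ** 2) * np.exp(-x ** 2 / 2.0)

def cwt_coefficient(f, t, a, b, psi=ricker):
    """Approximate W f(a, b) of Eq. (49) by a Riemann sum over the sampling grid t."""
    dt = t[1] - t[0]
    integrand = f * np.conj(psi((t - b) / a)) / np.sqrt(abs(a))
    return np.sum(integrand) * dt

t = np.linspace(0.0, 10.0, 2048)
f = np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.sin(2 * np.pi * 4.0 * t)
print(cwt_coefficient(f, t, a=0.25, b=5.0))
```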

B.2 Discrete wavelet transform

A well known approach to discrete wavelet analysis is by means of multiresolution analysis (MRA). Main references for this section are [9, Ch. 5, p. 105], [51, Ch. VII, p. 220]. MRA provides a systematic way to construct discrete wavelets. An MRA is a sequence \(\left \{V_{j}\right \}_{j\in \mathbb {Z}}\) of closed subspaces of \(L^{2}\) having the following properties also known as axioms: [9, Sec. 5.1, p. 106], [51, Def. 7.1, p. 220],

(a) \(V_{j}\subset V_{j-1}\) for all \(j\in \mathbb {Z}\);

(b) \(\bigcap _{j\in \mathbb {Z}}V_{j}=\{0\}\) (axiom of separation);

(c) \(\bigcup _{j\in \mathbb {Z}}V_{j}\) is dense in \(L^{2}\) (axiom of completeness);

(d) \(f\in V_{j} \Leftrightarrow f(\bullet -2^{j}k)\in V_{j}\), \(k\in \mathbb {Z}\);

(e) \(f\in V_{j} \Leftrightarrow f(\bullet /2)\in V_{j + 1}\) (or, equivalently, \(f(2\cdot \bullet )\in V_{j} \Leftrightarrow f\in V_{j + 1}\)); and

(f) (Blatter) There exists a function \(\phi \in L^{2}\cap L^{1}\) such that \(\left \{ \phi (\bullet -k) \right \}_{k\in \mathbb {Z}}\) is an orthonormal basis of \(V_{0}\). This function \(\phi \) is called the scaling function of the MRA.

For each \(j\in \mathbb {Z}\) the functions,

$$ \phi_{j,k}(t) := 2^{-j/2} \phi(2^{-j}t-k) ,\quad k\in\mathbb{Z} , $$
(51)

constitute an orthonormal basis of \(V_{j}\).

Since \(\phi \in V_{0}\subset V_{-1}\), \(\phi \) can be represented in terms of the orthonormal basis (51) of \(V_{-1}\) (i.e., \(j=-1\)). In fact, the inclusion \(V_{0}\subset V_{-1}\) is equivalent to the existence of a sequence \(\{h_{k}\}_{k\in \mathbb {Z}}\) with \({\sum }_{k\in \mathbb {Z}}|h_{k}|^{2}< \infty \) such that the scaling equation (52) holds: [9, p. 118, eq. (2)]

$$ \phi(t)=\sqrt{2}\sum\limits_{k\in\mathbb{Z}}h_{k}\phi(2t-k) ,\quad \text{for almost all}\ t\in\mathbb{R} . $$
(52)

The sequence \(\{h_{k}\}_{k\in \mathbb {Z}}\) uniquely defines the scaling function \(\phi \). Axiom (a) prevents the functions \(\phi _{j,k}\) from forming an orthonormal basis of \(L^{2}\). One then considers an additional system of pairwise orthogonal subspaces \(W_{j}\) of \(L^{2}\), defined as the orthogonal complements of \(V_{j}\) in \(V_{j-1}\), so that \(V_{j-1}=V_{j}\oplus W_{j}\), \(V_{j}\perp W_{j}\), \(j\in \mathbb {Z}\), having the property:

$$ f\in W_{j} \quad\Leftrightarrow\quad f\left( 2^{j}\cdot\bullet\right)\in W_{0} , $$
(53)

which is analogous to axiom (e). The orthogonal direct sum of these subspaces satisfies [9, p. 108, (5.1)]:

$$ \bigoplus_{j\in\mathbb{Z}} W_{j} \quad\text{is dense in}\quad L^{2} . $$
(54)

There is moreover a function \(\psi \in W_{0}\), called the mother wavelet, such that \(\{\psi (\bullet -k)\}_{k\in \mathbb {Z}}\) is an orthonormal basis of \(W_{0}\), given by: [9, pp. 123-4, eqs. (16)–(19)]

$$\begin{array}{@{}rcl@{}} \psi(t)&=&\sqrt{2} {\sum}_{k\in\mathbb{Z}} g_{k} \phi(2t-k) ,\qquad g_{k}=(-1)^{k-1} \overline{h_{-k-1}} ,\\ &&k\in\mathbb{Z} , \end{array} $$
(55)

where \(\overline {h_{-k-1}}\) denotes the complex conjugate of \(h_{-k-1}\). Moreover, the function system \(\{\psi _{j,k}\}_{j,k\in \mathbb {Z}}\) defined by: [9, p. 124, (5.13)]

$$ \psi_{j,k}(t):= 2^{-j/2} \psi\left( 2^{-j}t-k \right) , \quad j,k\in\mathbb{Z} , $$
(56)

is an orthonormal wavelet-basis of \(L^{2}(\mathbb {R})\).
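The scaling and wavelet filters \(\{h_{k}\}\) and \(\{g_{k}\}\) of (52) and (55) are available for standard orthogonal wavelets in the PyWavelets package [60]. As a small, hedged check (indexing and sign conventions differ slightly between PyWavelets and the formulas above), the sketch below verifies basic consequences of (52) and (48): \(\sum_{k}h_{k}=\sqrt{2}\), \(\sum_{k}|h_{k}|^{2}=1\) and \(\sum_{k}g_{k}=0\). The choice of 'db4' is only an example.

```python
import numpy as np
import pywt

w = pywt.Wavelet('db4')          # an orthogonal Daubechies wavelet
h = np.array(w.rec_lo)           # scaling (low-pass) filter h_k, up to indexing convention
g = np.array(w.rec_hi)           # wavelet (high-pass) filter g_k

print(np.isclose(h.sum(), np.sqrt(2)))    # integrating Eq. (52): sum_k h_k = sqrt(2)
print(np.isclose((h ** 2).sum(), 1.0))    # orthonormality of {phi(. - k)}: sum_k |h_k|^2 = 1
print(np.isclose(g.sum(), 0.0))           # zero mean of psi, cf. the admissibility condition (48)
```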

B.3 Algorithms

We start from (52) and (55). From (52) follows for \(j,n\in \mathbb {Z}\) the identity:

$$ 2^{-j/2} \phi(2^{-j} t-n ) = 2^{-(j-1)/2} \sum\limits_{k\in\mathbb{Z}} h_{k}\, \phi(2^{-(j-1)} t-2n-k ) , $$
(57)

which is usually written as a recursive formula \(\phi_{j-1,\bullet}\to\phi_{j,\bullet}\) as: [9, p. 131, eq. (3)]

$$ \phi_{j,n} = \sum\limits_{k\in\mathbb{Z}} h_{k} \phi_{j-1, 2n+k} ,\quad j,n\in\mathbb{Z} . $$
(58)

Similarly, the recursive formula \(\phi_{j-1,\bullet}\to\psi_{j,\bullet}\) is given by: [9, p. 131, eq. (4)]

$$ \psi_{j,n} = \sum\limits_{k\in\mathbb{Z}} g_{k} \phi_{j-1,2n+k} ,\quad j,n\in\mathbb{Z} . $$
(59)

The well-known fast filter bank algorithm allows the computation of the orthogonal wavelet coefficients of a signal \(f\in L^{2}\). Since, at the scale j, \(\{\phi _{j,n}\}_{n\in \mathbb {Z}}\) and \(\{\psi _{j,n}\}_{n\in \mathbb {Z}}\) are orthonormal bases of \(V_{j}\) and \(W_{j}\) respectively, the projections of f onto these spaces are given by: [51, p. 255]

$$\begin{array}{@{}rcl@{}} a_{j,n} &=& \langle f,\phi_{j,n} \rangle = 2^{-j/2}{\int}_{-\infty}^{\infty} f(t)\, \overline{\phi\left( 2^{-j}t-n\right)}\, dt , \quad n\in\mathbb{Z} ,\\ d_{j,n} &=& \langle f,\psi_{j,n} \rangle = 2^{-j/2}{\int}_{-\infty}^{\infty} f(t)\, \overline{\psi\left( 2^{-j}t-n\right)}\, dt , \quad n\in\mathbb{Z} . \end{array} $$

These coefficients can be calculated recursively with a cascade of discrete convolutions and subsamplings [51, Theorem 7.7, p. 255]. At the analysis or decomposition stage:

$$ a_{j + 1,n} = \sum\limits_{k\in\mathbb{Z}} h_{k-2n} a_{j,k} ,\qquad d_{j + 1,n} = \sum\limits_{k\in\mathbb{Z}} g_{k-2n} a_{j,k} , $$
(60)

and at the synthesis or reconstruction stage:

$$ a_{j,n} = \sum\limits_{k\in\mathbb{Z}} h_{n-2k} a_{j + 1,k} + \sum\limits_{k\in\mathbb{Z}} g_{n-2k} d_{j + 1,k}. $$
(61)
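The PyWavelets package [60] implements exactly this cascade. The following minimal sketch performs one analysis step (60), one synthesis step (61) and a multilevel decomposition on a synthetic random-walk signal; the wavelet 'db4', the decomposition level and the signal are illustrative assumptions, not the settings used in this paper.

```python
import numpy as np
import pywt

rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(512))      # a synthetic random-walk ("price-like") signal

# One analysis step, Eq. (60): filter with h and g, then downsample by 2.
a1, d1 = pywt.dwt(x, 'db4')                  # approximation a_{j+1,n} and detail d_{j+1,n}

# One synthesis step, Eq. (61): upsample, filter and add to recover the finer level.
x_rec = pywt.idwt(a1, d1, 'db4')
print(np.allclose(x, x_rec[:len(x)]))        # perfect reconstruction up to round-off

# The cascade applied three times to the successive approximations.
coeffs = pywt.wavedec(x, 'db4', level=3)     # [a_3, d_3, d_2, d_1]
```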

Appendix C: Parameter setting of forecasting models

Tables 10 and 11 show the best settings obtained by the training process for each neural network model. The activation functions are Hyperbolic Tangent (Tanh), Logistic (Log), Softsign (Ssgn), SoftPlus (Spls) and Identity (Id). For the RBF model, the best-fitting basis functions in the hidden layer were Cubic (Cub) and Multiquadric (MQ). For MLP, PSN and LSTM, the setting corresponds to the number of inputs, the number of hidden neurons, the activation function in the hidden layer and the activation function in the output layer. ELME1 and ENN follow the same notation; however, they only consider an activation function in the hidden layer. For SW-ELM, the notation only lists the number of inputs and hidden nodes, since the model uses a particular activation function. The best settings of the ARIMA model are shown in Table 12.

Table 10 The best setting of state-of-the-art models included in this study for the stocks: INT, HPQ and CSCO
Table 11 The best setting of state-of-the-art models included in this study for the stocks: VZ, MSFT and IBM
Table 12 The best setting of ARIMA for each stock

Appendix D: Computational Complexity

The forecasting models presented in this study were implemented in Python and R, and the execution times were measured in seconds using the module time (Python) and the package tictoc (R). To ensure sequential execution inside each individual model, or inside each individual model constituent, the number of execution threads was set to 1. More precisely, the OPENBLAS_NUM_THREADS environment variable was set to 1: OPENBLAS_NUM_THREADS=1.
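As an illustration only (not the authors' measurement script), the following Python sketch reproduces the essentials of this protocol: OPENBLAS_NUM_THREADS is pinned to 1 before the numerical libraries are loaded, and wall-clock time is taken with the time module; the matrix product is a hypothetical stand-in for one model-training call.

```python
import os
os.environ["OPENBLAS_NUM_THREADS"] = "1"     # must be set before NumPy/OpenBLAS is imported

import time
import numpy as np

def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

A = np.random.rand(1000, 1000)
_, seconds = timed(np.dot, A, A)
print(f"{seconds:.6f} s")
```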

In Section 5.5.1 we calculate the number of elementary operations (\(\mathscr{T}_{\text {WE}}\)) required to find the best WE architecture. Table 13 shows the mother wavelets used for fitting the WE models and their corresponding indices.

Table 13 Mother wavelets

Tables 14 and 15 show the CC values for each forecasting model and time series. The execution time of the naive model is 0.0000024594 seconds on average for each time series.

Table 14 The computational complexity (CC) exhibited by the forecasting models included in this study for the stocks: INT, HPQ and CSCO
Table 15 The computational complexity (CC) exhibited by the forecasting models included in this study for the stocks: VZ, MSFT and IBM

Tables 16 and 17 show the CC values and THEIL values of the ELMEk models for each time series.

Table 16 The computational complexity (CC) exhibited by the ELME models for the stocks: INT, HPQ, CSCO, VZ, MSFT and IBM
Table 17 The THEIL values exhibited by the ELME models for the stocks: INT, HPQ, CSCO, VZ, MSFT and IBM

The execution environment used to obtain the experimental results is shown in Table 18.

Table 18 Description of the execution environment


Cite this article

Fernández, C., Salinas, L. & Torres, C.E. A meta extreme learning machine method for forecasting financial time series. Appl Intell 49, 532–554 (2019). https://doi.org/10.1007/s10489-018-1282-3
