Hybrid change point detection for time series via support vector regression and CUSUM method

https://doi.org/10.1016/j.asoc.2020.106101

Highlights

  • We consider the change point detection problem for ARMA-type time series.

  • To detect a change point, we use a hybrid of the SVR and CUSUM methods.

  • We calculate the forecasting errors based on the SVR method.

  • The SVR errors are used in the construction of the LSCUSUM test.

  • Monte Carlo simulations and a real data example are provided for illustration.

Abstract

This study considers the change point testing problem for time series based on the location and scale-based cumulative sum (LSCUSUM) test constructed with the residuals obtained from support vector regression (SVR)-autoregressive moving average (ARMA) models. For this, we first estimate the SVR–ARMA models from a training time series sample, fitting a long AR model to the data to obtain residuals. We then use these residuals as initial values of the error terms in SVR–ARMA(p,q) models and obtain the forecasts recursively until the updated error terms converge to a certain limit. Finally, we select the optimal orders p and q by the root mean square error (RMSE) and use the forecasting errors from the selected model as the residuals for constructing the LSCUSUM test. Monte Carlo simulations are performed to evaluate the validity of the test, and a real data example is provided for illustration.

Introduction

In this study, we consider the change point testing problem for time series based on the location and scale-based cumulative sum (LSCUSUM) test constructed with support vector regression (SVR)-based autoregressive moving average (ARMA) models. Since [1], testing for a parameter change has been an important issue in economics, engineering, and medicine, and many articles have been published across these research areas [2]. Because time series often undergo structural changes owing to policy changes and critical social events, change point testing has been a core issue in this context for several decades. Among the many change point tests, the CUSUM test has been particularly popular owing to its ease of use, and there is an abundant literature on CUSUM tests for time series. For earlier works, we refer to [3], [4], [5], [6] and the papers cited therein; for recent developments, see [7], [8], [9], [10], [11], [12], [13], [14], [15], which consider several different types of CUSUM tests and compare their performance.

The conventional estimate-based CUSUM test is designed to detect discrepancies among sequentially obtained estimators [5]. This estimate-based test generally performs well but can suffer from severe size distortions and low power, particularly when the underlying model is complicated and has many unknown parameters. The residual-based CUSUM test for time series models has therefore been advocated as a remedy [6], [7], [8]. However, the residual-based CUSUM test for location-scale models suffers a severe power loss under location parameter changes. To overcome this drawback, [9] and [10] suggested using the score vector-based CUSUM test for ARMA-generalized autoregressive conditional heteroscedastic (GARCH) models. [11] also proposed a modified residual-based CUSUM test that reduces the effort of handling the derivatives of the log-likelihood functions required to construct the test of [10], while enhancing power. [12] further improved on the test of [11] by introducing the much handier location and scale-based CUSUM (LSCUSUM) test and demonstrating its validity for ARMA-type models. Because the LSCUSUM test is constructed only from observations and residuals, it has an advantage over other CUSUM tests: it can be readily combined with any method that produces residuals. Motivated by this, we consider using the SVR–ARMA model to implement the LSCUSUM test.
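As a rough illustration of why residuals alone suffice, the Python sketch below builds one CUSUM process from a residual series (location) and one from the squared residuals (scale), then combines them by a maximum and by a sum of squares. It is a hypothetical sketch: the normalization by the sample standard deviation, the two combination rules, and the names `cusum_process` and `location_scale_cusum` are our assumptions, and the exact LSCUSUM statistics of [12] may be defined differently.

```python
import numpy as np

def cusum_process(z):
    """|S_k - (k/n) S_n| / (sqrt(n) * sd(z)) for k = 1, ..., n."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    s = np.cumsum(z)
    k = np.arange(1, n + 1)
    return np.abs(s - k / n * s[-1]) / (np.sqrt(n) * z.std())

def location_scale_cusum(resid):
    """Hypothetical combination of a location CUSUM (residuals) and a
    scale CUSUM (squared residuals); not the exact statistics of [12]."""
    resid = np.asarray(resid, dtype=float)
    loc = cusum_process(resid)            # reacts mainly to mean shifts
    scale = cusum_process(resid ** 2)     # reacts mainly to variance shifts
    t_max = max(loc.max(), scale.max())   # max-type combination
    t_ls = np.max(loc ** 2 + scale ** 2)  # sum-of-squares combination
    return t_max, t_ls

rng = np.random.default_rng(0)
e = rng.standard_normal(500)              # residuals under no change
e_shift = e.copy()
e_shift[250:] += 1.0                      # mean shift halfway through
print(location_scale_cusum(e))
print(location_scale_cusum(e_shift))
```

Because any forecasting method only needs to supply the residual vector `resid`, the same function could be fed ARMA residuals or SVR-based forecasting errors without modification.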

An important step in constructing the LSCUSUM test is to estimate the residuals accurately. Since the residuals are simply prediction errors, accurate time series prediction is crucial. Time series prediction is also important in general, both for forecasting the behavior of a time series and for detecting malfunctions or anomalies in statistical process control. In the literature, the most popular time series forecasting method is based on the classical ARMA model. Conventional linear ARMA models yield accurate predictions when a time series truly follows them. However, if the time series has significant nonlinear characteristics, predictions based on ARMA models can be inaccurate and hard to use in further applications. In this case, practitioners can employ nonparametric prediction methods such as recurrent neural networks (RNNs) and support vector regression [16], [17]. The RNN method is well known to outperform the ARMA model in many situations, particularly when time series have, to a certain extent, nonlinear and non-stationary features. However, it has some limitations, such as the need for a large number of tuning parameters, the difficulty of finding a unique global solution because the result depends on the choice of initial weights, and over-fitting [18]. In contrast, the SVR offers flexibility, outstanding forecasting accuracy, and a balance between training and generalization errors, resulting in better empirical performance than both RNN and ARMA models [19], [20]. It is well known that the SVR follows the structural risk minimization principle, whereas the RNN minimizes the empirical risk, namely, the error on the in-sample training data [21]. Motivated by this, we adopt the SVR method for time series prediction and construct the LSCUSUM test from the resulting SVR–ARMA residuals to test for change points. See [22] for a reference on the SVR–ARMA method.

The rest of this paper is organized as follows. Section 2 introduces the LSCUSUM test for the classical ARMA model and outlines its basic principle. Section 3 proposes a forecasting method based on the SVR–ARMA model and describes how to determine an optimal SVR–ARMA model. The residuals are obtained by fitting an optimal SVR–ARMA model to a given training time series sample, which is split into two subseries. A long AR model is fitted to the first subseries to obtain initial residuals, which are used as the error terms in the SVR–ARMA(p,q) model and are recursively updated until they converge to a certain limit. This procedure is applied to each pair (p,q) with p and q less than a predetermined K. Then, for each estimated SVR–ARMA(p,q) model, we calculate the root mean square error (RMSE) on the second subseries and select the ARMA order with the smallest RMSE. The selected SVR–ARMA(p,q) model is then applied to obtain the prediction errors, or residuals, which are finally used to construct the LSCUSUM test. Section 4 performs Monte Carlo simulations to evaluate the validity of the LSCUSUM test for various time series models. Section 5 provides a real data example for illustration, and Section 6 provides concluding remarks.
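The fitting and order-selection procedure just outlined (long-AR initial residuals, recursive residual updates, RMSE-based choice of p and q on the second subseries) can be sketched as follows. This is a hypothetical outline, not the authors' implementation: the regressor is pluggable and, to keep the block dependency-free, ordinary least squares (`fit_linear`) is used as a stand-in where an SVR fit would go. The long-AR order `L`, the tolerance, the choice to refit f on every pass, and all function names are our assumptions.

```python
import numpy as np

def long_ar_residuals(y, L):
    """Residuals of a long AR(L) fitted by least squares; the first L entries are set to 0."""
    n = len(y)
    X = np.column_stack([np.ones(n - L)] + [y[L - j:n - j] for j in range(1, L + 1)])
    coef, *_ = np.linalg.lstsq(X, y[L:], rcond=None)
    e = np.zeros(n)
    e[L:] = y[L:] - X @ coef
    return e

def lag_row(y, e, t, p, q):
    """Regressors (y_{t-1}, ..., y_{t-p}, e_{t-1}, ..., e_{t-q}) for time t."""
    return np.concatenate((y[t - p:t][::-1], e[t - q:t][::-1]))

def fit_linear(X, z):
    """Hypothetical stand-in regressor (least squares); an SVR fit would go here."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def fit_arma_like(y, p, q, fit=fit_linear, L=10, tol=1e-6, max_iter=20):
    """Refit f on lagged y and lagged residuals until the residuals stop changing."""
    e = long_ar_residuals(y, L)           # initial residuals from a long AR model
    start = L + max(p, q)
    for _ in range(max_iter):
        X = np.array([lag_row(y, e, t, p, q) for t in range(start, len(y))])
        model = fit(X, y[start:])
        e_new = e.copy()
        e_new[start:] = y[start:] - model(X)
        done = np.max(np.abs(e_new - e)) < tol
        e = e_new
        if done:
            break
    return model, e

def validation_rmse(y_fit, y_val, model, e_fit, p, q):
    """RMSE of one-step forecasts on the validation part, updating residuals recursively."""
    y = np.concatenate((y_fit, y_val))
    e = np.concatenate((e_fit, np.zeros(len(y_val))))
    n = len(y_fit)
    for t in range(n, len(y)):
        e[t] = y[t] - model(lag_row(y, e, t, p, q)[None, :])[0]
    return np.sqrt(np.mean(e[n:] ** 2))

def select_order(y_fit, y_val, K=2, fit=fit_linear):
    """Pick (p, q) with 0 <= p, q <= K and p + q >= 1 minimising validation RMSE."""
    best = None
    for p in range(K + 1):
        for q in range(K + 1):
            if p + q == 0:
                continue
            model, e = fit_arma_like(y_fit, p, q, fit=fit)
            rmse = validation_rmse(y_fit, y_val, model, e, p, q)
            if np.isfinite(rmse) and (best is None or rmse < best[0]):
                best = (rmse, p, q)
    return best

rng = np.random.default_rng(1)
noise = rng.standard_normal(600)
y = np.zeros(600)
for t in range(1, 600):
    y[t] = 0.8 * y[t - 1] + noise[t]      # simulated AR(1) series
y = y[100:]                               # drop burn-in
best = select_order(y[:300], y[300:], K=2)
print("validation RMSE, p, q:", best)
```

Replacing `fit_linear` with any function that takes `(X, z)` and returns a predictor, for example one built on an SVR, leaves the recursion and the RMSE-based selection unchanged.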


LSCUSUM test for ARMA models

To develop a CUSUM test for time series models, [11] considered the CUSUM test for the location-scale model of the form $y_t = g_t(\mu) + \sqrt{h_t(\theta)}\,\eta_t$, where $g_t(\mu)$ and $h_t(\theta)$ are the conditional mean and variance with parameters $\mu$ and $\theta = (\mu^\top, \lambda^\top)^\top$, and $\eta_t$ are iid error terms with mean zero and unit variance. The location-scale model includes a broad class of autoregressive conditional heteroscedastic (ARCH) time series models, covering ARMA-generalized ARCH (GARCH) models. To implement a change point test, they

SVR model

Support vector regression (SVR) is a flexible tool for approximating various types of functions and making accurate predictions for time series. Given data $\{(x_t, y_t)\}_{t=1}^{n}$, where $x_t \in \mathbb{R}^k$ is a $k$-dimensional input vector and $y_t \in \mathbb{R}$ is a scalar output, SVR aims to identify a nonlinear function $f$ that approximates the output $y_t$ within a prescribed error tolerance. More precisely, $f$ has the form $f(x_t) = w^\top \phi(x_t) + b$, where $w$ is a weight vector, $b$ is a scalar bias, and $\phi(\cdot)$ is a known nonlinear feature map.
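The sketch below illustrates the form $f(x) = w^\top \phi(x) + b$ fitted with the ε-insensitive loss, approximately minimizing $\tfrac12\|w\|^2 + C\,\tfrac1n\sum_t \max\bigl(0, |y_t - w^\top\phi(x_t) - b| - \varepsilon\bigr)$ (the loss is averaged rather than summed, which only rescales $C$). Two simplifications are assumptions on our part: $\phi$ is an explicit map of Gaussian bumps at fixed centres, whereas standard SVR software such as LIBSVM works with a kernel and solves the dual quadratic program; and the objective is minimized by plain subgradient descent. The scalar input, $C$, $\varepsilon$, the bump width, and the step size are illustrative, not the paper's settings.

```python
import numpy as np

def gaussian_features(x, centers, width):
    """Assumed explicit feature map phi(x): Gaussian bumps at fixed centres."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))

def fit_svr_primal(Phi, y, C=10.0, eps=0.05, lr=0.005, epochs=4000):
    """Approximately minimise 0.5*||w||^2 + C * mean(max(0, |y - Phi w - b| - eps))
    by full-batch subgradient descent; the bias b is not penalised."""
    n, d = Phi.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        r = y - (Phi @ w + b)                       # current residuals
        # subgradient of the eps-insensitive loss w.r.t. the fitted value
        g = np.where(r > eps, -1.0, np.where(r < -eps, 1.0, 0.0))
        w -= lr * (w + C * (Phi.T @ g) / n)
        b -= lr * C * g.mean()
    return w, b

x = np.linspace(0.0, 2.0 * np.pi, 100)
y = np.sin(x)
centers = np.linspace(0.0, 2.0 * np.pi, 20)
Phi = gaussian_features(x, centers, width=0.5)
w, b = fit_svr_primal(Phi, y)
y_hat = Phi @ w + b                                 # f(x_t) = w^T phi(x_t) + b
print("training RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))
```

Points whose residual lies inside the ε-tube contribute nothing to the update, which is the mechanism behind SVR's tolerance to small forecasting errors; in a time series setting, `x` would be replaced by a vector of lagged values.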

Prediction based on SVR-ARMA model

Suppose that a training sample $y_1,\ldots,y_n,y_{n+1},\ldots,y_{n+m}$ is given. Here, $y_1,\ldots,y_n$ and $y_{n+1},\ldots,y_{n+m}$ serve as the fitting and validation subseries, respectively. We assume that the sample is generated from the (possibly nonlinear) ARMA-type model $y_t = f(y_{t-1},\ldots,y_{t-p},\epsilon_{t-1},\ldots,\epsilon_{t-q}) + \epsilon_t$, where $f$ is an unknown function to be estimated, $p$ and $q$ are nonnegative integers to be determined appropriately, and $\epsilon_t$ are iid random variables with zero mean and finite variance. If the training sample is known to follow an SVR–ARMA(p,q) model with specific orders p and q,

Simulation study

In this section, we evaluate the performance of the SVR-based LSCUSUM tests $\hat{T}_n^{LS}$ and $\hat{T}_n^{\max}$ for ARMA, threshold ARMA, and time-varying AR models. Each simulation uses a sample size of 500 and a nominal level of 0.05. Empirical sizes and powers are calculated as the proportion of rejections of the null hypothesis of no change over 500 repetitions. Under the alternatives, the change is assumed to occur in the middle of the testing sample. The SVR-based LSCUSUM tests are compared with the ARMA-based LSCUSUM

Real data analysis

In this section, we apply the SVR-based LSCUSUM method to daily Nikkei 225 data. We analyze 100 times the log-returns of daily Nikkei 225 prices from January 4, 2010 to December 31, 2018. We split the dataset into a training dataset from January 4, 2010 to December 30, 2014 and a testing dataset from July 1, 2015 to December 31, 2018. Fig. 2 and Fig. 3 plot the daily and weekly datasets, and Fig. 4 and Fig. 5 plot the corresponding daily and weekly log-return series.
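For reference, the analyzed series is $r_t = 100(\log P_t - \log P_{t-1})$, where $P_t$ is the price on day $t$. A minimal sketch of this transformation, with made-up prices standing in for the Nikkei 225 data (the function name `percent_log_returns` is hypothetical):

```python
import numpy as np

def percent_log_returns(prices):
    """100 * (log P_t - log P_{t-1}) for a 1-D sequence of prices."""
    p = np.asarray(prices, dtype=float)
    return 100.0 * np.diff(np.log(p))

prices = np.array([100.0, 101.0, 99.0, 99.5])   # hypothetical prices
print(percent_log_returns(prices))
```

The returned series is one element shorter than the price series, since the first day has no previous price.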

Fig. 6, Fig. 7 show that both autocorrelation function (ACF) and

Concluding remarks

In this study, we considered the SVR-based LSCUSUM test for detecting change points in time series. Our simulation study confirms the validity of our method and shows that the SVR-based LSCUSUM test outperforms the ARMA-based LSCUSUM test when the underlying model is nonlinear. For illustration, a data analysis was conducted using the Nikkei 225 dataset, which also supports the practicality of the SVR-based LSCUSUM test. In future work, we plan to extend our approach to time series with high volatility.

CRediT authorship contribution statement

Sangyeol Lee: Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing - original draft. Sangjo Lee: Data curation, Formal analysis, Methodology, Software, Validation, Visualization. Miteum Moon: Methodology, Software, Validation.

Declaration of Competing Interest

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.asoc.2020.106101.

Acknowledgments

We would like to thank the Editor, an AE, and three anonymous referees for their valuable comments. This research is supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (No. 2018R1A2A2A05019433).

References (32)

  • M. Csörgő et al., Limit Theorems in Change-Point Analysis (1997).
  • C. Inclán et al., Use of cumulative sums of squares for retrospective detection of changes of variance, J. Amer. Statist. Assoc. (1994).
  • S. Kim et al., On the CUSUM test for parameter changes in GARCH (1, 1) models, Comm. Statist. Theory Methods (2000).
  • S. Lee et al., On the CUSUM of squares test for variance change in nonstationary and nonparametric time series models, Ann. Inst. Statist. Math. (2003).
  • S. Lee et al., The CUSUM test for parameter change in regression models with ARCH errors, J. Japan Statist. Soc. (2004).
  • O. Na et al., Change point detection in SCOMDY models, AStA Adv. Stat. Anal. (2013).