
Applied Soft Computing

Volume 76, March 2019, Pages 31-44

Learning regularity in an economic time-series for structure prediction

https://doi.org/10.1016/j.asoc.2018.12.003

Highlights

  • This paper employs a novel approach for time-series segmentation, clustering of similar segments and labeling of the series with the known structured segments.

  • An automata based prediction of the next time-series segment emerging at the next time-point is also introduced.

  • Structure prediction of a time series.

Abstract

Although an economic time-series has an apparently random fluctuation over time, there exists certain regularity in the functional behavior of the series. This paper attempts to identify the regularly occurring structures in an economic time-series with an aim to represent the series as a specific sequence of such structures for forecasting applications. The applications include prediction of the most probable structure with its expected duration, along with predicted values lying thereon. Representation of a time-series by a set of regularly recurring structures is undertaken by invoking three main steps: (i) non-uniform length segmentation of the series, (ii) identification of the recurrent patterns by clustering of the generated segments, and (iii) representing the sequence of regular structures using a specially designed automaton. The automaton is used here to both encode the sequence of structures representing the time-series and also to act as an inference engine for stochastic forecasting about the time-series. Experiments undertaken on large (28 years’) daily economic time-series data sets confirm the success in automated structure prediction with an average prediction accuracy of 88.05%, average precision of 91.24% and average recall of 93.42%.

Introduction

A time-series represents a sequence of time-functional values available at regular intervals of time, such as yearly population-growth [1], daily recordings of temperature [2], humidity and rainfall [3], daily variation in the economic growth [4] of a country, and many others. Prediction of a time-series refers to determining the value of the series at future time-points from its current and preceding values. Among the well-known principles of time-series prediction, deterministic [5], stochastic [6], neural [7], [8], [9], fuzzy [10], [11], [12], [13], [14], [15], chaos-theoretic [16], regression [17] and other approaches indicated in [18] need special mention. Besides prediction of time-series, there exist a few research works dealing with matching [19], segmentation [20], [21], [22], [23], [24], [25], [26], [27], dimensionality reduction [28], clustering [18], and classification [29], [30] of time-series. Knowledge acquisition [31], [32] from a time-series by sub-sequence clustering is another interesting research arena. One important but scarcely studied aspect of time-series research is forecasting of the (up/down/sideways) moves/structures emerging from selected time-points. A structure comprises a block of contiguous data points having a meaningful shape/geometry. Prediction of structures is important in economic time-series, as the forecasted structures help investors and traders in their business planning.

The motivation of this paper is two-fold. First, the proposed approach automatically recognizes the repetitive structures present in an economic time-series in the absence of any knowledge about structure-shape, duration and location of their endpoints. Second, it offers a method to predict the most probable structures expected to emerge from selected time-points; it can also infer the sequence of the most probable structures between any pair of partitions of a pre-partitioned time-series. The first capability helps in determining the approximate duration, shape and probability of occurrence of the primitive structures present in the series. The second helps in predicting structure-shape, approximate duration and probability of occurrence, which provide useful insights to business traders and investors.

Identification of the primitive structures present in a time-series is performed here in two steps. First, the time-series is segmented heuristically based on the similarity in local slopes of contiguous data points in the series. Second, the extracted segments are clustered (grouped) based on the similarity of their geometric shapes. Given a daily (economic) time-series of significant length (10 years), experiments indicate that there exist approximately 7 to 10 clusters of segments; in other words, there exist 7 to 10 possible primitive structures embedded in the time-series. So, a time-series of finite length can be represented by a sequence of known structures. While predicting the most probable structure emanating from a given time-point, any one of these 7–10 possible structures is expected as the solution. One natural question is: how does one predict the most probable structure? This is undertaken here with the help of a Dynamic Stochastic Automaton (DSA), which is briefly outlined below.

The DSA employed here includes nodes (states) representing partitions of the series and directed arcs representing state-transitions from one partition to another due to the presence of a structure between the partitions. A structure pk originating from a partition Zi and terminating at a partition Zj is represented in the DSA by two states si and sj, with pk as an input symbol whose occurrence at si causes a transition to state sj. The probability of transition from a state to each other (feasible) state due to the occurrence of an input symbol is attached to the respective arc, along with the average duration of the transition in the series. Suppose the structure emanating at time t needs to be predicted. As the time-point t is given, the partition containing the time-series value at time t, and hence the state describing that partition, are known. Then the input symbol representing the structure with the highest probability of occurrence is searched for in the selected state of the DSA. The said structure is inferred as the next possible structure of the time-series emanating at time t.
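The look-up described above can be sketched as a dictionary-based toy model. All state names, input symbols, probabilities and durations below are invented for illustration; they are not taken from the paper's DSA construction:

```python
from collections import namedtuple

# Hypothetical arc record: destination state, transition probability
# and average duration (in time-points) of the structure.
Arc = namedtuple("Arc", ["next_state", "probability", "avg_duration"])

# Illustrative DSA: states Z1, Z2 are partitions; symbols p1..p3 stand
# for cluster-center structures. All numbers are made up.
dsa = {
    "Z1": {"p1": Arc("Z2", 0.6, 12), "p2": Arc("Z3", 0.4, 20)},
    "Z2": {"p3": Arc("Z1", 0.7, 8),  "p1": Arc("Z3", 0.3, 15)},
}

def predict_structure(state):
    """Return the input symbol (structure) with the highest transition
    probability from `state`, plus its probability and expected duration."""
    symbol, arc = max(dsa[state].items(), key=lambda kv: kv[1].probability)
    return symbol, arc.probability, arc.avg_duration

print(predict_structure("Z1"))  # -> ('p1', 0.6, 12)
```

The same table can clearly also answer the duration query: once the most probable symbol is selected, the average duration stored on the arc gives the expected length of the forecast structure.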

The originality of the paper thus is two-fold. First, a new technique for online structure prediction in a time-series is proposed, involving three basic steps: (i) segmentation, (ii) clustering and (iii) prediction using the DSA. Second, the algorithms developed for the individual steps are novel and have shown promising performance compared with their existing counterparts. The original contributions of the proposed segmentation, clustering and DSA-based prediction, and their relative merits over existing algorithms, are summarized below.

Most traditional segmentation algorithms utilize the piecewise linear approximation (PLA) criterion [33] to distinguish segments based on intra-segment homogeneity (here, uniform slope). In other words, a segment boundary is selected at a time-point where the slopes of the current and the next segments differ significantly. Two common varieties of segmentation algorithms developed to satisfy the PLA criterion are the top-down [27] and the bottom-up [26] algorithms. While the top-down algorithm recursively splits a fragment of a given time-series into two components based on the difference in slopes, the bottom-up algorithm greedily merges segments as long as the PLA criterion within the merged segment is satisfied. Both the top-down and the bottom-up algorithms work in offline mode. Among online segmentation algorithms, the Sliding Window (SW) algorithm is popular. The SW algorithm identifies the position of the next segment boundary by iteratively widening the present segment until a measure of the “approximation error” [34] (such as Maximum Vertical Distance (MVD) [35] or Root Mean Square Error [11]) of the segment exceeds a predefined threshold. Several extensions [36], [37] of the basic SW algorithm have been undertaken to reduce its time-complexity. Keogh, for example, combined the sliding-window and bottom-up principles to design a new algorithm called SWAB [37]. Among other interesting segmentation techniques, Dynamic Programming [22], [38], guaranteed error-bound based segmentation [35], [39], [40], clustering [20], least-squares approximation [24] and evolutionary-algorithm [23] based techniques need special mention.

The proposed segmentation algorithm differs from the existing ones in the following respects. First, it labels the transition between each pair of successive time-points in the series as ‘rise’ (R), ‘fall’ (F) or ‘equal’ (E), depending on an increase, decrease or no change in the amplitudes of the successive points. Second, it labels a sequence of a fixed number of time-points as v = R/F/E if the frequency count of v is maximum within that fixed number of data points. Third, it groups two or more consecutive sequences of time-series points having the common label v into a segment. The proposed algorithm is thus analogous to the natural segmentation of ridges on a mountain by a traveler based on local gradients.
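The three steps above can be sketched as follows. The window length of five matches the intuitive choice mentioned later in the paper, but the toy series and the tie-breaking order are illustrative assumptions:

```python
def label_transitions(series):
    """Step 1: label each consecutive pair as 'R' (rise), 'F' (fall)
    or 'E' (equal) from the sign of the amplitude change."""
    return ["R" if b > a else "F" if b < a else "E"
            for a, b in zip(series, series[1:])]

def window_labels(labels, w=5):
    """Step 2: assign each fixed-length window the majority label
    among R/F/E (ties broken in the order R, F, E — an assumption)."""
    return [max("RFE", key=labels[i:i + w].count)
            for i in range(0, len(labels), w)]

def group_segments(win_labels):
    """Step 3: merge consecutive windows sharing a label into segments,
    returned as (label, run-length in windows) pairs."""
    segments = []
    for lab in win_labels:
        if segments and segments[-1][0] == lab:
            segments[-1][1] += 1
        else:
            segments.append([lab, 1])
    return [tuple(s) for s in segments]

series = [1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 2]
labs = label_transitions(series)     # 'RRRREFFFFR'
wins = window_labels(labs, w=5)      # ['R', 'F']
print(group_segments(wins))          # -> [('R', 1), ('F', 1)]
```

A rising ridge followed by a falling one thus collapses into two segments, mirroring the traveler-on-a-mountain analogy.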

The structures extracted from a long time-series of several years are often found to have shape-similarity, which can easily be recognized manually by visual inspection. However, because of non-uniformity in the duration of the extracted structures and their peak-to-peak amplitude, they need to be pre-processed before automatic clustering. The z-score normalization is employed to normalize their peak-to-peak amplitude, and resampling to a fixed number of points is used to normalize their duration, irrespective of structure-length. Once normalized, a clustering algorithm [41], [42] can be employed to group similar structures of fixed duration and amplitude, using certain distance metrics [37], [43], [44] to measure similarity [45], [46], [47]. If p clusters are discovered, the cluster centers of the p clusters are recorded as the primitive structure-shapes in normalized form. Although any traditional clustering algorithm, such as k-means, k-medoids, fuzzy c-means and the like, could serve the purpose, they suffer from two fundamental limitations: (i) the dependence of clustering performance on the initialization of cluster centers [48], and (ii) the requirement to specify the number of clusters [42], which is not known in the present situation. The above problems can be alleviated by data-density based clustering algorithms. Density Based Spatial Clustering of Applications with Noise (DBSCAN) [49] is one such well-known data-density dependent clustering algorithm.
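The two-phase pre-processing (fixed-length resampling followed by z-score normalization) can be sketched as below. The dimension b = 8 and the linear-interpolation resampler are illustrative assumptions, not the paper's exact procedure:

```python
import statistics

def normalize_structure(values, b=8):
    """Resample a variable-length structure to b points by linear
    interpolation, then z-score normalize the result so all structures
    share a common length and amplitude scale."""
    n = len(values)
    resampled = []
    for k in range(b):
        pos = k * (n - 1) / (b - 1)       # fractional index into values
        i = int(pos)
        frac = pos - i
        nxt = values[min(i + 1, n - 1)]
        resampled.append(values[i] * (1 - frac) + nxt * frac)
    mu = statistics.fmean(resampled)
    sd = statistics.pstdev(resampled) or 1.0  # guard against flat segments
    return [(v - mu) / sd for v in resampled]

vec = normalize_structure([10, 12, 15, 11, 9, 8], b=8)
print(len(vec))  # -> 8  (zero mean, unit variance)
```

After this step every structure is a point in the same b-dimensional space, so any shape-based distance metric compares like with like.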

The DBSCAN algorithm clusters points located in the highest data-density region and labels points with lower surrounding density as noise. DBSCAN-DLP [50], an extension of classical DBSCAN, clusters data points at multiple density levels: it pre-processes the input data, organizes the points into layers based on their density, and then applies classical DBSCAN to each layer. In this paper, we attempt to extend the classical DBSCAN algorithm to hierarchically cluster [51], [52] data points at different density levels by an analogous technique with one fundamental difference. In the present setting, a greedy recursive technique is proposed which, in lieu of pre-processing the data set, layers it by isolating the cluster with the highest density and dropping the points with relatively lower density as outliers at each step. A recursive realization of this data layering autonomously places the cluster centers in the relative order of data density. Besides, data points of uniform density, irrespective of their spatial locations, are clustered at the same level.
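A minimal 1-D sketch of the greedy recursive layering idea follows. The density estimate (an ε-neighborhood count), the parameters and the data are all illustrative assumptions; the actual algorithm operates on the normalized structure vectors, not scalars:

```python
def density(points, p, eps):
    """Local density of p: number of points within eps (1-D distance)."""
    return sum(1 for q in points if abs(q - p) <= eps)

def layer_clusters(points, eps=0.5, min_pts=2):
    """Greedy recursive layering: peel off the points at the current
    maximum density level as one cluster layer, then recurse on the
    lower-density remainder; points below min_pts become noise."""
    if not points:
        return []
    dens = [density(points, p, eps) for p in points]
    top = max(dens)
    if top < min_pts:
        return [("noise", sorted(points))]
    layer = [p for p, d in zip(points, dens) if d == top]
    rest = [p for p, d in zip(points, dens) if d != top]
    return [("layer", sorted(layer))] + layer_clusters(rest, eps, min_pts)

print(layer_clusters([1.0, 1.1, 1.2, 5.0, 5.1, 9.0]))
# -> [('layer', [1.0, 1.1, 1.2]), ('layer', [5.0, 5.1]), ('noise', [9.0])]
```

Because each recursion removes the densest layer first, the layers emerge in descending order of density, matching the claim that cluster centers are placed in the relative order of data density.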

After clustering of the structures by the proposed extension of DBSCAN, the cluster centers representing the ideal members of each cluster are preserved for future use. The stochastic automaton is then developed to encode the knowledge of state-transitions, taking the structures representative of the cluster centers as the input symbols of the automaton. The idea used for knowledge encoding [31] in the proposed Dynamic Stochastic Automaton is novel in comparison to the related works reported in [17], [53], [54], [55].

The paper is divided into eight sections. Section 2 provides an overview of the proposed structure prediction technique. Section 3 provides a novel algorithm for segmentation of a time-series. Section 4 extends the DBSCAN algorithm for hierarchical multi-resolution clustering. Section 5 provides the principles of time-series representation by a stochastic automaton. Section 6 deals with prediction experiments on 3 well-known economic time-series: Taiwan Stock Exchange Index (TAIEX) [56], National Association of Securities Dealers Automated Quotations (NASDAQ) [57] and Dow Jones Industrial Average (DJIA) [58] for each year in the period 1990–2017, resulting in an average structure prediction accuracy of 88.05%. A study of the performance analysis of the proposed algorithm is undertaken in Section 7. The concluding points are summarized in Section 8.

Section snippets

An overview of the proposed structure prediction technique

Structures are meaningful contiguous blocks of time-series data. Structure prediction, here, is formulated as a 2-phase problem. In the first phase, the knowledge acquired from the time-series is encoded in a DSA. In the second phase, the DSA is used to forecast the possible shape of the meaningful moves in the time-series at a desired time-point.

Fig. 1 provides a schematic overview of the DSA construction from a time-series. First, the time-series is segmented into (homogeneous) time-blocks of

Slope-sensitive natural segmentation

The Slope-Sensitive Natural Segmentation (SSNS) algorithm presented here segments a time-series in a manner similar to the natural segmentation of ridges on a mountain based on slope changes. The algorithm works in two phases. The first phase labels the lines joining consecutive pairs of data points as rise (R), fall (F) or zero-slope (E). In the second phase, it classifies the windows of fixed length (intuitively chosen as five) containing the maximum number of labels of R/F/E into

Clustering of segmented structures

A typical time-series of 5000 to 10,000 data points contains several hundred segmented structures of similar shapes. To determine the primitive structures in a time-series, we need to perform two steps: (i) pre-processing and (ii) clustering of the pre-processed structures. The pre-processing is again done in two phases. The first phase includes representing the segmented time-blocks/structures of different lengths into vectors of fixed dimension b. The value of b is selected heuristically as

Knowledge encoding and prediction by dynamic stochastic automaton

A dynamic stochastic automaton (DSA) [60] is a 7-tuple, given by F = (Q, I, QG, V, δ, δt, δG), where

Q= A finite set of states, representing partitions,

I= A finite set of input symbols, representing primitive patterns (cluster centers) obtained by clustering of extracted segments,

QG= Set of next states (called a group) for a given state and a given input symbol: QG:Q×IP(Q), where P(Q) denotes the power set of Q. It should be noted that QG has two representations. It can both be represented as a set of

Prediction experiments

Prediction of a time-series involves two phases: the former, referred to as the training phase, is concerned with knowledge acquisition, while the latter, called the test phase, deals with forecasting the time-series at future time-points. Here, segmentation, clustering and DSA construction, introduced earlier, together constitute the training phase, and forecasting using the DSA is undertaken in the test phase.

Let xt be a point on a time-series at time t. The test phase here is

Performance analysis

The analysis includes both the overall and the individual performances of the proposed segmentation and clustering algorithms.

Conclusions

The paper introduced a novel scheme to encode a time-series by an automaton using three main steps: segmentation, clustering and DSA construction. These steps are usually performed in the training phase. In the test phase, the automaton is consulted to predict the most probable structure/sequence of structures between any two given states (partitions) along with the probability of the MPS/MPSS and their expected duration. The online prediction scheme proposed here is useful, as it does not

Acknowledgment

The authors gratefully acknowledge the funding they received from the UPE-II project (JU/UPE-II/2015) in Cognitive Science, funded by UGC, India.

References (63)

  • T. Warren Liao, Clustering of time series data – a survey, Pattern Recognit. (2005)

  • O. Barnea et al., On fitting a model to a population time series with missing values, Isr. J. Ecol. Evol. (2006)

  • S.M. Chen et al., Temperature prediction using fuzzy time series, IEEE Trans. Syst. Man Cybern. B (2000)

  • M. Small et al., Detecting determinism in time series: the method of surrogate data, IEEE Trans. Circuits Syst. I: Fundam. Theory Appl. (2003)

  • B. Abraham et al., Statistical Methods for Forecasting (2008)

  • J.T. Connor et al., Recurrent neural networks and robust time series prediction, IEEE Trans. Neural Netw. (1994)

  • D.T. Mirikitani et al., Recursive Bayesian recurrent neural networks for time-series modeling, IEEE Trans. Neural Netw. (2010)

  • S.M. Chen et al., TAIEX forecasting based on fuzzy time series and fuzzy variation groups, IEEE Trans. Fuzzy Syst. (2011)

  • S.M. Chen et al., TAIEX forecasting using fuzzy time series and automatically generated weights of multiple factors, IEEE Trans. Syst. Man Cybern. A (2012)

  • M. Han et al., Prediction of chaotic time series based on the recurrent predictor neural network, IEEE Trans. Signal Process. (2004)

  • Q. Lin, C. Hammerschmidt, G. Pellegrino, S. Verwer, Short-term time series forecasting with regression automata, in: ...

  • P. Esling et al., Multiobjective time series matching for audio classification and retrieval, IEEE Trans. Audio Speech Lang. Process. (2013)

  • G.F. Bryant, S.R. Duncan, A solution to the segmentation problem based on dynamic programming, in: Proc. Third IEEE ...

  • F.L. Chung et al., An evolutionary approach to pattern-based time series segmentation, IEEE Trans. Evol. Comput. (2004)

  • E. Fuchs et al., Online segmentation of time series based on polynomial least-squares approximations, IEEE Trans. Pattern Anal. Mach. Intell. (2010)

  • E. Keogh et al., Segmenting time series: a survey and novel approach

  • E. Keogh, P. Smyth, A probabilistic approach to fast pattern matching in time series databases, in: Proc. ACM ...

  • H. Shatkay et al., Approximate Queries and Representations for Large Data Sequences, Technical Report CS-95-03 (1995)

  • Y. Zhao et al., Generalized dimension reduction framework for recent-biased time series analysis, IEEE Trans. Knowl. Data Eng. (2006)

  • B.D. Fulcher et al., Highly comparative feature-based time-series classification, IEEE Trans. Knowl. Data Eng. (2014)

  • L. Wei, E. Keogh, Semi-supervised time series classification, in: KDD ’06 Proceedings of the 12th ACM SIGKDD ...