Determination of temporal information granules to improve forecasting in fuzzy time series

https://doi.org/10.1016/j.eswa.2013.10.046

Highlights

  • Partitioning the universe of discourse in consideration of temporal information.

  • Determining intervals by time series segmentation and information granules.

  • Using the proposed method, forecasting accuracies were significantly improved.

  • These intervals carry well-defined semantics.

  • The proposed method is robust and stable for forecasting in fuzzy time series.

Abstract

Partitioning the universe of discourse and determining intervals that contain useful temporal information and offer better interpretability are critical for forecasting in fuzzy time series. In the existing literature, researchers seldom consider the effect of the time variable when they partition the universe of discourse. As a result, the resulting temporal intervals lack interpretability. In this paper, we take the temporal information into account to partition the universe of discourse into intervals of unequal length, which improves forecasting quality. First, the time variable is involved in partitioning the universe through Gath–Geva clustering-based time series segmentation, which yields the prototypes of the data; suitable intervals are then determined from the prototypes by means of information granules. An effective method of partitioning and determining intervals is thus proposed. We show that these intervals carry well-defined semantics. To verify the effectiveness of the approach, we apply the proposed method to forecast student enrollment at the University of Alabama and the Taiwan Stock Exchange Capitalization Weighted Stock Index. The experimental results show that partitioning with temporal information can greatly improve forecasting accuracy. Furthermore, the proposed method is not sensitive to its parameters.

Introduction

For more than a decade, fuzzy time series have been used successfully to deal with problems in various domains, such as stock index forecasting (Yu, 2005), hydrometeorology forecasting (Wang, Liu, & Yin, 2012), wastewater treatment (Wen & Lee, 1999), enrollment prediction (Song & Chissom, 1993a), and temperature forecasting (Wang & Chen, 2009). The concept of fuzzy time series was first introduced by Song and Chissom (1993b). Using this concept, they presented the time-invariant fuzzy time series model and the time-variant fuzzy time series model to forecast the enrollments of the University of Alabama. Following Song and Chissom, related studies have mainly focused on improving the forecasting accuracy and reducing the computational complexity of the method.

The forecasting process in fuzzy time series consists of the following four steps: (1) partitioning the universe of discourse, (2) defining fuzzy sets and fuzzifying the time series with the use of these fuzzy sets (fuzzification), (3) establishing fuzzy logical relationships from the fuzzy time series, and (4) forecasting and defuzzification of the output of the fuzzy time series. In recent years, researchers have conducted many studies to explore and improve all four of these steps. Concerning step (1), Huarng (2001a) observed that the length of intervals in the universe of discourse significantly affects forecasting results in fuzzy time series. Huarng then proposed the distribution-based length method and the average-based length method for handling forecasting problems; Huarng (2006) suggested a different method, called ratio-based lengths of intervals. Compared with arbitrarily chosen lengths, Huarng’s method has generated more accurate forecasts for enrollment, inventory demand and Taiwan stock price data. Li, Cheng, and Lin (2008) presented a model using fuzzy c-means (FCM) clustering to deal with interval partitioning, which takes the nature of data points into account and produces unequal-sized intervals. Wang and Chen (2009) proposed a method based on clustering techniques to predict the temperature and the Taiwan futures exchange. Kuo et al. (2009) presented a method to forecast the enrollment by involving particle swarm optimization. Yolcu et al. (2009) and Egrioglu et al. (2010, 2011) proposed a new method based on single-variable constrained optimization to determine the length of interval. Chi, Fu, and Che (2010) suggested a K-means clustering technique for selecting the length of each interval. Zarandi, Molladavoudi, and Hemmati (2010) proposed the Imperialist Competitive Algorithm to determine the length of interval. Bang and Lee (2011) presented a new clustering algorithm whose structure hierarchically classifies non-linear data.

There is not much work dealing with step (2) besides the use of some fuzzy set theory. Step (3) is also deemed to be one of the critical phases influencing the forecasting result. Cheng, Chen, Teoh, and Chiang (2008) presented an adaptive expectation model for forecasting the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX); Chen and Wang (2010) proposed a high-order fuzzy time series forecasting method using fuzzy-trend logical relationships. To reduce computational complexity, Chen (1996) presented an efficient forecasting procedure by grouping fuzzy logical relationships into rules and performing simplified arithmetic operations on these groups. Yu (2005), Cheng et al. (2007), Teoh et al. (2007), Lee et al. (2009), and Hung and Lin (2013) have also worked to improve this step. With regard to step (4), most fuzzy time series models follow the same procedure as that of Song and Chissom.
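To make the four steps listed above concrete, the following is a minimal sketch in the spirit of the grouped-rule procedure attributed to Chen (1996). It uses equal-length intervals and crisp interval membership for simplicity. The function names, the toy series, and the rule that an unseen antecedent falls back to its own interval midpoint are illustrative assumptions, not the implementation used in this paper.

```python
from bisect import bisect_right
from collections import defaultdict


def equal_length_edges(lo, hi, n_intervals):
    """Step 1: split the universe of discourse [lo, hi] into equal-length intervals."""
    width = (hi - lo) / n_intervals
    return [lo + i * width for i in range(n_intervals + 1)]


def fuzzify(value, edges):
    """Step 2: map a value to the index i of the interval (fuzzy set A_i) containing it."""
    idx = bisect_right(edges, value) - 1
    # Clamp so that values on or beyond the bounds fall into the first or last interval.
    return min(max(idx, 0), len(edges) - 2)


def build_rule_groups(labels):
    """Step 3: collect fuzzy logical relationships A_i -> A_j, grouped by antecedent A_i."""
    groups = defaultdict(set)
    for current, nxt in zip(labels, labels[1:]):
        groups[current].add(nxt)
    return groups


def forecast_next(label, groups, edges):
    """Step 4: defuzzify as the mean midpoint of the consequent intervals."""
    midpoints = [(edges[i] + edges[i + 1]) / 2 for i in range(len(edges) - 1)]
    # Assumption: an antecedent with no recorded rule maps to its own interval.
    consequents = sorted(groups.get(label, {label}))
    return sum(midpoints[j] for j in consequents) / len(consequents)


# Toy series invented for illustration; it is not the enrollment or TAIEX data.
series = [10.0, 12.0, 15.0, 13.0, 18.0, 19.0, 14.0]
edges = equal_length_edges(10.0, 20.0, 5)
labels = [fuzzify(x, edges) for x in series]
groups = build_rule_groups(labels)
prediction = forecast_next(labels[-1], groups, edges)
print(labels, prediction)
```

Only step (1) would change if the equal-length partition were swapped for an unequal-length one, because the remaining steps depend on the interval edges alone.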

One of the evident limitations of these models is that they rely on ad hoc approaches to process the original numeric data, and researchers seldom take the influence of the time variable and the distribution of the data itself into account when they partition the universe of discourse. Methods such as particle swarm optimization (Kuo et al., 2009), clustering techniques (Chen & Wang, 2010), support vector machines (Chen & Kao, 2013), the entropy-based model (Cheng, Chang, & Yeh, 2006), and the refined model (Yu, 2005), which use heuristics to segment intervals into subintervals in order to produce high forecasting accuracy, are not supported by underlying semantics.

In this paper, we propose a novel approach to determine an unequal-length partitioning that considers temporal information (the time variable) and the distribution of the data itself. The proposed method is based on Gath–Geva (GG) clustering-based time series segmentation and the concept of information granules. The role of GG clustering-based time series segmentation and information granules is to determine temporal intervals of unequal length so that the model comes with increased accuracy and enhanced interpretability. The advantages of the proposed method can be summarized as follows:

  • The approach is more comprehensive because the time variable participates in partitioning the universe of discourse into intervals of unequal length.

  • The forecasting accuracy for the two well-known data sets was significantly improved when the proposed method was employed.

  • We determine intervals by GG clustering-based time series segmentation and information granules, and these intervals carry well-defined semantics.

The proposed method has been experimentally tested on forecasting the enrollment and the Taiwan Stock Exchange Capitalization Weighted Stock Index time series. The experimental results show that forecasting accuracy is clearly improved when the proposed method is compared with the equal-length partitioning used in previous studies.

The remainder of this paper is organized as follows. In Section 2, we provide a brief review of fuzzy time series, GG clustering-based time series segmentation and information granules. In Section 3, we present the proposed method to partition the universe of discourse and determine unequal-length intervals with temporal information. The performance of the proposed method on both enrollment and TAIEX time series forecasting is examined in Section 4. Conclusions are presented in Section 5.

Section snippets

Related works

In this section, some related background material including fuzzy time series, GG clustering-based time series segmentation and information granules is briefly reviewed.

Partition of the universe of discourse

Partitioning the universe of discourse and determining intervals are critical to the quality of forecasting in fuzzy time series. In this section, we present a method showing how to partition the universe into unequal-length intervals in order to improve forecast accuracy.
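As a rough illustration of what an unequal-length partition built from data prototypes can look like, the sketch below places interval cuts midway between adjacent prototypes. The midpoint rule, the function name `edges_from_prototypes`, and the prototype values are assumptions made for illustration only. The method in this paper derives its intervals through GG clustering-based segmentation and information granules, which this sketch does not reproduce.

```python
def edges_from_prototypes(prototypes, lo, hi):
    """Return interval edges over [lo, hi] with cuts midway between sorted prototypes."""
    centers = sorted(prototypes)
    if not centers or centers[0] < lo or centers[-1] > hi:
        raise ValueError("prototypes must be non-empty and lie inside [lo, hi]")
    # One cut between each pair of neighbouring prototypes (assumed midpoint rule).
    cuts = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    return [lo] + cuts + [hi]


# Hypothetical prototypes; a real run would take them from the clustering step.
edges = edges_from_prototypes([11.0, 12.0, 18.0], lo=10.0, hi=20.0)
lengths = [b - a for a, b in zip(edges, edges[1:])]
print(edges, lengths)
```

Intervals come out narrow where prototypes are close together and wide where they are far apart, which is the basic property an unequal-length partition is meant to provide.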

On one hand, in time series forecasting problems, the value of a variable often depends on its adjacent predecessors in time, so the time coordinate plays an important role in partitioning of the

Forecasting

Enrollment forecasting for the University of Alabama has been studied in previous work on fuzzy time series. These data are used here to show the advantages of the proposed method. The models of Chen (1996) and Lee et al. (2009) are two classical models with high forecasting accuracy, especially the latter when the frequency of fuzzy logical relationships is considered. To estimate the forecasting accuracy of the proposed method, we used the mean square error (MSE) as a performance measure.
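For reference, the MSE can be computed as sketched below, assuming the standard definition: the average of the squared differences between actual and forecast values over the n forecast points. The function name and the sample numbers are illustrative and are not results from this paper.

```python
def mean_square_error(actual, forecast):
    """Average squared difference between paired actual and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


# Made-up values for illustration only.
actual = [10.0, 12.0, 15.0]
forecast = [11.0, 12.0, 13.0]
mse = mean_square_error(actual, forecast)
print(mse)
```

A lower MSE indicates forecasts that lie closer to the observed series. Because errors are squared, a few large misses weigh more heavily than many small ones.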

Conclusions

In this paper, we have proposed a partitioning method that adds time-domain information for fuzzy time series forecasting. A more effective non-uniform partitioning with temporal information is obtained, resulting in higher forecasting accuracy. Our approach has been tested on several historical time series of enrollment at the University of Alabama and the TAIEX in 1992. In all cases, some improvement has been reported. The partition of the universe of discourse using information

Acknowledgments

This work is supported by the Natural Science Foundation of China under Grant 61175041, Boshidian Funds 20110041110017, and the Canada Research Chair (CRC) Program.
