
1 Introduction

Time delay embedding is a heavily explored area of nonlinear time series analysis and nonlinear dynamical systems. Its most common application is recovering the dynamics of an underlying system from a univariate scalar time series of measurements of one of the investigated system's outputs. Takens' embedding theorem [18] implies that equivalent dynamics can be reconstructed from a univariate time series using its time delays. To carry out the embedding procedure, which yields the reconstructed attractor, two parameters need to be estimated: the time delay and the embedding dimension.

The time delay is an integer value that determines which samples of the investigated time series are incorporated into the time-lagged embedding vector that reconstructs the underlying phase space. There are several approaches to estimating the time delay value \(T_{d}\) [9]. One group comprises series correlation approaches (autocorrelation, mutual information [8] or high-order correlations [2]). The second group comprises phase space extension approaches (fill factor [5], wavering product [4] or average displacement [17]). Multiple autocorrelation and non-bias multiple autocorrelation methods are also available [11]. The embedding dimension is the dimension of the reconstructed phase space, which must be large enough to unfold the underlying attractor. It can be estimated using the false nearest neighbors method [3] or its extension, Cao's method [7]. Other approaches include the saturation of system invariants [1] and neural network methods [13].
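To make the roles of the two parameters concrete, here is a minimal sketch of univariate delay embedding with NumPy. The function name `delay_embed` is hypothetical, and the forward-lag convention (row n is built from s(n), s(n+τ), ...) is an assumption; some authors use backward lags instead, which only shifts the row indices.

```python
import numpy as np

def delay_embed(s, dim, tau):
    """Time-delay embedding of a scalar series s (hypothetical helper).

    Row n is the reconstructed state [s(n), s(n + tau), ..., s(n + (dim - 1) * tau)].
    """
    s = np.asarray(s)
    n_vectors = len(s) - (dim - 1) * tau  # number of complete delay vectors
    if n_vectors <= 0:
        raise ValueError("series too short for the requested dim and tau")
    # Column i holds the series shifted by i * tau samples.
    return np.column_stack([s[i * tau : i * tau + n_vectors] for i in range(dim)])
```

For example, `delay_embed(np.arange(10), dim=3, tau=2)` returns six rows, the first being `[0, 2, 4]` and the last `[5, 7, 9]`.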

The popularity of univariate embedding may stem from the fact that, according to the embedding theorem, a univariate time series suffices to recover the dynamics. In practice, however, many time series measured at the outputs of the investigated process are often available. Because more data are available in the multivariate case, they can help establish a more accurate embedding, both for subsequent prediction and in the presence of noise. However, this raises new questions: which variables of the multivariate time series to use, and whether it is better to use the same or different embedding parameters for all variables selected for the embedding vector [6].

The problem of multivariate time series embedding can be viewed as a suitably conditioned embedding of the considered set of time series [19]. The related work distinguishes two approaches to multivariate time series embedding: uniform embedding and non-uniform embedding. Uniform embedding is the more popular approach; it assumes that the embedding parameters, i.e. the time delay and the embedding dimension, are selected a priori and separately for each time series. Non-uniform embedding progressively selects time-delayed values from a set of candidates (e.g. lagged values of variables X, Y, Z) and incorporates them into the embedding vector. At each step the most informative time-delayed variable is chosen and added to the embedding vector. The selection criterion is the mutual information between the embedding vector constructed so far and the future state of the system [14, 19].
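The uniform scheme can be sketched as follows: each component series gets its own dimension and delay, chosen beforehand, and the per-component delay vectors are concatenated on a common time index. This is an illustrative sketch, not the method of [19]; the name `uniform_embed` and the forward-lag convention are assumptions.

```python
import numpy as np

def uniform_embed(series, dims, taus):
    """Uniform multivariate embedding (hypothetical helper).

    Component m is embedded with its own dimension dims[m] and delay taus[m];
    the resulting delay vectors are concatenated row by row and trimmed to
    the number of time indices n for which every component is complete.
    """
    spans = [(d - 1) * t for d, t in zip(dims, taus)]
    length = min(len(x) for x in series) - max(spans)
    if length <= 0:
        raise ValueError("series too short for the requested parameters")
    cols = []
    for x, d, t in zip(series, dims, taus):
        x = np.asarray(x)
        # Delayed copies x(n), x(n + t), ..., x(n + (d - 1) * t) for this component.
        cols.extend(x[i * t : i * t + length] for i in range(d))
    return np.column_stack(cols)
```

For example, `uniform_embed([np.arange(10), 100 + np.arange(10)], dims=[2, 3], taus=[3, 1])` returns a 7×5 array whose first row is `[0, 3, 100, 101, 102]`. A non-uniform scheme would instead pick the columns greedily, one lagged variable at a time.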

A particular case of multivariate time series is a rotational time series. There are three main parametrizations of rotations: rotation matrices, Euler angles and quaternions. Accordingly, a rotational time series can be recorded and constructed using any of these parametrizations. This paper considers quaternion rotational time series.

The main goal of this work is to propose a time delay estimation method for uniform time delay embedding of multivariate quaternion rotational data. The proposed approach is based on the mutual information method, redesigned for quaternion kinematic time series. The presented method can be used in subsequent time delay embedding and nonlinear analysis aimed at detecting deterministic chaos in the investigated data. The author would like to emphasize that the method estimates the time delay while staying in the quaternion domain, which should help preserve the physical meaning of the kinematic data.

The paper is organized as follows. Section 2 discusses the applicability of the mutual information method to quaternion data and describes the proposed approach, including the investigated quaternion time series and how the K-means algorithm is applied. Section 3 presents the numerical results, and Section 4 concludes the paper.

2 Mutual Information Approach for Quaternions

Quaternions are a computationally efficient parametrization of rotations. They extend complex numbers and are defined as follows:

$$\begin{aligned} q=[w,(x,y,z)] = w + ix + jy + kz \end{aligned}$$
(1)

where w is the real (scalar) part, \(\mathbf {v} = (x,y,z)\) is the vector part, and i, j and k are imaginary units satisfying \(i^{2} = j^{2} = k^{2} = ijk = -1\).

Quaternion algebra, widely used to parametrize rotations, is described in detail in the related work (e.g. [10]). Of interest here are unit quaternions, which describe rotations in 3D space:

$$\begin{aligned} \left\| q\right\| = 1 \end{aligned}$$
(2)

where quaternion norm is defined by:

$$\begin{aligned} \left\| q\right\| = \sqrt{w^{2} + x^{2} + y^{2} + z^{2}}\end{aligned}$$
(3)
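As a small illustration of Eqs. (1)–(3), the sketch below stores a quaternion as a NumPy array `[w, x, y, z]`, builds the unit quaternion for a rotation by angle θ about a unit axis u, namely \(q = [\cos(\theta/2), \sin(\theta/2)\,\mathbf{u}]\), and rotates a vector with \(q\,[0,\mathbf{v}]\,q^{*}\). The helper names are hypothetical and chosen only for this example.

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def quat_from_axis_angle(axis, angle):
    """Unit quaternion for a rotation by `angle` (radians) about `axis`."""
    u = np.asarray(axis, dtype=float)
    u = u / np.linalg.norm(u)  # the axis must be a unit vector
    return np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * u))

def rotate(q, v):
    """Rotate a 3-vector v by the unit quaternion q as q * [0, v] * conj(q)."""
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(q, np.concatenate(([0.0], v))), q_conj)[1:]
```

For instance, `quat_from_axis_angle([0, 0, 1], np.pi / 2)` has norm 1 in the sense of Eq. (3), and rotating `[1, 0, 0]` with it gives `[0, 1, 0]` up to rounding.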

The method is designed for quaternion time series formed by unit quaternions:

$$\begin{aligned} Q(n) = (q_{1},q_{2},...,q_{N}) = (w_{1}+ix_{1}+jy_{1}+kz_{1},...,w_{N}+ix_{N}+jy_{N}+kz_{N}) \end{aligned}$$
(4)

2.1 Mutual Information - Existing Approach

Mutual information measures the general (including nonlinear) dependence between two variables. Its definition comes from Shannon's information theory, which provides a formalism for quantifying information. Fraser and Swinney proposed using it for time delay estimation [8].

Assume there are two nonlinear systems, A and B, whose outputs are denoted a and b, with values \(a_{i}\) and \(b_{k}\), respectively. The mutual information describes how many bits of information about \(b_{k}\) are gained when \(a_{i}\) is known:

$$\begin{aligned} I_{AB}(a_{i},b_{k})=\log_{2}\bigg (\frac{P_{AB}(a_{i},b_{k})}{P_{A}(a_{i})P_{B}(b_{k})}\bigg ), \end{aligned}$$
(5)

where \(P_{A}(a_{i})\) is the probability that \(a=a_{i}\) and \(P_{B}(b_{k})\) is the probability that \(b=b_{k}\) and \(P_{AB}(a_{i}, b_{k})\) is the joint probability that \(a=a_{i}\) and \(b=b_{k}\).

The average mutual information factor can be described by:

$$\begin{aligned} I_{AB}(T) = \sum _{a_{i},b_{k}} P_{AB}(a_{i},b_{k})I_{AB}(a_{i},b_{k}). \end{aligned}$$
(6)

To apply this measure to samples of the same time series, let a be the sample \(S(n)\) and b the sample \(S(n+T)\) lagged by T. The average mutual information then becomes:

$$\begin{aligned} I(T)= \sum ^{N}_{n=1} P(S(n),S(n+T))\,\log_{2}\bigg (\frac{P(S(n),S(n+T))}{P(S(n))P(S(n+T))}\bigg ). \end{aligned}$$
(7)
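A minimal sketch of Eq. (7) for a real-valued scalar series follows. It assumes a fixed, equal-width 2-D histogram shared by \(S(n)\) and \(S(n+T)\) (a simplification; practical implementations often use adaptive bins), and the function name `average_mutual_information` is hypothetical.

```python
import numpy as np

def average_mutual_information(s, max_lag, bins=16):
    """Average mutual information I(T) in bits for T = 0..max_lag (Eq. 7).

    Probabilities are plug-in estimates from an equal-width 2-D histogram.
    """
    s = np.asarray(s, dtype=float)
    edges = np.histogram_bin_edges(s, bins=bins)  # same bins for S(n) and S(n+T)
    ami = np.empty(max_lag + 1)
    for T in range(max_lag + 1):
        a, b = s[: len(s) - T], s[T:]             # pairs (S(n), S(n+T))
        joint, _, _ = np.histogram2d(a, b, bins=[edges, edges])
        p_ab = joint / joint.sum()                # P(S(n), S(n+T))
        p_a = p_ab.sum(axis=1, keepdims=True)     # P(S(n))
        p_b = p_ab.sum(axis=0, keepdims=True)     # P(S(n+T))
        nz = p_ab > 0                             # convention: 0 * log 0 = 0
        ami[T] = np.sum(p_ab[nz] * np.log2(p_ab[nz] / (p_a * p_b)[nz]))
    return ami
```

At T = 0 the estimate equals the entropy of the binned series, and for independent noise it drops close to zero (apart from a small positive estimation bias) for T > 0.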

Fraser and Swinney [8] propose choosing the lag \(T_m\) at which I(T) has its first minimum as the time delay \(T_{d}\). This choice ensures that the measurements are somewhat independent, but not statistically independent. If the average mutual information has no clear minimum, this criterion is replaced by choosing \(T_d\) as the lag at which the average mutual information drops to about four-fifths of its initial value:

$$\begin{aligned} \frac{I(T_{d})}{I(0)}\approx \frac{4}{5}. \end{aligned}$$
(8)
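The selection rule can be sketched as a small function operating on an already computed curve `ami[T]`, T = 0, 1, 2, .... Interpreting "≈ 4/5" in Eq. (8) as "the first lag at which I(T)/I(0) falls to 0.8 or below" is an assumption of this sketch, as is the name `select_time_delay`.

```python
def select_time_delay(ami, ratio=0.8):
    """Pick T_d from an average mutual information curve ami[T], T = 0, 1, 2, ...

    First choice: the first local minimum (Fraser and Swinney).
    Fallback: the first lag with I(T) <= ratio * I(0), cf. Eq. (8).
    Returns None if neither criterion is met within the given lags.
    """
    for T in range(1, len(ami) - 1):
        # '<=' on the right accepts the start of a flat minimum.
        if ami[T] < ami[T - 1] and ami[T] <= ami[T + 1]:
            return T
    for T in range(1, len(ami)):
        if ami[T] <= ratio * ami[0]:
            return T
    return None
```

For example, the curve `[3.0, 2.0, 1.5, 1.7, 1.0]` gives T_d = 2 (first minimum), while the monotonically decreasing curve `[5.0, 4.5, 4.2, 3.9, 3.5]` has no local minimum and falls back to T_d = 3, the first lag below 0.8 · 5.0.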

2.2 Mutual Information Extension for Quaternion Time Series

The average mutual information method for univariate time series relies on a two-dimensional adaptive histogram, which is the main obstacle to applying it to multivariate (here, quaternion) data. The empirical histogram is straightforward to estimate for a univariate time series, but not for a quaternion time series: the joint histogram of the pairs \((q_{n}, q_{n+T})\) is eight-dimensional, so with b bins per coordinate it has \(b^{8}\) cells. Computing such a multivariate histogram is expensive, and with a finite number of samples many of its bins remain empty.

Instead of a multidimensional histogram, we propose a histogram based on clusters. The whole quaternion time series is first clustered into k groups (where k is defined a priori), and the obtained clusters are treated as the equivalent of histogram bins. Consequently, when the empirical probabilities are estimated, the probability of belonging to each cluster is computed instead of the probability of falling into each histogram bin.


Data clustering, a part of machine learning and data science, is an active research field, and many clustering techniques are available; a review is given in the related work (e.g. [16]). In the presented approach, the K-means algorithm was selected as the simplest and most commonly used method. The main goal of this work is not to examine the efficiency of clustering approaches but to provide a mutual information estimation technique that uses a cluster-based histogram. On the other hand, the author sees potential in investigating the impact of the choice of clustering method on the overall performance of the algorithm.

The K-means algorithm was described by MacQueen [12]. It partitions the data into K clusters (\(C_1\), \(C_2\), ..., \(C_K\)), each represented by its center, computed as the mean of all samples assigned to that cluster. Initially, the centers are selected randomly. Then, in each iteration, every sample is assigned to the closest cluster center according to the Euclidean distance, and the centers are recalculated. The procedure is repeated until a convergence criterion is satisfied (e.g. the centers do not move in a new iteration) [16]. The whole procedure is presented in pseudo-code 1.
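The iteration described above (random initial centers, nearest-center assignment, mean update, stop when the centers no longer move) can be sketched in NumPy as follows. Note that this is the batch (Lloyd-style) variant the paragraph describes, not MacQueen's original online update; the function names, the handling of empty clusters and the use of `np.allclose` as the convergence test are assumptions of the sketch.

```python
import numpy as np

def _nearest_center(X, centers):
    """Index of the closest center (Euclidean distance) for every sample."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def kmeans(X, k, max_iter=100, seed=0):
    """Batch K-means on the rows of X; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initial centers: k distinct samples chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        labels = _nearest_center(X, centers)
        # Each center becomes the mean of its samples; an empty cluster keeps its center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:  # no relocation of the centers
            break
    return _nearest_center(X, centers), centers
```

Applied to an N×4 array holding the quaternion time series of Eq. (4), the returned labels give the cluster ("bin") of every sample.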


Finally, the mutual information algorithm for a quaternion time series is described by pseudo-code 2. First, it treats the quaternion time series as a 4-dimensional time series and partitions it into K clusters. The next step estimates the probability that samples belong to each cluster, which is the equivalent of estimating the probability that samples fall into histogram bins in the standard version of the algorithm.
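Once every quaternion sample has a cluster label, Eq. (7) can be evaluated with clusters in place of histogram bins. The sketch below is a hedged reading of that step, not the author's pseudo-code 2: it takes an integer label sequence (for instance from `sklearn.cluster.KMeans(n_clusters=7).fit_predict(Q)` applied to the N×4 array `Q`) and estimates the joint and marginal cluster probabilities from the lagged label pairs. The name `cluster_ami` is hypothetical.

```python
import numpy as np

def cluster_ami(labels, max_lag):
    """Average mutual information I(T) in bits, T = 0..max_lag, with cluster
    membership used in place of histogram bins.

    labels: integer cluster indices 0..k-1, one per quaternion sample.
    """
    labels = np.asarray(labels)
    k = int(labels.max()) + 1
    ami = np.empty(max_lag + 1)
    for T in range(max_lag + 1):
        a, b = labels[: len(labels) - T], labels[T:]  # clusters of q(n), q(n+T)
        joint = np.zeros((k, k))
        np.add.at(joint, (a, b), 1)                   # counts of cluster pairs
        p_ab = joint / joint.sum()
        p_a = p_ab.sum(axis=1, keepdims=True)         # P(q(n) in C_i)
        p_b = p_ab.sum(axis=0, keepdims=True)         # P(q(n+T) in C_j)
        nz = p_ab > 0
        ami[T] = np.sum(p_ab[nz] * np.log2(p_ab[nz] / (p_a * p_b)[nz]))
    return ami
```

With 7 clusters, I(0) is at most log2(7) ≈ 2.81 bits; the resulting curve can then be passed to the same first-minimum or four-fifths selection rule as in the univariate case.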

The author sees potential for improving the method by using a clustering algorithm that partitions the quaternion time series while staying in the quaternion domain; this will be the subject of further research. It is also worth investigating how the number of clusters affects the performance of the algorithm.
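One ingredient such a quaternion-domain clustering would likely need is a distance that respects rotations. The Euclidean distance in four dimensions, used by standard K-means, treats q and −q as far apart (distance 2), although both represent the same rotation. A common alternative, shown here only as an illustration and not as part of the proposed method, is the angle of the relative rotation, \(2\arccos(|\langle q_1, q_2\rangle|)\). The function name is hypothetical.

```python
import numpy as np

def rotation_distance(q1, q2):
    """Angle (radians, in [0, pi]) of the relative rotation between unit quaternions.

    Taking the absolute value of the dot product makes q and -q equivalent.
    """
    d = abs(float(np.dot(q1, q2)))
    return 2.0 * np.arccos(min(d, 1.0))  # clip guards against rounding above 1
```

For example, `rotation_distance([1, 0, 0, 0], [-1, 0, 0, 0])` is 0, and the distance between the identity and a 90° rotation about z is π/2.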

3 Numerical Results

The method was tested on real kinematic data recorded in the Human Motion Laboratory (HML) of the Polish-Japanese Institute of Information Technology. The recordings were performed using the Vicon Motion Kinematics Acquisition and Analysis System equipped with 10 near-infrared cameras. Markers were attached to a suit worn by the subject.

Fig. 1. Mutual information dependency for subject A

Gait sequences were recorded in Euler angles and then converted to quaternions. Six time series were recorded: the movements of the left and right femurs, tibias and feet. The method was tested on data recorded during treadmill walking of two healthy subjects. The number of clusters in the K-means algorithm was set to 7. The proposed method was additionally compared with the quaternion angle method presented by the author at the previous edition of this conference [15]. Results of the proposed method are plotted with solid lines and those of the quaternion angle method with dashed lines. In both cases, the first local minimum of each mutual information function (the time delay selection criterion) is marked with a vertical line. All estimated time delays are collected in Table 1.
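The text does not state which Euler angle sequence the recorded gait data used, so the conversion below is only a sketch under an assumed convention: intrinsic Z-Y-X (yaw about z, then pitch about the new y, then roll about the new x), returning scalar-first quaternions `[w, x, y, z]`. Other sequences require different formulas. A library alternative is SciPy's `Rotation.from_euler(...).as_quat()`, which by default returns scalar-last `[x, y, z, w]` quaternions.

```python
import numpy as np

def euler_zyx_to_quat(roll, pitch, yaw):
    """Unit quaternion(s) [w, x, y, z] for intrinsic Z-Y-X Euler angles in radians.

    Accepts scalars or equal-length arrays; the result has shape (..., 4).
    """
    cr, sr = np.cos(roll / 2), np.sin(roll / 2)
    cp, sp = np.cos(pitch / 2), np.sin(pitch / 2)
    cy, sy = np.cos(yaw / 2), np.sin(yaw / 2)
    return np.stack([
        cr * cp * cy + sr * sp * sy,  # w
        sr * cp * cy - cr * sp * sy,  # x
        cr * sp * cy + sr * cp * sy,  # y
        cr * cp * sy - sr * sp * cy,  # z
    ], axis=-1)
```

For instance, a pure 90° yaw gives `[cos 45°, 0, 0, sin 45°]`, and every output row has unit norm, so the result directly forms a series of the type in Eq. (4).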

Fig. 2. Mutual information dependency for subject B

Table 1. Time delay estimation comparison for subject A and subject B

4 Conclusion

The main goal of this article was to present a time delay estimation method for quaternion time series. The approach extends the existing mutual information method to quaternion time series by replacing the empirical histogram with K-means clustering of the multivariate data. The method may be a first step toward time delay embedding performed entirely in the quaternion domain, which will be the subject of the author's further research.

Visual inspection shows that the results of the proposed method are in the same range as those of the previously investigated quaternion angle method. It is worth emphasizing that differences in the results are expected, since the quaternion angle method uses only part of the information carried by a quaternion, whereas the new method uses all of it.

Further work should also address the impact of the clustering parameters on the method's performance and assess the quality of the embedding obtained with the proposed approach.