Dynamic clustering defines partitions within the data and a prototype for each partition. Distance metrics measure the closeness between instances and prototypes. In the literature on interval data, distances depend only on the interval bounds, and the information inside the intervals is ignored. This paper proposes new distances that exploit the information inside the intervals. It also presents a mapping of intervals to points, which preserves their spatial location and internal variation. We formulate a new hybrid distance for interval data based on the well-known \(L_q\) distance for point data. This new distance allows for a weighted formulation of the hybridism. Hence, we propose a Hybrid \(L_q\) distance, a Weighted Hybrid \(L_q\) distance, as well as the adaptive version of the Hybrid \(L_q\) distance for interval data. Experiments with synthetic and real interval data sets illustrate the usefulness of the hybrid approach for improving dynamic clustering of interval data.

The authors would like to thank CNPq and CAPES (Brazilian Agencies) for their financial support.
A Proof of Proposition 1
Fixing cluster k and dimension j, the hybrid weights of the \(WHL_q\) distance are obtained using Lagrange multipliers under the restrictions: \(w_{k,1}^j + w_{k,2}^j = 1\); \(w_{k,1}^j \ge 0 \); \(w_{k,2}^j \ge 0\); and \(t > 1\). Let
The Hybrid weight values are computed by:
The partitional dynamic clustering criterion for the \(WHL_q\) distance is given by
under the restrictions: \(w_{k,1}^j + w_{k,2}^j = 1\), \(w_{k,1}^j \ge 0\), \(w_{k,2}^j \ge 0\) and \(t > 1\). The solution can be found using Lagrange multipliers. Let \(J_{d_{WHLq}}(\varLambda _1^1,\ldots , \varLambda _K^p)\) be the version of Eq. (28) augmented with the Lagrange multipliers (\(\varLambda _k^j\)) associated with the restrictions. Thus, it becomes
The weights are found by setting the partial derivatives of \(J_{d_{WHLq}}\) to 0. Fixing cluster k and dimension j and differentiating \(J_{d_{WHLq}}\) with respect to the first weight component (\(w_{k,1}^j\)), we get
and isolating the \(w_{k,1}^j\) term, we get
Now, differentiating \(J_{d_{WHLq}}\) with respect to the second weight component (\(w_{k,2}^j\)), we obtain
we get \(w_{k,2}^j\), as follows:
With the expressions above, used to find weights, we compute the Lagrange multiplier (\(\varLambda _k^j\)) using the restriction \(w_{k,1}^j + w_{k,2}^j =1\). Then,
Substituting Eq. (40) into Eq. (33), we get
Now, substituting Eq. (40) into Eq. (37), we get
So, the hybrid weights can be computed using Eqs. (41) and (42). \(\square \)
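The Lagrange step above can be illustrated with a minimal Python sketch. It rests on an assumption that is not confirmed by the text available here: that, for a fixed cluster and dimension, the criterion has the form \((w_{k,1}^j)^t A + (w_{k,2}^j)^t B\), where \(A\) and \(B\) are the accumulated lower-bound and range dissimilarities. The function name `hybrid_weights` and the arguments `a`, `b` are hypothetical.

```python
def hybrid_weights(a, b, t=2.0):
    """Minimize w1**t * a + w2**t * b subject to w1 + w2 = 1, w1, w2 >= 0.

    ASSUMPTION: this objective form is illustrative, not taken from the paper.
    a, b: non-negative accumulated dissimilarities; t: exponent greater than 1.
    """
    if t <= 1:
        raise ValueError("t must be greater than 1")
    if a == 0 and b == 0:
        return 0.5, 0.5              # every split is optimal; pick the even one
    if a == 0:
        return 1.0, 0.0              # all weight on the zero-cost component
    if b == 0:
        return 0.0, 1.0
    # Stationarity of the Lagrangian gives w1**(t-1) * a = w2**(t-1) * b,
    # so w2 / w1 = (a / b)**(1 / (t - 1)); the constraint w1 + w2 = 1 fixes the scale.
    ratio = (a / b) ** (1.0 / (t - 1.0))
    w1 = 1.0 / (1.0 + ratio)
    return w1, 1.0 - w1


w1, w2 = hybrid_weights(1.0, 3.0, t=2.0)
```

Under this assumed form, the component with the smaller accumulated dissimilarity receives the larger weight, and the closed form removes the multiplier exactly as the substitution step in the proof does.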
B Proof of Proposition 2
Fixing cluster k, the hybrid weights of the \(WHL_{\infty }\) distance are obtained using Lagrange multipliers under the following restrictions: \(w_{k,1} + w_{k,2} = 1\); \(w_{k,1} \ge 0\); \(w_{k,2} \ge 0\); and \( t > 1\). Let
The partitional dynamic clustering criterion for the \(WHL_{\infty }\) distance is given by
under the restrictions: \(w_{k,1} + w_{k,2} = 1\); \(w_{k,1} \ge 0\); \(w_{k,2} \ge 0\); and \(t > 1\). This solution can be computed using Lagrange multipliers. Eq. (43) is rewritten to incorporate the Lagrange multipliers (\(\varLambda _k\)) and their respective restrictions. It then becomes
The weights are found by setting the partial derivatives of \(J_{d_{WHLq}}\) with respect to the weights to 0. For fixed cluster k and dimension j, differentiating \(J_{d_{WHLq}}\) with respect to the first weight component (\(w_{k,1}\)), we get
we get
Now, differentiating \(J_{d_{WHLq}}\) with respect to the second weight component (\(w_{k,2}\)), we get
it becomes
The Lagrange multiplier (\(\varLambda _k\)) is computed based on restriction \(w_{k,1}+ w_{k,2} =1\). Then,
Replacing Eq. (55) in Eq. (48), we get
Now, replacing Eq. (55) in Eq. (52), we get
So, the hybrid weights are computed by Eqs. (56) and (57). \(\square \)
C Proof of Proposition 3
Fixing cluster k and dimension j, the prototype for the \(HL_1\) and \(HL_{\infty }\) distances has an analytic solution, given by Eq. (58),
The criterion to be minimized for the \(HL_1\) distance is
Fixing cluster k and dimension j, it is possible to reduce the optimization complexity to
The problem reduces to optimizing the two sums
Each sum is minimized by the median of the respective set [27]. Then,
The criterion to be minimized for the \(HL_{\infty }\) distance is
Fixing the cluster k, it is possible to reduce the optimization complexity to
The problem is reduced to optimizing the two sums independently,
The \(\max \) function can be rewritten as a limit of the \(HL_q\) distance when \(q \rightarrow \infty \), so
As the terms of the sums are positive, their minimization entails the minimization of all sums. Fixing dimension j, the problem is reduced to
This is the \(HL_1\) optimization problem, whose solutions are the medians of the lower bounds and of the ranges. Then,
Using the inverse mapping [see Eq. (15)], we compute the upper bounds as
\(\square \)
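The median solution of Proposition 3 can be sketched in a few lines of Python. This is a minimal illustration for one cluster and one dimension, assuming each interval is given as a (lower, upper) pair and is mapped to its lower bound and range; the names `hl1_prototype` and `l1_cost` are hypothetical.

```python
from statistics import median


def hl1_prototype(intervals):
    """Prototype interval for one cluster and one dimension.

    intervals: list of (lower, upper) pairs (illustrative input format).
    """
    lowers = [a for a, b in intervals]
    ranges = [b - a for a, b in intervals]   # mapped component: interval range
    g_lower = median(lowers)                 # minimizes the sum of |lower - g|
    g_range = median(ranges)                 # minimizes the sum of |range - g|
    return g_lower, g_lower + g_range        # inverse mapping: upper = lower + range


def l1_cost(values, g):
    """Sum of absolute deviations of values from g."""
    return sum(abs(v - g) for v in values)


cluster = [(1.0, 3.0), (2.0, 6.0), (4.0, 5.0)]
prototype = hl1_prototype(cluster)
```

For this toy cluster the lower bounds are 1, 2, 4 and the ranges are 2, 4, 1, so both medians equal 2 and the prototype is the interval (2.0, 4.0). Evaluating `l1_cost` at nearby values of `g` shows that moving away from the median does not decrease either sum.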
D Proof of Proposition 4
Fixing cluster k and dimension j, the prototype for the \(HL_2\) distance has an analytic solution, which is the mean of the interval bounds. It is computed by Eq. (72),
where \(|C_k|\) is the number of instances allocated in the cluster \(C_k\).
The criterion to be minimized for the \(HL_2\) distance is
Fixing cluster k and dimension j, it is possible to reduce the optimization complexity to
The solution is found by least squares. The partial derivatives of \(J_k^j\) with respect to \(\underline{g}_k^j\) and \(\breve{g}_k^j\) must vanish. So,
where \(|C_k|\) is the number of instances allocated in cluster k. The \(HL_2\) prototypes are computed by:
Using the inverse mapping (see Eq. (15)), we compute the upper bounds as
\(\square \)
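The least-squares step of this proof can be written out as a short worked derivation. It is a hedged reconstruction, assuming the reduced criterion \(J_k^j\) is the sum of squared differences in lower bounds and ranges, which is the form implied by the mean solution stated in the proposition:

$$\begin{aligned} J_k^j&= \sum _{\gamma _n \in C_k} \left[ \left( \underline{\gamma }_n^j - \underline{g}_k^j \right) ^2 + \left( \breve{\gamma }_n^j - \breve{g}_k^j \right) ^2 \right] , \\ \frac{\partial J_k^j}{\partial \underline{g}_k^j}&= -2 \sum _{\gamma _n \in C_k} \left( \underline{\gamma }_n^j - \underline{g}_k^j \right) = 0 \;\Rightarrow \; \underline{g}_k^j = \frac{1}{|C_k|} \sum _{\gamma _n \in C_k} \underline{\gamma }_n^j, \\ \frac{\partial J_k^j}{\partial \breve{g}_k^j}&= -2 \sum _{\gamma _n \in C_k} \left( \breve{\gamma }_n^j - \breve{g}_k^j \right) = 0 \;\Rightarrow \; \breve{g}_k^j = \frac{1}{|C_k|} \sum _{\gamma _n \in C_k} \breve{\gamma }_n^j. \end{aligned}$$

Both second derivatives equal \(2|C_k| > 0\), so under this assumption the stationary point is the minimum, and the upper bound follows as \(\underline{g}_k^j + \breve{g}_k^j\).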
E Proof of Proposition 5
Fixing cluster k and dimension j, the prototype for the \(HL_q\) distance (when \(q > 1\)) can be found using the Newton–Raphson numerical method. Let the sets \(L_k^j =\left\{ \underline{\gamma }_n^j | \gamma _n \in C_k \right\} \) and \(R_k^j = \left\{ \breve{\gamma }_n^j | \gamma _n \in C_k \right\} \). Algorithm 2 shows how to compute the prototype components \(\underline{g}_k^j\) and \(\breve{g}_k^j\), respectively.
Let \(X = \left\{ x_1, x_2, \ldots , x_N\right\} \) be a set sorted in ascending order, i.e., \(x_i \le x_{i+1}\), and consider the function \(f:\mathfrak {R}\rightarrow \mathfrak {R}\),
with \(q > 1\). We are interested in the value that minimizes f(v). This function can be rewritten as follows:
where \(sgn(\cdot )\) is the sign function, defined as
The first derivative of \(f(\cdot )\) is given by:
and the second derivative of \(f(\cdot )\) is given by:
When \(q > 1\) the second derivative is always positive. We conclude that the first derivative is monotonically increasing for any v.
The value \(v_*\) which minimizes f(v) must satisfy \(f'(v_*)=0\). Suppose a value \(v_{-}\) with \(v_{-} < x_1\). Then, \(v_{-} < x_i, \forall x_i\). So, \(x_i -v_{-} > 0\), which implies that \(sgn(x_i-v_{-})=1, \forall x_i\). Then, the first derivative becomes
which always assumes a negative value. So, \(f'(v_{-}) < 0\).
Now, suppose that \(v_+ > x_N\). Then, \(v_+ > x_i, \forall x_i\). So, \(x_i-v_+<0\), implying \(sgn(x_i-v_+)=-1\). Then, the first derivative becomes
which assumes positive values. So, \(f'(v_+) > 0\).
Since \(f'(v) < 0\) when \(v < x_1\) and \(f'(v) > 0\) when \(v > x_N\), the function \(f'(v)\) changes sign on the interval \([x_1, x_N]\); hence \(\exists \, v_* \in [x_1, x_N]\) such that \(f'(v_*)=0\). As \(f'(v)\) is monotonically increasing, this solution is unique. Unfortunately, the expression for \(f'(v)\) is too complex for a general analytic solution to be computed. We propose using the Newton–Raphson numerical method to find \(v_*\). In this case, an initial value \(v_0\) is chosen randomly in the interval \([x_1,x_N]\). Iterative values \(\left\{ v_i \right\} \) are computed as follows:
Convergence occurs when \(|v_i - v_{i-1}| < \epsilon \), with \(\epsilon > 0\).
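The iteration can be sketched in Python as follows. This is not Algorithm 2 itself but a minimal illustration, assuming \(f(v) = \sum _i |x_i - v|^q\) and using the derivatives \(f'(v) = -q \sum _i sgn(x_i - v)|x_i - v|^{q-1}\) and \(f''(v) = q(q-1) \sum _i |x_i - v|^{q-2}\). Three choices are additions made here for robustness and reproducibility: the starting point is the midpoint of \([x_1, x_N]\) rather than a random value, terms with \(x_i = v\) are skipped in the second derivative, and a bisection step is taken whenever the Newton step leaves the current bracket. The names `minimize_lq` and `sgn` are hypothetical.

```python
def sgn(u):
    """Sign function: 1 for positive, -1 for negative, 0 for zero."""
    return (u > 0) - (u < 0)


def minimize_lq(xs, q, eps=1e-10, max_iter=100):
    """Approximate the v that minimizes sum(|x - v|**q for x in xs), q > 1."""
    if q <= 1:
        raise ValueError("q must be greater than 1")
    xs = sorted(xs)
    lo, hi = xs[0], xs[-1]            # f'(lo) <= 0 <= f'(hi): the root lies inside
    if lo == hi:
        return lo

    def d1(v):                        # first derivative of f
        return -q * sum(sgn(x - v) * abs(x - v) ** (q - 1) for x in xs)

    def d2(v):                        # second derivative; x == v terms skipped (added safeguard)
        return q * (q - 1) * sum(abs(x - v) ** (q - 2) for x in xs if x != v)

    v = 0.5 * (lo + hi)               # deterministic start (the text uses a random one)
    for _ in range(max_iter):
        g = d1(v)
        if g == 0:
            return v
        if g < 0:
            lo = v                    # root is to the right of v
        else:
            hi = v                    # root is to the left of v
        h = d2(v)
        v_new = v - g / h if h > 0 else lo
        if not (lo < v_new < hi):
            v_new = 0.5 * (lo + hi)   # bisection fallback (added safeguard)
        if abs(v_new - v) < eps:      # stopping rule |v_i - v_{i-1}| < eps
            return v_new
        v = v_new
    return v


v_star = minimize_lq([0.0, 1.0, 5.0], q=3)
```

For \(q = 2\) the first derivative is linear, so the first Newton step already lands on the arithmetic mean, which agrees with Proposition 4. For other values of \(q\) the bracket keeps every iterate inside \([x_1, x_N]\), where the proof locates the unique root.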
The criterion to be minimized for the \(HL_q\) distance is given by
Fixing the kth cluster and jth dimension results in
and the two sums must be minimized:
The steps described above (for the f function) can be applied to each sum independently. The sets \(L_k^j =\left\{ \underline{\gamma }_n^j | \gamma _n \in C_k \right\} \) and \(R_k^j = \left\{ \breve{\gamma }_n^j | \gamma _n \in C_k \right\} \) each take the place of set X, and the components \(\underline{g}_k^j\) and \(\breve{g}_k^j\) are determined. Algorithm 2 shows the steps to compute them using the Newton–Raphson numerical method. \(\square \)
F Proof of Proposition 6
Fixing cluster k, dimension j and parameter q (\(q \ge 1\)), the prototypes for \(WHL_q\) and \(AHL_q\) distances are computed according to one of three cases:
1. If \(q=1\) or \(q=\infty \), prototypes have an analytic solution, given by:
$$\begin{aligned} \underline{g}_k^j = \underset{\gamma _n \in C_k }{Me } \left\{ \underline{\gamma }_n^j \right\} \, \quad \text{ and } \quad \, b_{g_k}^j = \underline{g}_k^j + \underset{\gamma _n \in C_k }{Me } \left\{ \breve{\gamma }_n^j \right\} . \end{aligned}$$
2. If \(q=2\), prototypes have an analytic solution, given by:
$$\begin{aligned} \underline{g}_k^j = \frac{1}{|C_k|} \sum _{\gamma _n \in C_k} \underline{\gamma }_n^j \quad \text{ and } \quad \,\, b_{g_k}^j =\underline{g}_k^j + \frac{1}{|C_k|} \sum _{\gamma _n \in C_k} \breve{\gamma }_n^j, \end{aligned}$$
where \(|C_k|\) is the number of instances in cluster k.
3. If \(q\ne 1\), \(q \ne 2\) and \(q \ne \infty \), the Newton–Raphson numerical method is used, as described by Algorithm 2. It takes the sets \(L_k^j = \left\{ \underline{\gamma }_n^j | \gamma _n \in C_k \right\} \) and \(R_k^j = \left\{ \breve{\gamma }_n^j | \gamma _n \in C_k \right\} \) as input and yields the values of \(\underline{g}_k^j\) and \(\breve{g}_k^j\), respectively. Prototype upper bounds are found by \(b_{g_k}^j=\underline{g}_k^j +\breve{g}_k^j\), for \(j=1,\ldots ,p\).
The optimization criterion for the \(AHL_q\) distance is given by:
Fixing cluster k and dimension j, the optimization problem can be reduced to
The optimization criterion for the \(WHL_q\) distance is given by:
Fixing cluster k and dimension j, the optimization problem can be reduced to
The adaptive and hybrid weights become constants when the cluster and dimension are fixed. The problem reduces to optimizing the following two sums:
Solutions are proposed according to the value of the q parameter. If \(q=1\), the optimization becomes
whose solution, according to Proposition 3, is given by:
If \(q=2\), the optimization becomes
The solution, according to Proposition 4, is given by:
where \(|C_k|\) is the number of instances in cluster k.
The optimization criterion for the \(WHL_{\infty }\) distance is given by:
Fixing cluster k, the optimization problem can be reduced to
When the cluster is fixed, the hybrid weights become constants; then, the problem is reduced to optimizing the two sums independently:
The solutions are the medians of the lower bounds and of the ranges (as shown in Proposition 3). Then,
If \(q>1\), \(q \ne 2\) and \(q\ne \infty \), it is not possible to express an analytic solution for Eq. (96). The solution is therefore computed as discussed in Proposition 5: the Newton–Raphson numerical method is used, as described by Algorithm 2. The sets \(L_k^j = \left\{ \underline{\gamma }_n^j | \gamma _n \in C_k \right\} \) and \(R_k^j = \left\{ \breve{\gamma }_n^j | \gamma _n \in C_k \right\} \) are the inputs to this algorithm, which yields the values of \(\underline{g}_k^j\) and \(\breve{g}_k^j\), respectively.
Using the inverse mapping (see Eq. (15)), we compute the upper bounds as
\(\square \)
