Elsevier

Knowledge-Based Systems

Volume 91, January 2016, Pages 189-203
A tree-based incremental overlapping clustering method using the three-way decision theory

https://doi.org/10.1016/j.knosys.2015.05.028

Abstract

Existing clustering approaches are usually restricted to crisp clustering, where each object belongs to exactly one cluster; however, in some applications objects can belong to more than one cluster. In addition, existing clustering approaches usually analyze static datasets in which objects are kept unchanged after being processed; however, many practical datasets are dynamically modified, which means that some previously learned patterns have to be updated accordingly. In this paper, we propose a new tree-based incremental overlapping clustering method using the three-way decision theory. The tree is constructed from representative points introduced in this paper, which can enhance the relevance of the search results. Overlapping clusters are represented by three-way decisions with interval sets, and three-way decision strategies are designed to update the clustering when the data increase. Furthermore, the proposed method can determine the number of clusters during processing. The experimental results show that it can identify clusters of arbitrary shapes without sacrificing computing time, and further comparison experiments show that the proposed method performs better than the compared algorithms in most cases.

Introduction

Most existing clustering algorithms analyze static datasets in which objects are kept unchanged after being processed [1], [2]. However, in many practical applications the datasets are dynamically modified, which means that some previously learned patterns have to be updated accordingly [3], [4]. Although these approaches have been successfully applied, there are some situations in which a richer model is needed for representing a cluster [5], [6]. For example, a researcher may collaborate with researchers in different fields; therefore, if we cluster researchers according to their areas of interest, some researchers can be expected to belong to more than one cluster. In such applications, overlapping clustering is as useful and important as incremental clustering.

For this reason, this paper addresses the problem of incremental overlapping clustering. The main contribution of this work is an incremental overlapping clustering method called TIOC-TWD (Tree-based Incremental Overlapping Clustering method using the Three-Way Decision theory). The proposed method introduces a new incremental clustering framework based on three-way decisions with interval sets and a new search tree based on representative points, which together allow overlapping clusters to be obtained as the data increase. In addition, TIOC-TWD introduces new three-way strategies to update the clustering efficiently after multiple objects are added. Furthermore, the proposed method determines the number of clusters dynamically, so the number of clusters does not need to be defined in advance. These characteristics make TIOC-TWD appropriate for handling overlapping clustering in applications where the data are increasing.

The experimental results show that the proposed method can not only identify clusters of arbitrary shapes but also merge small clusters into a larger one when the data change; it can also detect new clusters, which may result from splitting or from new patterns. In addition, further experimental results show that the proposed method performs better than the compared algorithms in most cases. We note that a short version of this work appeared in the RSCTC-2014 Workshop on the Three-Way Decisions and Probabilistic Rough Sets [7].

Section snippets

Related work

In recent years, there have been some achievements in incremental clustering approaches. Ester et al. [8] put forward the IncDBSCAN clustering algorithm, which is based on DBSCAN. Subsequently, Goyal et al. [9] proposed a derived method that is more efficient than IncDBSCAN because it is capable of adding points in bulk to an existing set of clusters. Patra et al. [10] proposed an incremental clustering algorithm based on distance and leaders, but the algorithm needs to search the whole data space to find

Three-way decision clustering

To define our framework, let a universe be $U=\{x_1,\dots,x_n,\dots,x_N\}$, and the resulting clustering scheme $C=\{C_1,\dots,C_k,\dots,C_K\}$ is a family of clusters of the universe. The $x_n$ is an object which has $D$ attributes, namely, $x_n=(x_n^1,\dots,x_n^d,\dots,x_n^D)$. The $x_n^d$ denotes the value of the $d$-th attribute of the object $x_n$, where $n\in\{1,\dots,N\}$, and $d\in\{1,\dots,D\}$.

We can look at the cluster analysis problem from a decision-making perspective. For crisp clustering, it is a typical two-way decision; meanwhile for overlapping clustering
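As a rough illustration of the interval-set view of a cluster, the sketch below stores each cluster as a positive region (objects that definitely belong) and a boundary region (objects that may belong), and treats everything else as negative. It is a minimal sketch under stated assumptions, not the paper's strategy: the membership scores, the thresholds `alpha` and `beta`, and the names `IntervalCluster`, `three_way_region`, `build_clusters` and `overlapping_objects` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IntervalCluster:
    """Hypothetical interval-set cluster: a positive region plus a boundary region."""
    positive: set = field(default_factory=set)  # objects that definitely belong
    boundary: set = field(default_factory=set)  # objects that may belong

    def upper(self):
        """Upper bound of the interval set: positive region plus boundary region."""
        return self.positive | self.boundary


def three_way_region(score, alpha, beta):
    """Three-way decision on one object/cluster pair (thresholds are assumptions)."""
    if score >= alpha:
        return "positive"
    if score > beta:
        return "boundary"
    return "negative"


def build_clusters(scores, alpha=0.7, beta=0.3):
    """scores[obj][k] is an illustrative membership score of object obj in cluster k."""
    clusters = {}
    for obj, per_cluster in scores.items():
        for k, score in per_cluster.items():
            cluster = clusters.setdefault(k, IntervalCluster())
            region = three_way_region(score, alpha, beta)
            if region == "positive":
                cluster.positive.add(obj)
            elif region == "boundary":
                cluster.boundary.add(obj)
            # "negative": the object is left out of this cluster entirely
    return clusters


def overlapping_objects(clusters):
    """Objects that appear in the upper bound of more than one cluster."""
    seen, shared = set(), set()
    for cluster in clusters.values():
        members = cluster.upper()
        shared |= seen & members
        seen |= members
    return shared


# Made-up scores for three objects and two clusters.
scores = {
    "x1": {"C1": 0.9, "C2": 0.1},
    "x2": {"C1": 0.5, "C2": 0.6},
    "x3": {"C1": 0.2, "C2": 0.8},
}
clusters = build_clusters(scores)
```

In this toy input, `x2` falls between the two thresholds for both clusters, so it lands in both boundary regions and is the only object reported by `overlapping_objects(clusters)`. A crisp clustering corresponds to the special case in which every boundary region is empty.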

The TIOC-TWD clustering method

The overall process of the proposed TIOC-TWD method is illustrated in Fig. 1. We also devise an overlapping clustering algorithm for the initial static data that uses a three-way decision strategy; it is based on a graph of representative points, built by calculating the similarity between representative regions. It is called Algorithm 1 and is described in Section 4.2.

Evaluation indices and datasets

We evaluate the proposed TIOC-TWD clustering approach through the following experiments. All experiments are performed on a 2.67 GHz computer with 4 GB of memory, and all algorithms are implemented in C++. The quality of the final clustering is evaluated by traditional indices such as accuracy, F-measure [42] and NMI [43], where objects in boundary regions are treated as belonging to positive regions so that these common formulas can be applied.
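To make the boundary-region convention concrete, the sketch below first folds each cluster's boundary region into its positive region and then scores the resulting sets with a common class-weighted, best-match form of the clustering F-measure. The exact formula from [42] is not reproduced in this excerpt, so this particular variant is an assumption, as are the function names `fold_boundary` and `clustering_f_measure` and the toy data.

```python
def fold_boundary(interval_clusters):
    """Treat boundary objects as positive: each cluster becomes positive | boundary."""
    return {name: positive | boundary
            for name, (positive, boundary) in interval_clusters.items()}


def clustering_f_measure(classes, clusters):
    """Class-weighted best-match F-measure (an assumed, commonly used variant).

    classes and clusters map a name to a set of object ids. The ground-truth
    classes are assumed to be disjoint, so their sizes sum to the object count.
    """
    n = sum(len(members) for members in classes.values())
    total = 0.0
    for class_members in classes.values():
        best = 0.0
        for cluster_members in clusters.values():
            common = len(class_members & cluster_members)
            if common == 0:
                continue  # no shared objects, so F is zero for this pair
            precision = common / len(cluster_members)
            recall = common / len(class_members)
            best = max(best, 2 * precision * recall / (precision + recall))
        total += len(class_members) / n * best
    return total


# Toy ground truth and a toy interval-set clustering: (positive, boundary) pairs.
classes = {"A": {1, 2, 3}, "B": {4, 5, 6}}
interval_clusters = {
    "C1": ({1, 2}, {3}),
    "C2": ({4, 5, 6}, {3}),
}
flat_clusters = fold_boundary(interval_clusters)
score = clustering_f_measure(classes, flat_clusters)
```

Here object 3 sits in the boundary of both clusters, so after folding it counts as a member of both. Class A is matched perfectly by C1, while C2 contains one extra object relative to class B, which lowers its precision and pulls the overall score below 1.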

Table 5 gives the summary information about the datasets and the

Conclusions

Existing clustering approaches are restricted either to crisp clustering or to static datasets. In order to develop an approach that deals with overlapping clustering as well as incremental clustering, this paper proposed a new tree-based incremental overlapping clustering method using the three-way decision theory, called TIOC-TWD.

This paper first introduced three-way decision clustering to represent the overlapping clustering as well as crisp clustering, and described the problem of incremental

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61379114 & 61272060.

References (44)

  • G. Peters et al., Dynamic rough clustering and its applications, Appl. Soft Comput. (2012)
  • Y.Y. Yao, The superiority of three-way decisions in probabilistic rough set models, Inform. Sci. (2011)
  • C. Luo et al., Fast algorithms for computing rough approximations in set-valued decision system while updating criteria values, Inform. Sci. (2015)
  • C. Luo et al., Dynamic maintenance of approximations in set-valued ordered decision systems under the attribute generalization, Inform. Sci. (2014)
  • N. Azam et al., Analyzing uncertainties of probabilistic rough set regions with game-theoretic rough sets, Int. J. Approx. Reason. (2014)
  • D.C. Liang et al., Systematic studies on three-way decisions with interval-valued decision-theoretic rough sets, Inform. Sci. (2014)
  • H. Yu et al., An automatic method to determine the number of clusters using decision-theoretic rough set, Int. J. Approx. Reason. (2014)
  • M. Chen et al., Interval set clustering, Expert Syst. Appl. (2011)
  • A. Pérez-Suárez et al., A new overlapping clustering algorithm based on graph theory
  • H. Yu et al., An incremental clustering approach based on three-way decisions

  • M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, X.W. Xu, Incremental clustering for mining in a data warehousing...
  • N. Goyal, P. Goyal, K. Venkatramaiah, P.C. Deepak, P.S. SANNOP, An efficient density based incremental clustering...