A tree-based incremental overlapping clustering method using the three-way decision theory
Introduction
Most existing clustering algorithms analyze static datasets, in which objects remain unchanged after being processed [1], [2]. However, in many practical applications the datasets are modified dynamically, which means that some previously learned patterns have to be updated accordingly [3], [4]. Although these approaches have been applied successfully, some situations call for a richer model for representing a cluster [5], [6]. For example, a researcher may collaborate with researchers in different fields; therefore, if we cluster researchers according to their areas of interest, we can expect some researchers to belong to more than one cluster. In such settings, overlapping clustering is as useful and important as incremental clustering.
For this reason, this paper addresses the problem of incremental overlapping clustering. The main contribution of this work is an incremental overlapping clustering method called TIOC-TWD (Tree-based Incremental Overlapping Clustering method using the Three-Way Decision theory). The proposed method introduces a new incremental clustering framework with three-way decisions using interval sets and a new search tree based on representative points, which together make it possible to obtain overlapping clusters as the data grows. In addition, TIOC-TWD introduces new three-way strategies to update the clustering efficiently after multiple objects are added. Furthermore, the proposed method determines the number of clusters dynamically, so the number of clusters need not be specified in advance. These characteristics make TIOC-TWD appropriate for overlapping clustering in applications where the data keeps growing.
The experimental results show that the proposed method can not only identify clusters of arbitrary shapes but also merge small clusters into a larger one when the data changes; it can also detect new clusters, which may result from splitting or from new patterns. Further experimental results show that the proposed method outperforms the compared algorithms in most cases. We note that a short version of this work appeared in the RSCTC-2014 Workshop on the Three-Way Decisions and Probabilistic Rough Sets [7].
Related work
Incremental clustering approaches have seen a number of advances. Ester et al. [8] put forward the IncDBSCAN clustering algorithm, based on DBSCAN. Goyal et al. [9] later proposed a derived method that is more efficient than IncDBSCAN because it can add points in bulk to an existing set of clusters. Patra et al. [10] proposed an incremental clustering algorithm based on distance and leaders, but the algorithm needs to search the whole data space to find
Three-way decision clustering
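To define our framework, let a universe be $U = \{x_1, \ldots, x_n, \ldots, x_N\}$, and let the resulting clustering scheme $C = \{C^1, \ldots, C^k, \ldots, C^K\}$ be a family of clusters of the universe. Each $x_n$ is an object with $D$ attributes, namely $x_n = (x_n^1, \ldots, x_n^d, \ldots, x_n^D)$. Here $x_n^d$ denotes the value of the $d$-th attribute of the object $x_n$, where $n \in \{1, \ldots, N\}$ and $d \in \{1, \ldots, D\}$.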
We can look at the cluster analysis problem from a decision making perspective. For crisp clustering, it is a typical two-way decision; meanwhile for overlapping clustering
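The decision-making view of clustering can be illustrated with a small sketch. It assumes the usual three-way reading in which each cluster is described by a positive region (objects that certainly belong) and a boundary region (objects that possibly belong), with every other object falling in the negative region. The class name `ThreeWayCluster`, the function `memberships`, and the researcher data are hypothetical and are not taken from the paper; this is not the TIOC-TWD algorithm itself.

```python
from dataclasses import dataclass, field


@dataclass
class ThreeWayCluster:
    """Hypothetical container: a cluster as a (positive, boundary) pair of sets."""
    positive: set = field(default_factory=set)
    boundary: set = field(default_factory=set)

    def __post_init__(self):
        # The two regions are assumed to be disjoint.
        if self.positive & self.boundary:
            raise ValueError("positive and boundary regions must not overlap")

    def decide(self, obj):
        """Three-way decision for one object with respect to this cluster."""
        if obj in self.positive:
            return "positive"
        if obj in self.boundary:
            return "boundary"
        return "negative"

    def is_crisp(self):
        """With an empty boundary, the decision reduces to two-way (in or out)."""
        return not self.boundary


def memberships(obj, clusters):
    """Names of all clusters whose positive or boundary region contains obj."""
    return [name for name, c in clusters.items() if c.decide(obj) != "negative"]


# Illustrative data: researcher "cai" works in two fields.
clusters = {
    "databases": ThreeWayCluster(positive={"ann", "bob"}, boundary={"cai"}),
    "machine_learning": ThreeWayCluster(positive={"dan"}, boundary={"cai"}),
}

print(memberships("cai", clusters))
print(clusters["databases"].decide("dan"))
```

In this sketch an overlap shows up as an object that appears in the regions of more than one cluster, while a crisp clustering is the special case in which every boundary region is empty.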
The TIOC-TWD clustering method
The processing of the proposed TIOC-TWD method is illustrated in Fig. 1. We also devise an overlapping clustering algorithm for the initial static data that uses a three-way decision strategy; it is based on a graph of representative points, constructed by calculating the similarity between representative regions. This is Algorithm 1, described in Section 4.2.
Evaluation indices and datasets
We evaluate the proposed TIOC-TWD clustering approach through the following experiments. All the experiments are performed on a 2.67 GHz computer with 4 GB of memory, and all algorithms are implemented in C++. The quality of the final clustering is evaluated with traditional indices such as accuracy, F-measure [42] and NMI [43], where objects in boundary regions are treated as belonging to positive regions so that these common formulas apply.
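The treatment of boundary objects during evaluation can be sketched as follows. The example merges a cluster's boundary region into its positive region and then scores the flattened cluster against a ground-truth class with a plain set-based F1. The helper names `flatten` and `f1` and the toy data are assumptions made for illustration; the exact F-measure and NMI formulas of [42] and [43] are not reproduced here.

```python
def flatten(positive, boundary):
    """Treat boundary objects as positive so that set-based indices apply."""
    return set(positive) | set(boundary)


def f1(predicted, truth):
    """Harmonic mean of precision and recall for one cluster against one class."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Illustrative data: object 4 sits in the boundary region of the cluster.
predicted = flatten(positive={1, 2, 3}, boundary={4})
truth = {1, 2, 3, 4, 5}
score = f1(predicted, truth)
print(round(score, 4))
```

One consequence of this convention is that an object lying in the boundary of several clusters counts as a member of each of them when the indices are computed.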
Table 5 gives the summary information about the datasets and the
Conclusions
Existing clustering approaches are restricted either to crisp clustering or to static datasets. To handle overlapping clustering and incremental clustering together, this paper proposed a new tree-based incremental overlapping clustering method using the three-way decision theory, called TIOC-TWD.
This paper first introduced three-way decision clustering to represent the overlapping clustering as well as crisp clustering, and described the problem of incremental
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61379114 & 61272060.
References (44)
Data clustering: 50 years beyond k-means. Pattern Recogn. Lett. (2010)
Soft clustering-fuzzy and rough approaches and their extensions and derivatives. Int. J. Approx. Reason. (2013)
Incorder: incremental density-based community detection in dynamic networks. Knowl.-Based Syst. (2014)
Self-adaptive and dynamic clustering for online anomaly detection. Expert Syst. Appl. (2011)
A survey on clustering algorithms for wireless sensor networks. Comput. Commun. (2007)
Incremental spectral clustering by efficiently updating the eigen-system. Pattern Recogn. (2010)
Dynamic hierarchical algorithms for document clustering. Pattern Recogn. Lett. (2010)
An algorithm based on density and compactness for dynamic overlapping clustering. Pattern Recogn. (2013)
Online fuzzy medoid based clustering algorithms. Neurocomputing (2014)
Mmr: an algorithm for clustering categorical data using rough set theory. Data Knowl. Eng. (2007)
Dynamic rough clustering and its applications. Appl. Soft Comput.
The superiority of three-way decisions in probabilistic rough set models. Inform. Sci.
Fast algorithms for computing rough approximations in set-valued decision system while updating criteria values. Inform. Sci.
Dynamic maintenance of approximations in set-valued ordered decision systems under the attribute generalization. Inform. Sci.
Analyzing uncertainties of probabilistic rough set regions with game-theoretic rough sets. Int. J. Approx. Reason.
Systematic studies on three-way decisions with interval-valued decision-theoretic rough sets. Inform. Sci.
An automatic method to determine the number of clusters using decision-theoretic rough set. Int. J. Approx. Reason.
Interval set clustering. Expert Syst. Appl.
A new overlapping clustering algorithm based on graph theory
An incremental clustering approach based on three-way decisions