Elsevier

Applied Soft Computing

Volume 64, March 2018, Pages 59-74
A spectral clustering method with semantic interpretation based on axiomatic fuzzy set theory

https://doi.org/10.1016/j.asoc.2017.12.004

Abstract

Owing to its good performance in clustering non-convex datasets, spectral clustering has attracted much attention and become one of the most popular clustering algorithms in recent decades. However, existing spectral clustering methods are sensitive to parameter settings when building the affinity matrix, which seriously undermines the algorithm's robustness to noisy data. Moreover, in many application domains, including credit rating and medical diagnosis, it is very important that the learned model be understandable and interpretable. To make spectral clustering competitive in both classification rate and comprehensibility, we propose a spectral clustering method with semantic interpretation based on axiomatic fuzzy set (AFS) theory, which integrates the representation capability of AFS and the classification competence of spectral clustering (N-cut). The effectiveness of the proposed approach is demonstrated on real-world datasets, and the experimental results indicate that the performance of our method is comparable with that of classic spectral clustering algorithms (NJW, SM, Diffuzzy, AASC and SOM-SC) and other clustering methods, including K-means, fuzzy c-means, and MinMax K-means. Meanwhile, the proposed method can be used to explore the underlying clusters and give their characteristics in the form of fuzzy descriptions.

Introduction

As an unsupervised technique in the field of machine learning, clustering plays a significant role in detecting the inherent structure and latent knowledge of data. During the past several decades, many clustering methods have been developed, such as partitioning approaches (K-means), model-based approaches (GMM), density-based approaches (DBSCAN), hierarchical clustering approaches, fuzzy clustering approaches (fuzzy c-means) and dimensionality reduction approaches (spectral clustering) [1]. Owing to its performance in recognizing non-spherically distributed clusters and its simplicity of implementation, spectral clustering has been intensively investigated and applied in image segmentation [2], [3], [4], community detection [5], spatio-temporal data mining [6], parallel computing [7], etc.

The idea of spectral clustering is to utilize the eigenvalues of the affinity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The affinity matrix serves as an input and consists of a quantitative assessment of the relative similarity of each pair of points in the dataset. However, there are still some open problems in spectral clustering algorithms. The improvement that researchers care about most is the similarity measure. In conventional spectral clustering algorithms, an appropriate parameter must be set to control the scale of the neighborhood when building the affinity matrix with a Gaussian kernel function. It is difficult to determine proper parameters without a priori information. To address this issue, a family of improved spectral clustering algorithms has been presented. İnkaya [8] proposes a parameter-free similarity graph, namely the density adaptive neighborhood (DAN), which combines distance, density, and connectivity information, and reflects the local characteristics. Aimed at approximate spectral clustering (ASC), geodesic-based hybrid similarity criteria that enable the use of different types of information for accurate similarity representation were developed in [9]. Beauchemin [10] provides a method of building an affinity matrix for spectral clustering from a density estimator relying on K-means with a sub-bagging procedure. In [11], the neighbor relation propagation principle is introduced as the similarity measure. Moreover, another research tendency is to integrate spectral clustering and a fuzzy system to improve its performance. Zhao et al. [12] utilize the prototypes and partition matrix obtained by the fuzzy c-means clustering algorithm to evaluate the similarity between samples. Liu et al. [2] introduce a novel factor incorporating both the spatial relationship and the neighborhood configuration relationship of samples to overcome the sensitivity of traditional spectral clustering to noise data.
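
To illustrate the parameter sensitivity discussed above, the following is a minimal sketch in Python with NumPy of a Gaussian-kernel affinity matrix. It is not the authors' code: the function name `gaussian_affinity`, the scale parameter `sigma`, the zeroed diagonal, and the toy points are all assumptions made for illustration.

```python
import numpy as np


def gaussian_affinity(X, sigma):
    """Dense affinity W[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    `sigma` is the hypothetical scale parameter that has to be chosen by
    hand in the conventional setting.
    """
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops (a common convention, assumed here)
    return W


# Toy data: two nearby points and one distant point.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
W_small = gaussian_affinity(X, sigma=0.5)  # distant point is almost disconnected
W_large = gaussian_affinity(X, sigma=5.0)  # all points look fairly similar
```

With the small scale the distant point has an affinity that is effectively zero, while with the large scale it is of the same order as the affinity between the two neighbors. The same data therefore yields quite different graphs depending on `sigma`.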

Moreover, forming a Laplacian matrix, selecting eigenvectors, and estimating the number of clusters have continuously attracted the attention of researchers in the machine learning community, as reviewed in [13]. However, these improved methods are weak in terms of the semantic interpretability of clustering results. Recently, a spectral clustering algorithm based on intuitionistic fuzzy information was proposed in [14]. In that study, the authors extend the spectral clustering method to an intuitionistic fuzzy environment. Inspired by [14], we present a spectral clustering method with semantic interpretation based on axiomatic fuzzy set (AFS) theory. The AFS framework [15], [16], [17], [18], [19] is a powerful approach for semantic concept extraction and interpretation of fuzzy attributes. Its membership functions and logic operations are algorithmically determined according to the distribution of the original data and the semantics of the fuzzy sets. This differs from conventional fuzzy theory, in which membership functions are often given subjectively by personal intuition, and logic operations are equipped with a triangular norm (t-norm for short) that is chosen in advance and is independent of the distribution of the raw data.

In recent years, AFS theory has been extended to many issues of pattern recognition and machine learning, including fuzzy rough sets [20], [21], fuzzy decision trees [22], [23], fuzzy clustering [24], [25] and fuzzy classifiers [26]. Meanwhile, AFS has been applied to many practical applications, such as semantic facial descriptor extraction [27], prediction of readmissions in intensive care units [28], management strategic analysis [29], time series analysis [30] and handwritten numeral recognition [31]. As described in [32], the AFS-based clustering method consists of two clustering processes: computing the transitive closure of the similarity matrix, and finding an optimal threshold to implement a rational partition. The final result is achieved by the membership degree of the clusters’ descriptions. Even though the definition of the membership function in [32] has decreased the time complexity, the process of calculating the transitive closure of a fuzzy similarity matrix is still time-consuming, and the truncation threshold must be optimized. In this paper, we integrate spectral clustering and AFS theory in our method with the purpose of improving both classification rate and interpretability. The improvements include the following:

  • An unsupervised feature selection method is proposed based on compactness and separability, which are defined by the average distance from an instance to its nearest and farthest neighbors, respectively. Different from existing methods [33], [34], we consider the salience of fuzzy terms for every feature instead of relying on the features themselves. Moreover, we embed feature selection into the clustering approach under the framework of AFS theory rather than using it as a preprocessing step, which improves the performance of describing samples with fuzzy concepts.

  • A fuzzy similarity measurement is developed with AFS theory and applied to build an affinity matrix that serves as the input of the spectral clustering algorithm. Compared with the Gaussian kernel distance measure, our solution is robust to parameter settings, since AFS theory enhances its adaptivity to various datasets and reduces the undesirable influence of noise data.

  • The transitive closure calculation and threshold optimization used in conventional AFS-based clustering methods are replaced by a spectral clustering approach based on normalized cut when obtaining initial clustering results. This improvement reduces the complexity of the original method and avoids an iterative tuning procedure.

  • AFS theory is introduced to generate a semantic interpretation for every class. We then employ it to adjust the initial result obtained by spectral clustering and achieve the final result with a comprehensive description simultaneously. The semantic interpretation of every class is the underlying knowledge of a dataset that is described by fuzzy concepts and AFS logic.

In this paper, we realize spectral clustering capable of describing results semantically by associating it with AFS theory. In comparison with traditional AFS-based clustering algorithms, our method outperforms them in structural complexity and parameter optimization. Compared with classic spectral clustering approaches, our clustering result is interpretable for domain experts. In general, the improvements contribute to both methodologies in clustering tasks.

The rest of this paper is organized as follows. In Section 2, we provide a brief introduction to spectral clustering and AFS theory. In Section 3, a spectral clustering method with semantic interpretation based on axiomatic fuzzy set theory is proposed. A comparative analysis of our method and other clustering algorithms on selected UCI datasets is given in Section 4. Conclusions are presented in Section 5.

Spectral clustering

Spectral clustering is viewed as the optimization of a graph cut problem in spectral graph theory. Thus, we first review some basic concepts of graphs. It is well known that a graph consists of a set of vertices and a set of edges, denoted $G = (V, E)$. In spectral clustering, let $x_1, x_2, \ldots, x_n \in \mathbb{R}^d$ be $n$ samples with $d$ features, and the dataset can be described by a weighted undirected graph that is made up of $|V|$ nodes and $|E|$ edges. For $v_i \in V$, $v_i$ represents a sample $x_i$, and $e_{ij}$ denotes the weight
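
As a rough illustration of the graph-cut view, the sketch below splits a small weighted undirected graph into two groups using the eigenvector of the second-smallest eigenvalue of the symmetric normalized Laplacian, in the spirit of the normalized cut. This is not the method proposed in the paper: the function name `ncut_bipartition`, the zero threshold on the eigenvector, and the toy weight matrix are assumptions chosen for illustration.

```python
import numpy as np


def ncut_bipartition(W):
    """Two-way split of a weighted undirected graph (normalized-cut style).

    W is a symmetric, non-negative weight matrix with W[i, j] the weight of
    the edge between nodes i and j. Assumes every node has positive degree.
    """
    W = np.asarray(W, dtype=float)
    degrees = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(L_sym)
    # Map the second eigenvector back to the generalized problem (D - W) y = lambda D y.
    y = d_inv_sqrt * eigvecs[:, 1]
    # Thresholding at zero is one simple splitting rule (an assumption here).
    return (y > 0).astype(int)


# Toy graph: two triangles joined by a single weak edge between nodes 0 and 3.
W = np.array([
    [0.0, 1.0, 1.0, 0.01, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.01, 0.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
])
labels = ncut_bipartition(W)
```

Which group receives label 0 or 1 depends on the sign of the eigenvector returned by the solver, so only the grouping itself is meaningful. For more than two clusters, a common choice is to run K-means on several leading eigenvectors instead of thresholding one.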

Methodology

As an unsupervised learning technique, clustering is widely applied in data analysis and processing, especially in knowledge discovery. The clustering result reflects the potential structure and relations implied in the data. Compared with traditional clustering methods, spectral clustering is characterized by its suitability for non-convex sample spaces and its convergence to a globally optimal solution. In this paper, due to the advantage in semantic extraction with consistency support of

Experiment

The classification rate is widely applied to evaluate the performance of clustering algorithms. It can be determined by building a contingency matrix. In order to calculate the classification rate, a permutation mapping function is designed for building the correspondence between the cluster labels and the true labels. Let $L$ be the class label set; the classification rate is then defined as

$$CR = \max_{\tau \in \Phi_L} \frac{1}{N} \sum_{i=1}^{N} \Omega(t_i, \tau(r_i))$$

where $N$ is the number of samples, $t_i$ is the true label, $r_i$ is the obtained
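
The classification rate above can be sketched in a few lines of Python. The excerpt does not show the definition of Ω, so the sketch assumes it is an indicator that equals 1 when its two arguments match and 0 otherwise. The function name `classification_rate` and the brute-force search over mappings are also assumptions, and the search is only practical for a small number of classes.

```python
from itertools import permutations


def classification_rate(true_labels, cluster_labels):
    """Best fraction of matching labels over all one-to-one mappings tau
    from cluster labels to class labels.

    Assumes Omega(t, s) = 1 if t == s else 0, and that there are no more
    clusters than classes.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes: no one-to-one mapping")
    n_samples = len(true_labels)
    best_hits = 0
    # Try every injective assignment of a class label to each cluster label.
    for assigned in permutations(classes, len(clusters)):
        tau = dict(zip(clusters, assigned))
        hits = sum(1 for t, r in zip(true_labels, cluster_labels) if tau[r] == t)
        best_hits = max(best_hits, hits)
    return best_hits / n_samples


# Cluster ids are arbitrary; only the grouping matters.
true_labels = [0, 0, 1, 1, 2, 2]
cluster_labels = [2, 2, 0, 0, 1, 0]
cr = classification_rate(true_labels, cluster_labels)  # 5 of 6 samples match
```

For larger label sets, the same maximum can be found without enumeration by solving an assignment problem on the contingency matrix, for example with the Hungarian algorithm.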

Conclusion

In this paper, we present a spectral clustering method with semantic interpretation based on the framework of axiomatic fuzzy set theory, which reduces the conventional spectral clustering algorithm's sensitivity to parameters when building the affinity matrix and makes the result of spectral clustering interpretable and comprehensible. The proposed method integrates spectral clustering's superiority in classifying nonspherical datasets with the linguistic representation capability of AFS theory. Thus,

Acknowledgements

This work is supported by the Natural Science Foundation of China (Nos. 61370146, 61672132, 61673082) and the Liaoning Science & Technology Project (No. 2013405003).
