Multi-label symbolic value partitioning through random walks
Introduction
Multi-label learning has attracted much attention in recent years. It is rooted in various fields, including image processing [1], [2], text classification [3], and bioinformatics [4], [5]. There are different multi-label learning problems [6], [7], [8], including classification [9], [10], [11], label distribution learning [12], [13], [14], and dimensionality reduction [15], [16]. Consequently, many multi-label learning algorithms have been proposed, including multi-label k-nearest neighbor (ML-kNN) [17], weighted linear loss multiple birth support vector machine based on information granulation [18], support vector machine for multi-label classification based on rank [19], support vector machine for multi-label classification with a zero label [20], multi-label learning with global and local correlation [21], multi-label dimensionality reduction via dependence maximization [22], and multi-label feature selection based on max-dependency and min-redundancy (MDMR) [23].
As with single-label learning, multi-label learning encounters the curse of dimensionality [24], [25]. Multi-label data, such as text [26], images [27], [28], and gene sequences [29], are represented by high-dimensional feature vectors. High dimensionality makes the sample distribution sparse, which increases computational complexity and degrades the performance of the classification model. Hence, a number of dimensionality reduction techniques [30], [31] have been developed, including feature selection [32], [33] and feature extraction [34], [35]. The former selects an optimal subset of the original features according to specific criteria, whereas the latter maps the original features to a low-dimensional space, generating new features through specific transformations.
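The distinction between the two families can be sketched in a few lines; the data, the chosen column indices, and the projection matrix below are all hypothetical stand-ins, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))      # toy data: 6 samples, 4 features

# Feature selection: keep a subset of the original columns.
selected = [0, 2]                # indices chosen by some importance criterion
X_sel = X[:, selected]           # still the original, interpretable features

# Feature extraction: project onto a low-dimensional space.
# A random 4x2 projection stands in for a learned mapping such as PCA.
W = rng.normal(size=(4, 2))
X_ext = X @ W                    # new, transformed features

print(X_sel.shape, X_ext.shape)  # (6, 2) (6, 2)
```

Both reduce the dimension from 4 to 2, but only selection preserves the original feature semantics, which is why it is often preferred when interpretability matters.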
Multi-label feature selection [36], [37] can be broadly divided into two categories [38]: problem transformation (PT) and algorithm adaptation (AA). PT methods transform the multi-label learning problem into a series of single-label learning problems, so they can directly reuse existing single-label feature selection algorithms. Researchers designed two transformation approaches, namely binary relevance (BR) [39], [40] and label power-set (LP) [41], and two importance indicators, namely ReliefF (RF) and information gain (IG), yielding four PT-based feature selection algorithms (RF-BR, RF-LP, IG-BR, and IG-LP). AA methods modify existing single-label feature selection algorithms so that they can be applied directly to multi-label data. MLNB [42] adapts traditional naïve Bayes classifiers to multi-label feature selection. NRPS [43], inspired by similarity preservation, uses a neighborhood relationship preserving score for multi-label feature selection.
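The two transformation approaches are simple to state concretely. A minimal sketch with toy data (the feature vectors and label sets below are invented for illustration):

```python
# Toy multi-label data: each instance has a feature vector and a set of labels.
X = [[1, 0], [0, 1], [1, 1]]
Y = [{"a"}, {"a", "b"}, {"b"}]
labels = ["a", "b"]

# Binary relevance (BR): one independent binary problem per label,
# asking "does this instance carry label l?"
br_problems = {
    l: [(x, int(l in y)) for x, y in zip(X, Y)] for l in labels
}

# Label power-set (LP): one multi-class problem whose classes are
# entire label sets, so label correlations are preserved.
lp_problem = [(x, frozenset(y)) for x, y in zip(X, Y)]

print(br_problems["a"])  # [([1, 0], 1), ([0, 1], 1), ([1, 1], 0)]
print(lp_problem[1][1])  # frozenset({'a', 'b'})
```

After either transformation, any single-label importance measure (e.g. ReliefF or information gain) can score features on the resulting problems, which is exactly how the four PT-based selectors are obtained.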
Symbolic value partitioning is another knowledge reduction technique. Like discretization [44], it decreases the size of the attribute domains; the difference is that it applies to symbolic rather than numeric data. In fact, symbolic value partitioning is more general than both discretization and feature selection. Several symbolic value partitioning algorithms have been proposed for single-label data. Nguyen et al. [45] proposed a rough set approach that converts the symbolic value partitioning problem into a graph coloring problem. Wen and Min [46] proposed a granular computing framework with adaptive granule construction and selection. However, to the best of our knowledge, the corresponding multi-label problem has not yet been studied.
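To make the notion concrete: a partition groups the values of a symbolic attribute into blocks, and the reduced attribute takes one value per block. The attribute, its values, and the chosen partition below are invented for illustration only:

```python
# One symbolic attribute with domain {"red", "pink", "blue", "navy"},
# and a partition merging similar values into blocks (toy example).
column = ["red", "pink", "blue", "navy", "pink"]
partition = [{"red", "pink"}, {"blue", "navy"}]

# Map each value to the index of its block; the domain shrinks from 4 to 2.
block_of = {v: i for i, block in enumerate(partition) for v in block}
reduced = [block_of[v] for v in column]

print(reduced)  # [0, 0, 1, 1, 0]
```

Note how this subsumes discretization (blocks of adjacent numeric intervals) and feature selection (partitioning an attribute's whole domain into one block effectively removes it), which is why symbolic value partitioning is the more general reduction.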
In this paper, we propose the multi-label symbolic value partitioning through random walks (MSPR) algorithm. Fig. 1 illustrates the framework of the new algorithm, which consists of two stages: graph construction and clustering. In the graph construction stage, an undirected weighted graph is built for each attribute, where the weight of each edge represents the similarity between two attribute values. This similarity is computed from the local information provided by the attribute and all labels. In the clustering stage, a random walk algorithm clusters the attribute values on each weighted graph. A key parameter is the separating threshold, which determines whether an edge crosses a cluster boundary and should therefore be cut; it is obtained by iterating the separation process on each graph. Moreover, a neighborhood similarity method is used for edge separation.
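A highly simplified sketch of the second stage may help fix ideas. This is not the authors' algorithm: the similarities are given directly rather than computed from labels, and the random walk with iterated threshold estimation is replaced by a plain threshold cut followed by connected components. All values and weights are hypothetical:

```python
from collections import defaultdict

# Toy weighted graph over the values of one attribute; in MSPR the edge
# weight would be a similarity derived from the attribute and all labels.
weights = {
    ("red", "pink"): 0.9,
    ("blue", "navy"): 0.8,
    ("pink", "blue"): 0.2,
}
threshold = 0.5  # separating threshold: edges below it are cut

# Cut weak edges, then treat the connected components of the remaining
# graph as the clusters (i.e., the blocks of the value partition).
adj = defaultdict(set)
for (u, v), w in weights.items():
    if w >= threshold:
        adj[u].add(v)
        adj[v].add(u)

nodes = {n for edge in weights for n in edge}
seen, clusters = set(), []
for n in sorted(nodes):
    if n in seen:
        continue
    stack, comp = [n], set()
    while stack:
        u = stack.pop()
        if u in comp:
            continue
        comp.add(u)
        stack.extend(adj[u] - comp)
    seen |= comp
    clusters.append(comp)

print(clusters)  # two clusters: {red, pink} and {blue, navy}
```

The weak "pink"–"blue" edge is cut, so the attribute values split into two blocks, exactly the partition an attribute-value clustering should produce.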
The main contributions of this paper are threefold. First, we introduce the multi-label symbolic value partition problem, which is important in several applications. For instance, in medicine, when multiple labels represent multiple diagnoses (so-called comorbidity), simplifying the attribute values can provide a better understanding of the problem and faster, more effective decisions for new patients [47], [48]. Second, we convert the new problem into a clustering problem over attribute values. To the best of our knowledge, this is the first time clustering techniques have been applied to attribute value partitioning. Third, we apply a graph-clustering random walk algorithm to the new problem. The constructed graph indicates the degree of association between attribute values.
Experiments were performed on 13 benchmark multi-label datasets to quantify the performance of the MSPR algorithm. The datasets were selected from different application areas: bioinformatics, video, images, semantic scene analysis, and text categorization. The number of instances ranged from 194 to 48,536, the number of attributes ranged from 19 to 500, and the number of labels ranged from 6 to 174. We compared the MSPR algorithm with four feature selection methods [38] using ReliefF and information gain as feature importance measures: ReliefF-Binary Relevance (RF-BR), ReliefF-Label Powerset (RF-LP), Information Gain-Binary Relevance (IG-BR), and Information Gain-Label Powerset (IG-LP). We also compared the MSPR algorithm with two feature selection methods using information-theoretic approaches as feature importance measures: MDMR [23] and fast information-theoretic multi-label feature ranking (FIMF) [49]. Additionally, we compared the MSPR algorithm with an embedded multi-label feature selection method with manifold regularization called manifold regularized discriminative feature selection for multi-label learning (MDFS) [50]. The results demonstrated that the MSPR algorithm outperformed the other algorithms on most datasets.
The remainder of this paper is organized as follows: In Section 2, we describe some basic concepts that are used throughout the paper. In Section 3, we present and analyze the MSPR algorithm. In Section 4, we present experimental results with analysis. Finally, in Section 5, we draw some conclusions and discuss future work.
Preliminaries
In this section, we review the main concepts that will be used in the discussion, including multi-label decision system, partition, clustering, and graph. We also redefine certain concepts in terms of attribute value partitioning. Table 1 lists notation used throughout the paper.
Algorithm
In this section, we first define the MSPR problem. Then we discuss the general framework of our approach using two subroutines. Finally, we discuss the heuristic function of weighted graph construction and the clustering method based on a random walk algorithm.
Experiments
We conducted experiments on 13 multi-label datasets to verify the performance of our proposed method and compared the results with those of seven other feature selection methods.
Conclusions
In this paper, we proposed a solution to the multi-label symbolic value partition problem. An efficient MSPR algorithm that consists of two stages was proposed to solve this issue. The goal of the proposed algorithm is to enhance the generalization ability and, simultaneously, help the classifier to obtain good classification performance. We compared MSPR with seven popular feature selection algorithms on 13 datasets. The experimental results demonstrated that the MSPR algorithm achieved better classification performance than the compared algorithms on most datasets.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China [grant numbers 61573321, 41631179, 41604114, 61976194]; the Key Laboratory of Oceanographic Big Data Mining and Application of Zhejiang Province [grant number OBDMA201601]; and the Scientific Research Starting Project of SWPU [grant number 2018QHR007].
Liu-Ying Wen received her M.S. degree from the School of Computer, Central China Normal University, Wuhan, China, in 2009, and her Ph.D. degree from the School of Petroleum Engineering and Technology, Southwest Petroleum University, Chengdu, China, in 2017. She is currently a lecturer at Southwest Petroleum University, Chengdu, China. Her current research interests include dimensionality reduction, granular computing, and data mining.
References (53)
- et al., Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification, Inf. Sci. (2019)
- et al., A comparison study of similarity measures for covering-based neighborhood classifiers, Inf. Sci. (2018)
- et al., Weighted linear loss multiple birth support vector machine based on information granulation for multi-class classification, Pattern Recognit. (2017)
- An efficient multi-label support vector machine with a zero label, Expert Syst. Appl. (2012)
- et al., Multi-label feature selection based on max-dependency and min-redundancy, Neurocomputing (2015)
- et al., Multi-label learning with label-specific feature reduction, Knowl. Based Syst. (2016)
- et al., Neighborhood rough sets based multi-label classification for automatic image annotation, Int. J. Approximate Reasoning (2013)
- et al., Hierarchical attribute reduction algorithms for big data using MapReduce, Knowl. Based Syst. (2015)
- et al., Cost-sensitive feature selection based on adaptive neighborhood granularity with multi-level confidence, Inf. Sci. (2016)
- et al., A multiway p-spectral clustering algorithm, Knowl. Based Syst. (2019)
- A multi-label feature extraction algorithm via maximizing feature variance and feature-label dependence simultaneously, Knowl. Based Syst.
- A comparison of multi-label feature selection methods using the problem transformation approach, Electron. Notes Theor. Comput. Sci.
- Label ranking by learning pairwise preferences, Artif. Intell.
- Ensemble methods for multi-label classification, Expert Syst. Appl.
- Feature selection for multi-label naive Bayes classification, Inf. Sci.
- A flexible data-driven comorbidity feature extraction framework, Comput. Biol. Med.
- Fast multi-label feature selection based on information-theoretic feature ranking, Pattern Recognit.
- Manifold regularized discriminative feature selection for multi-label learning, Pattern Recognit.
- Rough sets approach to symbolic value partition, Int. J. Approximate Reasoning
- Feature selection with test cost constraint, Int. J. Approximate Reasoning
- CNN-RNN: a unified framework for multi-label image classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Automatic image annotation via local multi-label classification, Proceedings of the International Conference on Content-Based Image and Video Retrieval
- Document transformation for multi-label feature selection in text categorization, Proceedings of the Seventh IEEE International Conference on Data Mining
- A review of feature selection techniques in bioinformatics, Bioinformatics
- Mining multi-label data, Data Mining and Knowledge Discovery Handbook
- Large-scale multi-label learning with missing labels, Proceedings of the International Conference on Machine Learning
Cited by (4)
- An incremental random walk algorithm for sampling continuous fitness landscapes, Neurocomputing (2023)
- KGA: integrating KPCA and GAN for microbial data augmentation, International Journal of Machine Learning and Cybernetics (2023)
- Hierarchical multilabel classification by exploiting label correlations, International Journal of Machine Learning and Cybernetics (2022)
- Decision-Theoretic Rough Set: A Fusion Strategy, IEEE Access (2020)
Chao-Guang Luo is a graduate student at the School of Computer Science, Southwest Petroleum University. His current research interests include multi-label learning and feature selection.
Wei-Zhi Wu received the B.Sc. degree in mathematics from Zhejiang Normal University, Jinhua, China, in 1986, the M.Sc. degree in mathematics from East China Normal University, Shanghai, China, in 1992, and the Ph.D. degree in applied mathematics from Xi'an Jiaotong University, Xi'an, China, in 2002. He is currently a Professor with the School of Mathematics, Physics, and Information Science, Zhejiang Ocean University, Zhejiang, China. His current research interests include approximate reasoning, rough sets, random sets, formal concept analysis, and granular computing.
Fan Min received the M.S. and Ph.D. degrees from the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China, in 2000 and 2003, respectively. He visited the University of Vermont, Burlington, Vermont, from 2008 to 2009. He is currently a professor with Southwest Petroleum University, Chengdu. He has published more than 100 refereed papers in various journals and conferences, including Information Sciences, International Journal of Approximate Reasoning, and Knowledge-Based Systems. His current research interests include data mining, recommender systems, active learning, and granular computing.