
Information Sciences

Volume 485, June 2019, Pages 263-280

Memetic feature selection for multilabel text categorization using label frequency difference

https://doi.org/10.1016/j.ins.2019.02.021

Abstract

Multilabel text categorization is an important task in modern text mining applications. Text datasets comprise an excessive number of terms, which can degrade categorization accuracy. Therefore, conventional studies applied a feature selection method before text categorization. Recently, memetic feature selection methods that hybridize an evolutionary feature wrapper and a filter have gained popularity and shown promising results. However, conventional memetic text feature selection methods suffer from limited performance because the feature filter they employ requires problem transformation, which degrades the search capability and results in unrefined feature subsets with poor accuracy. In this study, we propose an effective memetic feature selection method based on a novel feature filter that is highly specialized for multilabel text categorization. Our experiments demonstrate that the proposed method significantly outperforms several conventional methods.

Introduction

Text categorization involves identifying relevant categories of a given text [40]. It is a core technique for many applications such as sentiment analysis [7], [29], [48], spam detection [47], review recommendation [28], and clinical analysis [11]. A single text can be assigned to multiple categories simultaneously; therefore, text categorization needs to be performed using multilabel classification [17], [21], [22]. Let W ⊆ {0,1}^d denote a set of text patterns constructed from a set of term features F = {f1, …, fd}, where each feature represents a term by encoding its presence or absence [14]. Then, each pattern wi ∈ W, where 1 ≤ i ≤ |W|, is assigned to a label subset λi ⊆ L, where L = {l1, …, l|L|}. Among thousands of features, there can be a subset of unnecessary features that encode meaningless terms such as stop words [8]. In this situation, multilabel feature selection can be a useful preprocessing step because it highlights the important features that contribute significantly to the pattern-label set association by discarding unnecessary features [23].
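To make the notation concrete, the following minimal sketch (in Python, with invented toy documents and labels) shows one way such binary term-presence patterns W and label subsets λi can be constructed; all names and data here are illustrative assumptions, not from the paper:

```python
# Toy construction of binary term-presence patterns W ⊆ {0,1}^d and label
# subsets λ_i ⊆ L. The documents, labels, and vocabulary are all invented.
docs = [
    "cheap pills buy now",            # spam-like text
    "meeting agenda for the project"  # work-related text
]
label_sets = [{"spam", "ads"}, {"work"}]  # λ_i for each pattern

# F = {f1, ..., fd}: one feature per distinct term
vocabulary = sorted({term for doc in docs for term in doc.split()})

# Each pattern w_i encodes presence (1) or absence (0) of every term.
W = [[1 if term in doc.split() else 0 for term in vocabulary] for doc in docs]
```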

Multilabel feature selection involves identifying a subset S ⊂ F comprising n ≪ |F| features that is significantly relevant to the label set L. An effective search strategy must therefore be devised to locate the best feature subset, because there are $\sum_{i=1}^{n} \binom{d}{i}$ possible feature subsets, where d = |F|. To achieve this, conventional studies considered feature wrappers and filters. Specifically, the filter approach evaluates the importance of features based on the intrinsic characteristics of the data using a score function such as the χ2 statistic or information gain [39]. By contrast, the wrapper approach evaluates feature subsets using a specific learning algorithm, that is, a classifier. Thus, the two approaches are distinguished by whether a classifier is used to evaluate the superiority of candidate feature subsets; feature wrappers often outperform feature filters in terms of learning accuracy because of their interaction with the learning algorithm [20].
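To illustrate the distinction, the sketch below scores features with a χ2 filter and, separately, evaluates candidate subsets with a classifier in wrapper fashion. The random toy data, the Bernoulli naive Bayes classifier, and the exhaustive size-2 subset search are assumptions made for this example only:

```python
# Contrast of filter-style and wrapper-style evaluation on toy binary data.
import numpy as np
from itertools import combinations
from sklearn.feature_selection import chi2
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 6))  # binary term-presence patterns
y = rng.integers(0, 2, size=100)       # a single (transformed) label

# Filter: rank features from data statistics alone; no classifier involved.
filter_scores, _ = chi2(X, y)

# Wrapper: evaluate each candidate subset with a classifier (size-2 subsets).
def subset_fitness(subset):
    return cross_val_score(BernoulliNB(), X[:, list(subset)], y, cv=3).mean()

best_subset = max(combinations(range(X.shape[1]), 2), key=subset_fitness)
```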

Population-based evolutionary algorithms such as genetic algorithms have frequently been used as search methods for feature wrappers owing to their stochastic global search capability [42]. However, conventional evolutionary search methods waste computational resources on fine-tuning solutions because new offspring are created by randomly recombining their ancestors. A promising alternative is a well-designed hybridization of an evolutionary feature wrapper and a filter: the wrapper can reach the optimal solution, that is, a feature subset, without becoming trapped in local optima, while the unrefined feature subsets created randomly by the wrapper can be fine-tuned using the score function of the filter. Consequently, recent studies have focused on hybridizing evolutionary feature wrappers and filters for multilabel text categorization, that is, memetic feature selection, because of its excellent search capability [7], [10], [18], [27].

Few memetic feature selection methods have been considered in conventional text categorization studies [27], [47]. Those that have been studied showed promising results that demonstrate the advantages of hybridization [10]. However, their performance is limited because the score functions of the feature filters used or considered for hybridization [1], [6], [10], [34], [37] commonly require the transformation of multiple labels into a single label to obtain the feature importance scores for the fine-tuning process. This drawback leads to an unrefined final feature subset owing to the degradation of the fine-tuning capability of the feature filter, which in turn is caused by inaccurate score evaluation due to the information loss from the transformation [19]. In addition, these score functions are designed without considering cooperation with the evolutionary feature wrapper, which results in a complicated hybridization process that requires additional parameters.

In this study, we propose an effective score function called “label frequency difference” (LFD) that is highly specialized for memetic multilabel text feature selection. This score function calculates the conditional frequencies of labels directly and determines the discriminating power of each feature based on the difference in label frequencies according to the presence or absence of the corresponding term. Thus, the scoring process circumvents the degradation of fine-tuning capability because it does not involve any problem transformation.
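The full LFD formulation is given later in the paper; the following is only a minimal sketch of one plausible reading of the idea as described above, scoring a term by how much the conditional label frequencies differ between patterns that contain the term and patterns that do not:

```python
# Hedged sketch of a label-frequency-difference style score (not necessarily
# the paper's exact formulation).
import numpy as np

def lfd_score(X, Y):
    """X: (n_docs, n_terms) binary term presence; Y: (n_docs, n_labels) binary labels."""
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        present = X[:, j] == 1
        absent = ~present
        if present.sum() == 0 or absent.sum() == 0:
            continue  # term never/always occurs: no discriminating power
        freq_present = Y[present].mean(axis=0)  # label frequencies given term presence
        freq_absent = Y[absent].mean(axis=0)    # label frequencies given term absence
        scores[j] = np.abs(freq_present - freq_absent).sum()
    return scores
```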


Related work

In text categorization studies, dimensionality reduction techniques such as feature selection and extraction are considered promising data preprocessing steps for solving complicated problems involving multilabel, multitask, or multiview learning [3], [24], [25], [26] because they can improve the learning accuracy by making the algorithm focus on important features. These feature selection methods can be roughly divided into feature wrapper and filter approaches. Although hybridization offers excellent search capability, it has rarely been applied to multilabel text categorization.

Population-based incremental learning

Based on the evolutionary feature wrapper and filter, a memetic search can be conducted as follows: (1) an evolutionary feature wrapper is used to locate a promising region in the search space, (2) feature subsets belonging to the promising region are created randomly, and (3) a feature filter is applied to refine the created feature subsets. To implement our memetic search, we choose the estimation of distribution algorithm (EDA) because it can effectively solve feature selection problems [42].
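A minimal sketch of this three-step loop, using a PBIL-style probability vector as the EDA; the top-k refinement rule and all parameter values are invented stand-ins for the paper's actual fine-tuning procedure:

```python
# PBIL-style memetic feature selection loop (illustrative only).
import numpy as np

def memetic_pbil(fitness, filter_scores, d, pop=20, gens=50, lr=0.1, keep=5):
    p = np.full(d, 0.5)                    # probability of selecting each feature
    rng = np.random.default_rng(0)
    for _ in range(gens):
        # (1)-(2): sample feature subsets around the current promising region.
        population = (rng.random((pop, d)) < p).astype(int)
        # (3): refine each subset with the filter; here, keep only its
        # `keep` top-scoring features (one simple refinement rule).
        for ind in population:
            on = np.flatnonzero(ind)
            if len(on) > keep:
                ind[on[np.argsort(filter_scores[on])[:-keep]]] = 0
        best = population[np.argmax([fitness(ind) for ind in population])]
        p = (1 - lr) * p + lr * best       # PBIL update toward the best subset
    return p > 0.5                         # final feature subset
```

Here, fitness would be a wrapper-style evaluation (e.g., cross-validated multilabel accuracy of a classifier on the selected features), and filter_scores could come from a score function such as LFD.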

Experimental settings

We performed experiments with 11 text datasets from the Yahoo! dataset collection [46]. These multilabel datasets correspond to top-level categories, namely Arts, Business, Computers, Education, Entertainment, Health, Recreation, Reference, Science, Social, and Society. As preprocessing, we applied unsupervised dimensionality reduction to the text datasets, retaining the 5% of features with the highest text frequency; previous text mining studies reported that this step does not significantly change classification performance [43].
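A hedged sketch of this unsupervised preprocessing step, interpreting the text frequency of a feature as the number of texts containing its term; the 5% ratio matches the setting above:

```python
# Retain the 5% of features whose terms occur in the most texts.
import numpy as np

def keep_top_frequency(X, ratio=0.05):
    """X: (n_docs, n_terms) binary term-presence matrix."""
    freq = X.sum(axis=0)                 # number of texts containing each term
    k = max(1, int(ratio * X.shape[1]))  # number of features to retain
    keep = np.argsort(freq)[-k:]         # indices of the k most frequent terms
    return X[:, keep], keep
```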

Conclusion

In this study, we proposed an effective memetic text feature selection method using our novel LFD feature filter. Specifically, to perform effective memetic searches, our feature filter is designed not to perform problem transformation when measuring the importance of text features. In addition, our feature filter is intentionally designed to cooperate with the evolutionary feature wrapper: the score function in the feature filter outputs score values that are compatible with the probability model of the evolutionary search.

Acknowledgment

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2016R1C1B1014774 and NRF-2017R1D1A1B03031957).

References (49)

  • A. Rehman et al.

    Feature selection based on a normalized difference measure for text classification

    Inf. Process. Manage.

    (2017)
  • A. Rehman et al.

    Relative discrimination criterion - a novel feature ranking method for text data

    Expert Syst. Appl.

    (2015)
  • N. Spolaôr et al.

    A comparison of multi-label feature selection methods using the problem transformation approach

    Electron. Notes Theor. Comput. Sci.

    (2013)
  • B. Tang et al.

    Toward optimal feature selection in naive Bayes for text categorization

    IEEE Trans. Knowl. Data Eng.

    (2016)
  • A.K. Uysal et al.

    A novel probabilistic feature selection method for text classification

    Knowl. Based Syst.

    (2012)
  • R. Wang et al.

    Ambiguity-based multiclass active learning

    IEEE Trans. Fuzzy Syst.

    (2016)
  • B. Xue et al.

    A survey on evolutionary computation approaches to feature selection

    IEEE Trans. Evol. Comput.

    (2016)
  • M.-L. Zhang et al.

    Feature selection for multi-label naive Bayes classification

    Inf. Sci.

    (2009)
  • M.-L. Zhang et al.

    A review on multi-label learning algorithms

    IEEE Trans. Knowl. Data Eng.

    (2014)
  • L. Zheng et al.

    Sentimental feature selection for sentiment analysis of Chinese online reviews

    Int. J. Mach. Learn. Cyber.

    (2018)
  • S. Baluja

    Population-based incremental learning: a method for integrating genetic search based function optimization and competitive learning

    Technical Report

    (1994)
  • Z. Cai et al.

    Multi-label feature selection via feature manifold learning and sparsity regularization

    Int. J. Mach. Learn. Cyber.

    (2018)
  • K. Dembczyński et al.

    On label dependence and loss minimization in multi-label classification

    Mach. Learn.

    (2012)
  • J. Demšar

    Statistical comparisons of classifiers over multiple data sets

    J. Mach. Learn. Res.

    (2006)