
Information Sciences

Volume 485, June 2019, Pages 263-280

Memetic feature selection for multilabel text categorization using label frequency difference

https://doi.org/10.1016/j.ins.2019.02.021

Abstract

Multilabel text categorization is an important task in modern text mining applications. Text datasets comprise an excessive number of terms, which can degrade categorization accuracy. Therefore, conventional studies applied a feature selection method before text categorization. Recently, memetic feature selection methods that hybridize an evolutionary feature wrapper and a filter have gained popularity and shown promising results. However, conventional memetic text feature selection methods suffer from limited performance because the feature filter they employ requires problem transformation, which degrades the search capability and results in unrefined feature subsets with poor accuracy. In this study, we propose an effective memetic feature selection method based on a novel feature filter that is highly specialized for multilabel text categorization. Our experiments demonstrate that the proposed method significantly outperforms several conventional methods.

Introduction

Text categorization involves identifying relevant categories of a given text [40]. It is a core technique for many applications such as sentiment analysis [7], [29], [48], spam detection [47], review recommendation [28], and clinical analysis [11]. A single text can be assigned to multiple categories simultaneously; therefore, text categorization needs to be performed using multilabel classification [17], [21], [22]. Let W ⊆ {0,1}^d denote a set of text patterns constructed from a set of term features F = {f1, …, fd}, where each feature represents a term by encoding its presence or absence [14]. Then, each pattern wi ∈ W, where 1 ≤ i ≤ |W|, is assigned to a label subset λi ⊆ L, where L = {l1, …, l|L|}. Among thousands of features, there can be a subset of unnecessary features that encode meaningless terms such as stop words [8]. In this situation, multilabel feature selection can be a useful preprocessing step because it highlights the important features that contribute significantly to the pattern-label set association by discarding unnecessary features [23].
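To make the notation concrete, the following minimal sketch (in Python, with invented toy documents and labels) shows one way such binary term-presence patterns W and label subsets λi can be constructed; all names and data here are illustrative assumptions, not from the paper:

```python
# Toy construction of binary term-presence patterns W ⊆ {0,1}^d and label
# subsets λ_i ⊆ L. The documents, labels, and vocabulary are all invented.
docs = [
    "cheap pills buy now",            # spam-like text
    "meeting agenda for the project"  # work-related text
]
label_sets = [{"spam", "ads"}, {"work"}]  # λ_i for each pattern

# F = {f1, ..., fd}: one feature per distinct term
vocabulary = sorted({term for doc in docs for term in doc.split()})

# Each pattern w_i encodes presence (1) or absence (0) of every term.
W = [[1 if term in doc.split() else 0 for term in vocabulary] for doc in docs]
```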

Multilabel feature selection involves identifying a subset S ⊂ F comprising n ≪ |F| features that is significantly relevant to the label set L. An effective search strategy must therefore be devised to locate the best feature subset, because there are $\sum_{i=1}^{n} \binom{d}{i}$ possible feature subsets, where d = |F|. To achieve this, conventional studies considered feature wrappers and filters. Specifically, the filter approach evaluates the importance of features based on the intrinsic characteristics of the data using a score function such as the χ2 statistic or information gain [39]. By contrast, the wrapper approach evaluates feature subsets using a specific learning algorithm, that is, a classifier. Thus, the two approaches are distinguished by whether a classifier is used to evaluate the superiority of candidate feature subsets; feature wrappers often outperform feature filters in terms of learning accuracy because of their interaction with the learning algorithm [20].
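To illustrate the distinction, the sketch below scores features with a χ2 filter and, separately, evaluates candidate subsets with a classifier in wrapper fashion. The random toy data, the Bernoulli naive Bayes classifier, and the exhaustive size-2 subset search are assumptions made for this example only:

```python
# Contrast of filter-style and wrapper-style evaluation on toy binary data.
import numpy as np
from itertools import combinations
from sklearn.feature_selection import chi2
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 6))  # binary term-presence patterns
y = rng.integers(0, 2, size=100)       # a single (transformed) label

# Filter: rank features from data statistics alone; no classifier involved.
filter_scores, _ = chi2(X, y)

# Wrapper: evaluate each candidate subset with a classifier (size-2 subsets).
def subset_fitness(subset):
    return cross_val_score(BernoulliNB(), X[:, list(subset)], y, cv=3).mean()

best_subset = max(combinations(range(X.shape[1]), 2), key=subset_fitness)
```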

Population-based evolutionary algorithms such as genetic algorithms have frequently been used as search methods for feature wrappers owing to their stochastic global search capability [42]. However, conventional evolutionary search methods waste computational resources on fine-tuning solutions because new offspring are created by randomly recombining their ancestors. A promising alternative is a well-designed hybridization of an evolutionary feature wrapper and a filter: the wrapper can reach the optimal solution, that is, a feature subset, without becoming trapped in local optima, while the unrefined feature subsets created randomly by the wrapper can be fine-tuned using the score function of the filter. Consequently, recent studies have focused on hybridizing evolutionary feature wrappers and filters for multilabel text categorization, that is, memetic feature selection, because of its excellent search capability [7], [10], [18], [27].

Few memetic feature selection methods have been considered in conventional text categorization studies [27], [47]. Those that have been studied showed promising results that demonstrate the advantages of hybridization [10]. However, their performance is limited because the score functions of the feature filters used or considered for hybridization [1], [6], [10], [34], [37] commonly require the transformation of multiple labels into a single label to obtain the feature importance scores for the fine-tuning process. This drawback leads to an unrefined final feature subset owing to the degradation of the fine-tuning capability of the feature filter, which in turn is caused by inaccurate score evaluation due to the information loss from the transformation [19]. In addition, these score functions are designed without considering cooperation with the evolutionary feature wrapper, which results in a complicated hybridization process that requires additional parameters.

In this study, we propose an effective score function called “label frequency difference” (LFD) that is highly specialized for memetic multilabel text feature selection. This score function calculates the conditional frequencies of labels directly and determines the discriminating power of each feature based on the difference in label frequencies according to the presence or absence of the corresponding term. Thus, the scoring process circumvents the degradation of fine-tuning capability because it does not involve any problem transformation.
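The full LFD formulation is given later in the paper; the following is only a minimal sketch of one plausible reading of the idea as described above, scoring a term by how much the conditional label frequencies differ between patterns that contain the term and patterns that do not:

```python
# Hedged sketch of a label-frequency-difference style score (not necessarily
# the paper's exact formulation).
import numpy as np

def lfd_score(X, Y):
    """X: (n_docs, n_terms) binary term presence; Y: (n_docs, n_labels) binary labels."""
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        present = X[:, j] == 1
        absent = ~present
        if present.sum() == 0 or absent.sum() == 0:
            continue  # term never/always occurs: no discriminating power
        freq_present = Y[present].mean(axis=0)  # label frequencies given term presence
        freq_absent = Y[absent].mean(axis=0)    # label frequencies given term absence
        scores[j] = np.abs(freq_present - freq_absent).sum()
    return scores
```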


Related work

In text categorization studies, dimensionality reduction techniques such as feature selection and extraction are considered promising data preprocessing steps for solving complicated problems involving multilabel, multitask, or multiview learning [3], [24], [25], [26] because they can improve the learning accuracy by making the algorithm focus on important features. These feature selection methods can be roughly divided into feature wrapper and filter approaches. Although hybridization offers excellent search capability, it has rarely been applied to multilabel text categorization.

Population-based incremental learning

Based on the evolutionary feature wrapper and filter, a memetic search can be conducted as follows: (1) an evolutionary feature wrapper is used to locate a promising region in the search space, (2) feature subsets belonging to the promising region are created randomly, and (3) a feature filter is applied to refine the created feature subsets. To implement our memetic search, we choose the estimation of distribution algorithm (EDA) because it can effectively solve feature selection problems [42].
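A minimal sketch of this three-step loop, using a PBIL-style probability vector as the EDA; the top-k refinement rule and all parameter values are invented stand-ins for the paper's actual fine-tuning procedure:

```python
# PBIL-style memetic feature selection loop (illustrative only).
import numpy as np

def memetic_pbil(fitness, filter_scores, d, pop=20, gens=50, lr=0.1, keep=5):
    p = np.full(d, 0.5)                    # probability of selecting each feature
    rng = np.random.default_rng(0)
    for _ in range(gens):
        # (1)-(2): sample feature subsets around the current promising region.
        population = (rng.random((pop, d)) < p).astype(int)
        # (3): refine each subset with the filter; here, keep only its
        # `keep` top-scoring features (one simple refinement rule).
        for ind in population:
            on = np.flatnonzero(ind)
            if len(on) > keep:
                ind[on[np.argsort(filter_scores[on])[:-keep]]] = 0
        best = population[np.argmax([fitness(ind) for ind in population])]
        p = (1 - lr) * p + lr * best       # PBIL update toward the best subset
    return p > 0.5                         # final feature subset
```

Here, fitness would be a wrapper-style evaluation (e.g., cross-validated multilabel accuracy of a classifier on the selected features), and filter_scores could come from a score function such as LFD.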

Experimental settings

We performed experiments with 11 text datasets from the Yahoo! dataset collection [46]. These multilabel datasets correspond to top-level categories, namely Arts, Business, Computers, Education, Entertainment, Health, Recreation, Reference, Science, Social, and Society. As preprocessing, we applied unsupervised dimensionality reduction to the text datasets, retaining the 5% of features with the highest text frequency; previous text mining studies reported that this step does not significantly change classification performance [43].
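A hedged sketch of this unsupervised preprocessing step, interpreting the text frequency of a feature as the number of texts containing its term; the 5% ratio matches the setting above:

```python
# Retain the 5% of features whose terms occur in the most texts.
import numpy as np

def keep_top_frequency(X, ratio=0.05):
    """X: (n_docs, n_terms) binary term-presence matrix."""
    freq = X.sum(axis=0)                 # number of texts containing each term
    k = max(1, int(ratio * X.shape[1]))  # number of features to retain
    keep = np.argsort(freq)[-k:]         # indices of the k most frequent terms
    return X[:, keep], keep
```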

Conclusion

In this study, we proposed an effective memetic text feature selection method using our novel LFD feature filter. Specifically, to perform effective memetic searches, our feature filter is designed not to perform problem transformation when measuring the importance of text features. In addition, our feature filter is intentionally designed to cooperate with the evolutionary feature wrapper: the score function in the feature filter outputs score values that are compatible with the probability model of the evolutionary search.

Acknowledgment

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2016R1C1B1014774 and NRF-2017R1D1A1B03031957).

References (49)

  • A. Rehman et al.

    Feature selection based on a normalized difference measure for text classification

    Inf. Process. Manage.

    (2017)
  • A. Rehman et al.

    Relative discrimination criterion - a novel feature ranking method for text data

    Expert Syst. Appl.

    (2015)
  • N. Spolaôr et al.

    A comparison of multi-label feature selection methods using the problem transformation approach

    Electron. Notes Theor. Comput. Sci.

    (2013)
  • B. Tang et al.

    Toward optimal feature selection in naive Bayes for text categorization

    IEEE Trans. Knowl. Data Eng.

    (2016)
  • A.K. Uysal et al.

    A novel probabilistic feature selection method for text classification

    Knowl. Based Syst.

    (2012)
  • R. Wang et al.

    Ambiguity-based multiclass active learning

    IEEE Trans. Fuzzy Syst.

    (2016)
  • B. Xue et al.

    A survey on evolutionary computation approaches to feature selection

    IEEE Trans. Evol. Comput.

    (2016)
  • M.-L. Zhang et al.

    Feature selection for multi-label naive Bayes classification

    Inf. Sci.

    (2009)
  • M.-L. Zhang et al.

    A review on multi-label learning algorithms

    IEEE Trans. Knowl. Data Eng.

    (2014)
  • L. Zheng et al.

    Sentimental feature selection for sentiment analysis of Chinese online reviews

    Int. J. Mach. Learn. Cyber.

    (2018)
  • S. Baluja

    Population-based incremental learning: a method for integrating genetic search based function optimization and competitive learning

    Technical Report

    (1994)
  • Z. Cai et al.

    Multi-label feature selection via feature manifold learning and sparsity regularization

    Int. J. Mach. Learn. Cyber.

    (2018)
  • K. Dembczyński et al.

    On label dependence and loss minimization in multi-label classification

    Mach. Learn.

    (2012)
  • J. Demšar

    Statistical comparisons of classifiers over multiple data sets

    J. Mach. Learn. Res.

    (2006)