To read this content please select one of the options below:

Hybrid supervised clustering based ensemble scheme for text classification

Aytug Onan (Department of Software Engineering, Celal Bayar University, Manisa, Turkey)

Kybernetes

ISSN: 0368-492X

Article publication date: 6 February 2017

541

Abstract

Purpose

The immense quantity of available unstructured text documents serve as one of the largest source of information. Text classification can be an essential task for many purposes in information retrieval, such as document organization, text filtering and sentiment analysis. Ensemble learning has been extensively studied to construct efficient text classification schemes with higher predictive performance and generalization ability. The purpose of this paper is to provide diversity among the classification algorithms of ensemble, which is a key issue in the ensemble design.

Design/methodology/approach

An ensemble scheme based on hybrid supervised clustering is presented for text classification. In the presented scheme, supervised hybrid clustering, which is based on cuckoo search algorithm and k-means, is introduced to partition the data samples of each class into clusters so that training subsets with higher diversities can be provided. Each classifier is trained on the diversified training subsets and the predictions of individual classifiers are combined by the majority voting rule. The predictive performance of the proposed classifier ensemble is compared to conventional classification algorithms (such as Naïve Bayes, logistic regression, support vector machines and C4.5 algorithm) and ensemble learning methods (such as AdaBoost, bagging and random subspace) using 11 text benchmarks.

Findings

The experimental results indicate that the presented classifier ensemble outperforms the conventional classification algorithms and ensemble learning methods for text classification.

Originality/value

The presented ensemble scheme is the first to use supervised clustering to obtain diverse ensemble for text classification

Keywords

Citation

Onan, A. (2017), "Hybrid supervised clustering based ensemble scheme for text classification", Kybernetes, Vol. 46 No. 2, pp. 330-348. https://doi.org/10.1108/K-10-2016-0300

Publisher

:

Emerald Publishing Limited

Copyright © 2017, Emerald Publishing Limited

Related articles