Neurocomputing

Volume 234, 19 April 2017, Pages 27-37

Soft estimation by hierarchical classification and regression

https://doi.org/10.1016/j.neucom.2016.12.037

Abstract

Classification and numeric estimation are the two most common types of predictive data mining. The goal of classification is to predict discrete output values, whereas estimation aims at finding continuous output values. Predictive data mining is generally achieved by using a single statistical or machine learning technique to construct a prediction model. Related studies have shown that the prediction performance of such single flat models can be improved by the use of hierarchical structures. Hierarchical estimation approaches, usually combinations of multiple estimation models, have been proposed for solving specific domain problems. However, the literature offers no generic hierarchical approach for estimation and no hybrid solution that combines classification and estimation techniques hierarchically. Therefore, we introduce a generic hierarchical architecture, namely hierarchical classification and regression (HCR), suitable for various estimation problems. In brief, the first level of HCR pre-processes a given training set by classifying it into k classes, leading to k subsets. Three approaches are used to perform this task in this study: hard classification (HC), fuzzy c-means (FCM), and genetic algorithms (GA). The training data with their associated class labels are then used to train a support vector machine (SVM) classifier. Next, for the second level of HCR, k regression (or estimation) models are trained on their corresponding subsets for final prediction.
The experiments on 8 different UCI datasets show that most hierarchical prediction models developed with the HCR architecture significantly outperform three well-known single flat prediction models, i.e., linear regression (LR), multilayer perceptron (MLP) neural networks, and support vector regression (SVR), in terms of mean absolute percentage error (MAPE) and root mean squared error (RMSE). In addition, the GA-based data pre-processing approach with the training set classified into 4 subsets (i.e., k = 4) performs best, and the 4-class SVM+MLP combination outperforms three baseline hierarchical regression models.

Introduction

One major ultimate goal of data mining is prediction. The process can be categorized as classification or estimation, depending on the type of data involved in the prediction. Classification is one of the most common research problems in data mining and is usually approached by (supervised) classification techniques. The aim of classification is to allocate an (unknown) instance, represented by specific features, to the correct class from a finite set of classes. Classification requires a learning (or training) task: a classifier or model is computed by approximating the mapping between input-output training examples, thereby enabling the correct labeling of the training set at a particular level of accuracy. After the model is trained, it can be used to classify unknown instances by assigning one of the class labels learned from the training set [14], [29].

Similar to classification learning, numeric estimation also involves training a model that is learned from a set of training examples including the output attribute. However, the output attribute is continuous, i.e., numeric rather than discrete. Therefore, the goal of numeric estimation is to ‘estimate’ the output value.

Studies of classification and estimation problems generally apply a single (flat) prediction model. However, many recent studies have shown that a hierarchical structure outperforms a flat structure for solving various classification problems (e.g., [15], [35], [38]) and estimation problems (e.g., [1], [20], [27], [37], [41], [42]). Such hierarchical approaches have been proposed for specific problem domains. In particular, for estimation, they are based purely on combining several estimation models in a hierarchical manner. A search of the literature showed no hybrid-based hierarchical solutions that combine classification and estimation techniques for numeric estimation (c.f. Section 2.2).

Therefore, to address this gap, this study introduces a generic hierarchical architecture, namely hierarchical classification and regression (HCR), designed to overcome the limitations of flat estimation models. The HCR architecture can be used to solve a variety of estimation domain problems. The first level of HCR classifies a new unknown case into a specific class (e.g., class i). The case is then input to an estimator trained to predict only the continuous values that have been categorized as class i (over a training set). To determine which output value belongs to which class in a given training set, three data pre-processing approaches are used in this paper. The first and simplest approach is based on ‘hard classification’: it divides the range between the maximum and minimum output values into k predefined classes. The second and third methods can be regarded as ‘soft classification’ approaches, using the fuzzy c-means clustering algorithm and the genetic algorithm, respectively, to categorize the training data into one of the k predefined classes (c.f. Section 3.2).
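As an illustration, the ‘hard classification’ pre-processing step can be read as equal-width binning of the output values over [min, max]. The sketch below is an assumed reading of that description, not code from the paper; `hard_classify` is a hypothetical helper name.

```python
def hard_classify(y, k):
    """Assign each continuous output value to one of k equal-width
    intervals spanning [min(y), max(y)] (an illustrative stand-in
    for the paper's 'hard classification' pre-processing)."""
    lo, hi = min(y), max(y)
    width = (hi - lo) / k
    labels = []
    for v in y:
        # A degenerate range maps everything to class 0; values at the
        # upper boundary fall into the last interval.
        idx = min(int((v - lo) / width), k - 1) if width > 0 else 0
        labels.append(idx)
    return labels
```

For example, with outputs [1.0, 2.5, 4.0, 9.0, 10.0] and k = 4, the range [1, 10] is cut into four intervals of width 2.25, yielding class labels [0, 0, 1, 3, 3].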

The proposed HCR architecture is based on the divide-and-conquer principle: a complex problem is solved by first solving simpler subproblems, handled by a classifier on the first level and by an estimator on the second level of HCR. This concept is similar to ensemble learning [21], which is inspired by the modular nature of information processing in the brain; that is, individual functions can be subdivided into functionally different subprocesses or subtasks without mutual interference [17]. Accordingly, our experimental results demonstrate that HCR, by combining specific classification and regression models, outperforms single flat models and related baselines (c.f. Section 4.2).

The rest of this paper is organized as follows. Section 2 offers an overview of three well-known estimation techniques — linear regression, neural networks, and support vector regression — and describes related work on hierarchical estimation. Section 3 introduces the proposed HCR architecture and the three data pre-processing approaches used to classify the training data into specific classes. Section 4 presents the experimental results, and conclusions are provided in Section 5.


Literature review

Estimation models, or estimators, are needed to infer the values of unknown parameters in statistical models; that is, to estimate parameter values from measurement data, which the estimator takes as input [25]. Like classification, estimation involves supervised learning from a set of training examples, but the output attribute is numeric (i.e., continuous) rather than discrete.

Three well-known methods of

The architecture

The proposed hierarchical classification and regression (HCR) approach is a two-level architecture: the first level constructs a classification model, and the second level constructs a regression model. Fig. 1 shows the HCR architecture for training and constructing the classification and regression models.

In Fig. 1(a), as the first level of HCR, a data pre-processing module ‘transforms’ the original continuous output value of each training data sample into one of k discrete
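The two-level flow can be sketched as follows. The nearest-centroid classifier and per-class mean regressors below are deliberately simplistic stand-ins for the SVM classifier and trained estimators used in the paper; the function names are hypothetical.

```python
def train_hcr(X, y, labels, k):
    """First level: one centroid per class (toy stand-in for the SVM
    classifier). Second level: one regressor per class (here simply
    the mean of y over that class's subset)."""
    centroids, regressors = {}, {}
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue  # empty class: no model is built for it
        dim = len(X[0])
        centroids[c] = [sum(X[i][d] for i in idx) / len(idx) for d in range(dim)]
        regressors[c] = sum(y[i] for i in idx) / len(idx)
    return centroids, regressors

def predict_hcr(x, centroids, regressors):
    """Route x through the first-level classifier (nearest centroid),
    then apply the class-specific second-level estimator."""
    c = min(centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return regressors[c]
```

A new case is thus never seen by all regressors at once: the first level selects exactly one class, and only that class's estimator produces the final numeric prediction.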

The datasets

In this study, eight datasets dealing with different regression-oriented problems are collected from the UCI Machine Learning Repository. Table 1 shows the basic information for these datasets. They cover various domain problems with different numbers of instances and attributes (including the output variable). We believe that these datasets can be used to reliably assess the prediction performance of the hierarchical models constructed by the
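The experiments report mean absolute percentage error (MAPE) and root mean squared error (RMSE); for reference, their standard definitions can be sketched as:

```python
import math

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((t - p) / t)
                       for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2
                         for t, p in zip(y_true, y_pred)) / len(y_true))
```

Note that MAPE divides by the true value, so it is undefined when any true output is zero; RMSE has the same units as the output variable.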

Conclusion

In this study, we introduced a generic architecture, namely hierarchical classification and regression (HCR), capable of hierarchically combining classification and regression models for various estimation problems. HCR is based on the divide-and-conquer principle: the estimation problem is divided into a first level of classification and a second level of estimation. The first level of HCR aims at classifying the original training set into a number of subsets, in which the data in one

Dr. Shih-Wen Ke is an assistant professor at the Department of Information and Computer Engineering, Chung Yuan Christian University, Taiwan. His research covers information retrieval, machine learning, and data mining.

References (43)

  • C.H. Achen

    Two-step hierarchical estimation: beyond regression analysis

    Political Anal.

    (2005)
  • F. Bellocchio et al.

    Hierarchical approach for multiscale support vector regression

    IEEE Trans. Neural Netw. Learn. Syst.

    (2012)
  • J.C. Bezdek

    Pattern Recognition With Fuzzy Objective Function Algorithms

    (1981)
  • H. Byun et al.

    A survey on pattern recognition applications of support vector machines

    Int. J. Pattern Recognit. Artif. Intell.

    (2003)
  • J. Demsar

    Statistical comparisons of classifiers over multiple data sets

    J. Mach. Learn. Res.

    (2006)
  • Z. Deng et al.

    Minimax probability TSK fuzzy system classifier: a more transparent and highly interpretable classification model

    IEEE Trans. Fuzzy Syst.

    (2015)
  • Z. Deng et al.

    Generalized hidden-mapping ridge regression, knowledge-leveraged inductive transfer learning for neural networks, fuzzy systems and kernel methods

    IEEE Trans. Cybern.

    (2014)
  • N.R. Draper et al.

Applied Regression Analysis

    (1998)
  • R.O. Duda et al.

Pattern Classification

    (2001)
  • S. Dumais, H. Chen, Hierarchical classification of web content, in: Proceedings of the International ACM SIGIR...
  • J.A. Hartigan et al.

    A k-means clustering algorithm

    Appl. Stat.

    (1979)

Dr. Wei-Chao Lin is an associate professor at the Department of Computer Science and Information Engineering, Asia University, Taiwan. His research interests are machine learning and artificial intelligence applications.

Dr. Chih-Fong Tsai received a PhD from the School of Computing and Technology, University of Sunderland, UK, in 2005. He is now a professor at the Department of Information Management, National Central University, Taiwan. He has published more than 50 technical publications in journals, book chapters, and international conference proceedings. He received the Highly Commended Award (Emerald Literati Network 2008 Awards for Excellence) from Online Information Review (“A Review of Image Retrieval Methods for Digital Cultural Heritage Resources”), and the award for top 10 cited articles in 2008 from Expert Systems with Applications (“Using Neural Network Ensembles for Bankruptcy Prediction and Credit Scoring”). His current research focuses on multimedia information retrieval and data mining.

Dr. Ya-Han Hu is currently an associate professor in the Department of Information Management at National Chung Cheng University, Taiwan. He received a PhD degree in information management from National Central University, Taiwan, in 2007. His current research interests include data mining and knowledge discovery, business intelligence, and medical informatics. His research has appeared in Artificial Intelligence in Medicine, Data & Knowledge Engineering, Decision Support Systems, IEEE Transactions on Systems, Man, and Cybernetics, Methods of Information in Medicine, Journal of Systems and Software, and Journal of Information Science.
