A fast learning method for streaming and randomly ordered multi-class data chunks by using one-pass-throw-away class-wise learning concept
Introduction
With the advancement of current technology, tremendous amounts of data have been generated in various fields such as business, science, and medicine. These data may overwhelm storage as well as memory space, which makes their analysis and classification complex in terms of both time and space. The present classical algorithms for analyzing data are not efficient enough to cope with this emerging and challenging problem. Classifying the patterns existing in data is one of the key processes in such analysis; it concerns the problem of developing an efficient learning method that achieves highly accurate results within an acceptable time complexity. Pattern classification can be applied to many machine intelligence studies such as face recognition (Chen, Han, Wang, Fan, 2011, Wang, Li, Zhang, 2008), object recognition (Li, Bebis, Bourbakis, 2008, Serratosa, Alquézar, Amézquita, 2012), pattern recognition (Khunarsal, Lursinsap, Raicharoen, 2013, Maglogiannis, Sarimveis, Kiranoudis, Chatziioannou, Oikonomou, Aidinis, 2008, Melin, Castillo, 2013, Xinjun, 2010) and pattern classification (Abrahams, Coupey, Zhong, Barkhi, Manasantivongs, 2013, Khanmohammadi, Chou, 2016, Nguyen, Khosravi, Creighton, Nahavandi, 2015). Among the proposed learning methods, neural learning has been found to be more efficient and practical than the others. It can handle different patterns of data distribution in a high-dimensional space and can behave as either a linear or a non-linear function. This ability allows a neural learning method to achieve a rather low space complexity, in terms of the number of neurons, together with very high accuracy.
Typically, two types of neural learning methods have been proposed. The first type is called batch learning (Khanmohammadi, Chou, 2016, Khunarsal, Lursinsap, Raicharoen, 2013, Nguyen, Khosravi, Creighton, Nahavandi, 2015). It assumes that a sufficient amount of training data is available and that a fixed network structure is set up. Furthermore, the testing data are assumed to follow the same statistical distribution as the training data (Giraud-Carrier, 2000). No new incoming data can be incorporated during or after the training process. Although batch learning usually provides satisfactory classification results, it is not suitable for handling the tremendous amount of data available, because of its time consumption and the limited storage capacity. Wilson and Martinez (2003) studied the general inefficiency of batch training for gradient descent learning and concluded that batch training is not a practical approach for a large training data set. Moreover, batch learning cannot handle situations in which a tremendous amount of new data is generated every second, such as banking transactions and biological data, precisely because of the assumption that no new data may be added during the training process. To cope with the learning time of the batch type, many techniques have been proposed to speed up the learning process. One approach is to reduce the number of dimensions of the data (Patra, Widjaja, Das, Ang, 2005, Yan, Ma, Zhu, 2006) or the size of the training set (López, Gagné, Castellanos-Dominguez, Orozco-Alzate, 2015, Verbiest, Derrac, Cornelis, García, Herrera, 2016). Another approach is to modify the learning process in terms of the error function (Hua, Chung, Wang, Ying, 2012, Jiang, Deng, Wang, Zhang, 2003). These approaches help to increase accuracy, but the training time required to reach a desired accuracy remains uncontrollable.
The concept of incremental learning, or continuous learning, was introduced to tackle the aforementioned issues by gradually adding neurons and adjusting the weights according to the training data (Langley, 1995). The goal is to achieve linear time and space complexities in the learning process. However, the concept did not allow new incoming data to enter the training process: the set of training data is fixed throughout. A few pre-processing steps, all concerning feature extraction, were proposed to improve the performance of incremental learning. Hall, Marshall, and Martin (1998) introduced a constructive method, called incremental principal component analysis (IPCA), based on incrementally adding observations to an eigenspace model. A modified version of IPCA that simultaneously performs feature extraction and classification was later proposed (Ozawa, Toh, Abe, Pang, & Kasabov, 2005). Pang, Ozawa, and Kasabov (2005) proposed an incremental linear discriminant analysis, called ILDA, in which both the between-class and within-class scatter matrices are computed incrementally. They compared ILDA to the original LDA and showed that their method was better and effectively capable of evolving a discriminant eigenspace over a fast and large data stream. Note that every neuron uses the same activation function and only a single network is obtained after the training process.
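For concreteness, the running-statistics idea behind such incremental eigenspace models can be illustrated as follows. This is a generic Python sketch, not Hall et al.'s exact IPCA update (which maintains a truncated eigenspace directly rather than the full scatter matrix); the class and method names are ours.

```python
import numpy as np

class IncrementalEigenspace:
    """Running mean/scatter model updated one observation at a time.

    Illustrative sketch only: true IPCA (Hall et al., 1998) updates a
    truncated eigenspace directly rather than storing the full d x d
    scatter matrix as done here.
    """

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.scatter = np.zeros((dim, dim))  # sum of outer products of deviations

    def update(self, x):
        """Welford-style online update for one new observation x."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.scatter += np.outer(delta, x - self.mean)

    def eigenspace(self, k):
        """Return the current top-k eigenvalues and eigenvectors."""
        cov = self.scatter / max(self.n - 1, 1)
        w, v = np.linalg.eigh(cov)               # ascending order
        return w[::-1][:k], v[:, ::-1][:, :k]    # top-k components
```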
To further improve performance, the combination of different types of classifiers, known as ensemble learning, was suggested. Polikar, Upda, Upda, and Honavar (2001) proposed an ensemble classifier called Learn++, but the obtained classifier was sensitive to the parameter set-up. Wilson and Martinez (2003) proposed gradient descent incremental learning, which is significantly faster than the original batch learning with no apparent difference in accuracy. The probabilistic RBF (PRBF) network was proposed to handle classification problems in a stationary environment (Constantinopoulos & Likas, 2006) by sequentially adding new components until no component contains data points belonging to more than one class. Duan, Shao, Hou, He, and Zeng (2009) proposed incremental learning algorithms for Lagrangian Support Vector Machines (LSVM) in both sequential and chunk modes, where sequential-incremental learning processes only one sample at a time in each epoch and chunk-incremental learning processes more than one sample per epoch. Their results showed that LSVM was faster and more efficient than other sequential and chunk-incremental learning methods based on LSVM. Yi and Wu (2011) presented an incremental SVM based on a reserved set for network intrusion detection to reduce the training time. Additionally, memory-based learning methods, in which some training data are accumulated incrementally, were presented as incremental learning, such as the evolving clustering method ECM (Kasabov, 2002) and the fast prototype-based nearest neighbor classifier ASC (Shen & Hasegawa, 2008).
Although these incremental learning techniques claimed a faster speed than non-incremental learning, each datum must be used repeatedly to adjust the weights of the network, which leads to an uncontrollable number of repetitions. These methods either required access to the learned data many times, forgot the prior knowledge, or could not handle a new incoming class. Therefore, they are not suitable for many practical applications, such as data mining, robotics, intrusion detection, business transactions, and the analysis of real-time satellite images, where several sets of data are presented during the learning process, either one by one or chunk by chunk with various sizes. To handle this scenario, the learning should be conducted incrementally in one pass, called one-pass incremental learning. The term one pass means that the training data are used or accessed only once during the learning process (Kasabov, 2002).
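The one-pass contract can be made explicit with a small sketch. The interface below is hypothetical (the names OnePassLearner, partial_fit, and train_one_pass are ours, not from the cited works); its only purpose is to show that each chunk is consumed exactly once and then discarded.

```python
from typing import Iterable, Tuple
import numpy as np

class OnePassLearner:
    """Hypothetical contract for one-pass learning."""

    def partial_fit(self, X: np.ndarray, y: np.ndarray) -> None:
        """Update the model from one chunk; must not store the raw data."""
        raise NotImplementedError

def train_one_pass(learner: OnePassLearner,
                   stream: Iterable[Tuple[np.ndarray, np.ndarray]]) -> None:
    for X, y in stream:            # chunks arrive in arbitrary order and size
        learner.partial_fit(X, y)  # each chunk is accessed exactly once
        del X, y                   # and is thrown away afterwards
```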
In the past decade, several incremental learning methods have been proposed under the one-pass learning concept to reduce the learning time on large-scale data. Sequential and chunk-incremental principal component analysis were developed under one-pass environments to handle large-scale classification problems (Ozawa, Pang, & Kasabov, 2008). The performance of the proposed method was evaluated in terms of classification accuracy and learning time; the results showed that chunk-incremental learning reduced the learning time effectively compared with sequential-incremental learning, and that it obtained the major eigenvectors with fairly good approximation. Jaiyen, Lursinsap, and Phimoltares (2010) proposed a new learning method to speed up the convergence rate in almost linear time based on the structure of the versatile elliptic basis function (VEBF) neural network, where the VEBF is a type of radial-shaped function. The learning was conducted under the discard-after-learn and one-pass-throw-away learning concepts. Their experimental results showed classification accuracies comparable to those of a multilayer perceptron (MLP) and a radial basis function (RBF) network trained by traditional batch learning. Although the time and space complexities of VEBF were the lowest among all compared methods, its performance was very susceptible to the ordering of the incoming data. Another disadvantage is that the training samples must be learned one by one even when a chunk of training samples is available at a time. This causes computational inefficiency because the eigenvector and eigenvalue computation in PCA must be performed for each training sample in the chunk. The method was therefore not suitable for handling incoming data chunks such as banking transactions, intrusion detection, and emerging data on the internet. Xu, Shen, and Zhao (2012) proposed an incremental learning vector quantization (ILVQ) algorithm for pattern classification; ILVQ outperformed other incremental learning methods in terms of accuracy and compression ratio. Liu and Ban (2015) applied an incremental self-organizing neural network under one-pass learning to the clustering problem; the comparative results showed the superior performance of their algorithm in learning robustness and efficiency, and in handling outliers without requiring a predefined number of clusters. Recently, Ciarelli, Oliveira, and Salles (2012) and Ciarelli and Oliveira (2015) proposed an on-line incremental learning method called the evolving Probabilistic Neural Network (ePNN), based on the Gaussian mixture model and the Expectation Maximization (EM) algorithm. Zhou, Zheng, Hu, Xu, and You (2016) proposed a local on-line learning method in which a multiple-hyperplane passive-aggressive algorithm was integrated with an on-line clustering technique; their experiments achieved notably better performance without kernel approximation or second-order modeling. Fan, Song, and Shrestha (2016) proposed a kernel on-line learning method in which the kernel width is adapted automatically. Their simulation results showed that the algorithm could adapt to the training data from different initial kernel widths, and that its accuracy and learning time were better than those of kernel algorithms with a fixed kernel width.
Although these one-pass incremental learning techniques provided better performance in terms of accuracy and learning time, the relevant parameters were still updated for only one incoming datum at a time. In on-line or sequential learning, the order in which data are fed into a classifier strongly affects its classification accuracy. Di Mauro, Esposito, Ferilli, and Basile (2005) focused on how to avoid such order effects in incremental learning. One approach to mitigating the order sensitivity is to present multiple cases or samples to a classifier at a time.
In our study, we propose a practical method that achieves much better learning results on chunks of data than those of Jaiyen et al. (2010), by reconciling the structural complexity and the learning time complexity to alleviate the constraint of learning one incoming datum at a time. A chunk of multi-class data is allowed to enter the training process as a stream of data chunks instead. The one-pass-throw-away learning concept introduced in Jaiyen et al. (2010) is also adopted in our study, since this concept has been proven to minimize the time and space complexities. Furthermore, this study focuses on diminishing the order effect of data presentation in the learning process that occurs in the method of Jaiyen et al. (2010).
This paper is organized as follows. Section 2 formulates the studied problem. Section 3 briefly describes the structure of the VEBF neural network and how to compute the orthonormal basis for rotating the axes along the direction of a dataset. Section 4 presents the concept of Class-wise Incremental Learning (CIL) with VEBF neurons. Section 5 discusses the model evaluation, the experimental setting, and the experimental results on 13 real-world datasets of various sizes. Finally, Section 6 concludes the paper.
Studied problems and constraints
The objective of our studied problem can be stated as follows. Given a temporal sequence of data chunks of different sizes and classes, learn each data chunk i of size ni only once, with time complexity of O(ni) as well as space complexity of O(ni), and discard this chunk forever afterwards. The constraints imposed on this objective are the following (a minimal sketch of the per-chunk, class-wise handling is given after the list):
1. The concept of one-pass-throw-away learning must be deployed.
2. For each data chunk, the distribution probability of each data class in the data
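A minimal Python sketch of how an incoming chunk can be grouped by class before learning, under the stated complexity budget, is shown below; the helper name is ours. Grouping here uses a stable sort, which costs O(ni log ni); a hash-based pass would keep it strictly linear.

```python
import numpy as np

def split_chunk_by_class(X: np.ndarray, y: np.ndarray):
    """Yield (class_label, data) groups for one incoming chunk.

    Illustrative helper only: the formulation above merely requires that
    each chunk of size n_i be learned once and then discarded.
    """
    order = np.argsort(y, kind="stable")          # group equal labels together
    Xs, ys = X[order], y[order]
    classes, starts = np.unique(ys, return_index=True)
    bounds = list(starts) + [len(ys)]
    for c, lo, hi in zip(classes, bounds[:-1], bounds[1:]):
        yield c, Xs[lo:hi]
```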
Overview of versatile elliptic basis function (VEBF) neural network
The versatile elliptic basis function (VEBF) neural network was introduced by Jaiyen et al. (2010). The network is a supervised neural network comprising three layers, namely the input, one hidden, and output layers. In the input layer, the number of neurons is equal to the number of attributes of the training set. The hidden layer contains a number of hidden-neuron groups; each group is called a sub-hidden layer. The number of groups is set to the number of classes in the learning dataset. During the
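Based on the VEBF form in Jaiyen et al. (2010), each hidden neuron can be pictured as a hyper-ellipse with a center, orthonormal axes obtained from the local data direction, and per-axis widths. The Python sketch below shows the covering test only; the paper's width-initialization and update rules are omitted, and the class layout is our own.

```python
import numpy as np

class VEBFNeuron:
    """One hyper-elliptical hidden neuron: center c, orthonormal axes U,
    per-axis widths a. A point x is covered when psi(x) <= 0.

    Sketch of the VEBF form in Jaiyen et al. (2010); initialization and
    parameter-update rules from the paper are not reproduced here.
    """

    def __init__(self, center: np.ndarray, axes: np.ndarray, widths: np.ndarray):
        self.c = center   # (d,) ellipse center
        self.U = axes     # (d, d) orthonormal columns, e.g. from local PCA
        self.a = widths   # (d,) semi-axis lengths

    def psi(self, x: np.ndarray) -> float:
        z = self.U.T @ (x - self.c)            # rotate into the ellipse frame
        return float(np.sum((z / self.a) ** 2) - 1.0)

    def covers(self, x: np.ndarray) -> bool:
        return self.psi(x) <= 0.0
```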
Proposed concept and learning algorithm
Instead of following the parameter adjustment of VEBF as in Jaiyen et al. (2010), we propose a chunk-incremental learning that learns one class existing in the incoming data chunk at a time. The structure of the learning network is gradually expanded by augmenting new neurons to capture all sub-clusters in the considered class. The parameters of a VEBF are updated according to the data within each sub-cluster. In fact, only some relevant data in a sub-cluster must be used to update the
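A hedged sketch of this class-wise procedure is given below. The hooks network.neurons_of, network.spawn, and neuron.update are hypothetical placeholders; the paper's actual covering test, parameter updates, and neuron-merging rules would live behind them.

```python
import numpy as np

def learn_chunk_classwise(network, X, y, split_chunk_by_class):
    """Learn one incoming chunk one class at a time, then discard it.

    Hypothetical hooks: network.neurons_of(c) returns the sub-hidden
    layer for class c; network.spawn(c, data) creates new neurons for
    uncovered sub-clusters; neuron.update(x) adjusts its parameters.
    """
    for c, Xc in split_chunk_by_class(X, y):
        leftover = []
        for x in Xc:
            hit = next((n for n in network.neurons_of(c) if n.covers(x)), None)
            if hit is not None:
                hit.update(x)              # refine an existing ellipse
            else:
                leftover.append(x)         # no ellipse covers x yet
        if leftover:
            network.spawn(c, np.asarray(leftover))
    # the chunk is now thrown away and never revisited
```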
Experiments
The experiments were conducted in both static and streaming chunk-data scenarios. The proposed method was tested on 13 real-world datasets ranging from small to large, where the size of a dataset is defined as the product of the number of attributes and the number of instances. Twelve well-known datasets were obtained from the University of California at Irvine repository (Lichman, 2013), and one dataset contains the physical protein-protein interactions of the yeast Saccharomyces cerevisiae, freely available at //www.scucic.cn/Predict_PPI/index.htm
Conclusion
This paper proposed a new fast chunk-incremental learning algorithm called Class-wise Incremental Learning (CIL) based on the versatile elliptic basis function (VEBF). The most important concept for achieving the target learning time complexity is to learn one class at a time: once the data in any class are learned, they are thrown away and never learned again. By learning one class at a time, the time and space complexities can be easily controlled. Our approach is capable of
Acknowledgments
We are grateful to the Development and Promotion of Science and Technology Talents Project (DPST), the Institute for the Promotion of Teaching Science and Technology (IPST), Ministry of Science and Technology, Thailand, for financial support.
References (39)
- Abrahams, Coupey, Zhong, Barkhi, & Manasantivongs (2013). Audience targeting by B-to-B advertisement classification: A neural network approach. Expert Systems with Applications.
- Ciarelli & Oliveira (2015). Achieving a compromise between performance and complexity of structure: An incremental approach. Information Sciences.
- Ciarelli, Oliveira, & Salles (2012). An incremental neural network with a reduced architecture. Neural Networks.
- Duan, Shao, Hou, He, & Zeng (2009). An incremental learning algorithm for Lagrangian support vector machine. Pattern Recognition Letters.
- Fan, Song, & Shrestha (2016). Kernel online learning with adaptive kernel width. Neurocomputing.
- Khanmohammadi & Chou (2016). A Gaussian mixture model based discretization algorithm for associative classification of medical data. Expert Systems with Applications.
- Khunarsal, Lursinsap, & Raicharoen (2013). Very short time environmental sound classification based on spectrogram pattern matching. Information Sciences.
- Liu & Ban (2015). Clustering by growing incremental self-organizing neural network. Expert Systems with Applications.
- Melin & Castillo (2013). A review on the applications of type-2 fuzzy logic in classification and pattern recognition. Expert Systems with Applications.
- Nguyen, Khosravi, Creighton, & Nahavandi (2015). Classification of healthcare data using genetic fuzzy logic system and wavelets. Expert Systems with Applications.
- Serratosa, Alquézar, & Amézquita (2012). A probabilistic integrated object recognition and tracking framework. Expert Systems with Applications.
- Shen & Hasegawa (2008). A fast nearest neighbor classifier on self-organizing incremental neural network. Neural Networks.
- Verbiest, Derrac, Cornelis, García, & Herrera (2016). Evolutionary wrapper approaches for training set selection as preprocessing mechanism for support vector machines: Experimental evaluation and support vector analysis. Applied Soft Computing.
- Wilson & Martinez (2003). The general inefficiency of batch training for gradient descent learning. Neural Networks.
- Yi & Wu (2011). Incremental SVM based on reserved set for network intrusion detection. Expert Systems with Applications.
- Zhou, Zheng, Hu, Xu, & You (2016). One-pass online learning: A local approach. Pattern Recognition.
- Chen, Han, Wang, & Fan (2011). Face recognition using nearest feature space embedding. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Constantinopoulos & Likas (2006). An incremental training method for the probabilistic RBF network. IEEE Transactions on Neural Networks.
- Giraud-Carrier (2000). A note on the utility of incremental learning. AI Communications.