A fast learning method for streaming and randomly ordered multi-class data chunks by using one-pass-throw-away class-wise learning concept
Introduction
With the advancement of current technology, tremendous amounts of data have been generated in various fields such as business, science, and medicine. These data may overwhelm storage as well as memory space, which makes their analysis and classification complex in terms of both time and space. The present classical algorithms for analyzing data are not efficient enough to cope with this emerging and challenging problem. Classifying the patterns existing in data is one of the key processes in such analysis; it concerns the problem of developing an efficient learning method that achieves highly accurate results within an acceptable time complexity. Pattern classification can be applied to many machine intelligence studies such as face recognition (Chen, Han, Wang, Fan, 2011, Wang, Li, Zhang, 2008), object recognition (Li, Bebis, Bourbakis, 2008, Serratosa, Alquézar, Amézquita, 2012), pattern recognition (Khunarsal, Lursinsap, Raicharoen, 2013, Maglogiannis, Sarimveis, Kiranoudis, Chatziioannou, Oikonomou, Aidinis, 2008, Melin, Castillo, 2013, Xinjun, 2010) and pattern classification (Abrahams, Coupey, Zhong, Barkhi, Manasantivongs, 2013, Khanmohammadi, Chou, 2016, Nguyen, Khosravi, Creighton, Nahavandi, 2015). Among the proposed learning methods, neural learning has been found to be more efficient and practical than the others. It can handle different patterns of data distribution in a high-dimensional space and can behave as either a linear or a non-linear function. This ability allows a neural learning method to achieve a rather low space complexity, in terms of the number of neurons, together with very high accuracy.
Typically, two types of neural learning methods have been proposed. The first type is called batch learning (Khanmohammadi, Chou, 2016, Khunarsal, Lursinsap, Raicharoen, 2013, Nguyen, Khosravi, Creighton, Nahavandi, 2015). It assumes that a sufficient amount of training data is available and that a fixed network structure is set up. Furthermore, the testing data are assumed to follow the same statistical distribution as the training data (Giraud-Carrier, 2000). No new incoming data can be incorporated during or after the training process. Although batch learning usually provides satisfactory classification results, it is not suitable for handling the tremendous amount of data available, because of its time consumption and the limited storage capacity. Wilson and Martinez (2003) studied the general inefficiency of batch training for gradient descent learning and concluded that batch training is not a practical approach for a large training data set. Moreover, batch learning cannot handle situations in which a tremendous amount of new data is generated every second, such as banking transactions and biological data, precisely because of the assumption that no new data may be added during the training process. To cope with the learning time of the batch type, many techniques have been proposed to speed up the learning process. One approach is to reduce the number of dimensions of the data (Patra, Widjaja, Das, Ang, 2005, Yan, Ma, Zhu, 2006) or the size of the training set (López, Gagné, Castellanos-Dominguez, Orozco-Alzate, 2015, Verbiest, Derrac, Cornelis, García, Herrera, 2016). Another approach is to modify the learning process in terms of the error function (Hua, Chung, Wang, Ying, 2012, Jiang, Deng, Wang, Zhang, 2003). These approaches help to increase accuracy, but the training time required to reach a desired accuracy remains uncontrollable.
The concept of incremental learning, or continuous learning, was introduced to tackle the aforementioned issues by gradually adding neurons and adjusting the weights according to the training data (Langley, 1995). The goal is to achieve linear time and space complexities in the learning process. However, the concept did not allow new incoming data to enter the training process: the set of training data is fixed throughout. A few pre-processing steps, all concerning feature extraction, were proposed to improve the performance of incremental learning. Hall, Marshall, and Martin (1998) introduced a constructive method, called incremental principal component analysis (IPCA), based on incrementally adding observations to an eigenspace model. A modified version of IPCA that simultaneously performs feature extraction and classification was later proposed (Ozawa, Toh, Abe, Pang, & Kasabov, 2005). Pang, Ozawa, and Kasabov (2005) proposed an incremental linear discriminant analysis, called ILDA, in which both the between-class and within-class scatter matrices are computed incrementally. They compared ILDA to the original LDA and showed that their method was better and effectively capable of evolving a discriminant eigenspace over a fast and large data stream. Note that every neuron uses the same activation function and only a single network is obtained after the training process.
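For concreteness, the running-statistics idea behind such incremental eigenspace models can be illustrated as follows. This is a generic Python sketch, not Hall et al.'s exact IPCA update (which maintains a truncated eigenspace directly rather than the full scatter matrix); the class and method names are ours.

```python
import numpy as np

class IncrementalEigenspace:
    """Running mean/scatter model updated one observation at a time.

    Illustrative sketch only: true IPCA (Hall et al., 1998) updates a
    truncated eigenspace directly rather than storing the full d x d
    scatter matrix as done here.
    """

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.scatter = np.zeros((dim, dim))  # sum of outer products of deviations

    def update(self, x):
        """Welford-style online update for one new observation x."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.scatter += np.outer(delta, x - self.mean)

    def eigenspace(self, k):
        """Return the current top-k eigenvalues and eigenvectors."""
        cov = self.scatter / max(self.n - 1, 1)
        w, v = np.linalg.eigh(cov)               # ascending order
        return w[::-1][:k], v[:, ::-1][:, :k]    # top-k components
```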
To further improve performance, the combination of different types of classifiers, known as ensemble learning, was suggested. Polikar, Upda, Upda, and Honavar (2001) proposed an ensemble classifier called Learn++, but the obtained classifier was sensitive to the parameter set-up. Wilson and Martinez (2003) proposed gradient descent incremental learning, which is significantly faster than the original batch learning with no apparent difference in accuracy. The probabilistic RBF (PRBF) network was proposed to handle classification problems in a stationary environment (Constantinopoulos & Likas, 2006) by sequentially adding new components until no component contains data points belonging to more than one class. Duan, Shao, Hou, He, and Zeng (2009) proposed incremental learning algorithms for Lagrangian Support Vector Machines (LSVM) in both sequential and chunk modes, where sequential-incremental learning processes only one sample at a time in each epoch and chunk-incremental learning processes more than one sample per epoch. Their results showed that LSVM was faster and more efficient than other sequential and chunk-incremental learning methods based on LSVM. Yi and Wu (2011) presented an incremental SVM based on a reserved set for network intrusion detection to reduce the training time. Additionally, memory-based learning methods, in which some training data are accumulated incrementally, were presented as incremental learning, such as the evolving clustering method ECM (Kasabov, 2002) and the fast prototype-based nearest neighbor classifier ASC (Shen & Hasegawa, 2008).
Although these incremental learning techniques claimed a faster speed than non-incremental learning, each datum must be used repeatedly to adjust the weights of the network, which leads to an uncontrollable number of repetitions. These methods either required access to the learned data many times, forgot the prior knowledge, or could not handle a new incoming class. Therefore, they are not suitable for many practical applications, such as data mining, robotics, intrusion detection, business transactions, and the analysis of real-time satellite images, where several sets of data are presented during the learning process, either one by one or chunk by chunk with various sizes. To handle this scenario, the learning should be conducted incrementally in one pass, called one-pass incremental learning. The term one pass means that the training data are used or accessed only once during the learning process (Kasabov, 2002).
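The one-pass contract can be made explicit with a small sketch. The interface below is hypothetical (the names OnePassLearner, partial_fit, and train_one_pass are ours, not from the cited works); its only purpose is to show that each chunk is consumed exactly once and then discarded.

```python
from typing import Iterable, Tuple
import numpy as np

class OnePassLearner:
    """Hypothetical contract for one-pass learning."""

    def partial_fit(self, X: np.ndarray, y: np.ndarray) -> None:
        """Update the model from one chunk; must not store the raw data."""
        raise NotImplementedError

def train_one_pass(learner: OnePassLearner,
                   stream: Iterable[Tuple[np.ndarray, np.ndarray]]) -> None:
    for X, y in stream:            # chunks arrive in arbitrary order and size
        learner.partial_fit(X, y)  # each chunk is accessed exactly once
        del X, y                   # and is thrown away afterwards
```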
In the past decade, several incremental learning methods have been proposed under the one-pass learning concept to reduce the learning time on large-scale data. Sequential and chunk-incremental principal component analysis were developed under one-pass environments to handle large-scale classification problems (Ozawa, Pang, & Kasabov, 2008). The performance of the proposed method was evaluated in terms of classification accuracy and learning time; the results showed that chunk-incremental learning reduced the learning time effectively compared with sequential-incremental learning, and that it obtained the major eigenvectors with fairly good approximation. Jaiyen, Lursinsap, and Phimoltares (2010) proposed a new learning method to speed up the convergence rate in almost linear time based on the structure of the versatile elliptic basis function (VEBF) neural network, where the VEBF is a type of radial-shaped function. The learning was conducted under the discard-after-learn and one-pass-throw-away learning concepts. Their experimental results showed classification accuracies comparable to those of a multilayer perceptron (MLP) and a radial basis function (RBF) network trained by traditional batch learning. Although the time and space complexities of VEBF were the lowest among all compared methods, its performance was very susceptible to the ordering of the incoming data. Another disadvantage is that the training samples must be learned one by one even when a chunk of training samples is available at a time. This causes computational inefficiency because the eigenvector and eigenvalue computation in PCA must be performed for each training sample in the chunk. The method was therefore not suitable for handling incoming data chunks such as banking transactions, intrusion detection, and emerging data on the internet. Xu, Shen, and Zhao (2012) proposed an incremental learning vector quantization (ILVQ) algorithm for pattern classification; ILVQ outperformed other incremental learning methods in terms of accuracy and compression ratio. Liu and Ban (2015) applied an incremental self-organizing neural network under one-pass learning to the clustering problem; the comparative results showed the superior performance of their algorithm in learning robustness and efficiency, and in handling outliers without requiring a predefined number of clusters. Recently, Ciarelli, Oliveira, and Salles (2012) and Ciarelli and Oliveira (2015) proposed an on-line incremental learning method called the evolving Probabilistic Neural Network (ePNN), based on the Gaussian mixture model and the Expectation Maximization (EM) algorithm. Zhou, Zheng, Hu, Xu, and You (2016) proposed a local on-line learning method in which a multiple-hyperplane passive-aggressive algorithm was integrated with an on-line clustering technique; their experiments achieved notably better performance without kernel approximation or second-order modeling. Fan, Song, and Shrestha (2016) proposed a kernel on-line learning method in which the kernel width is adapted automatically. Their simulation results showed that the algorithm could adapt to the training data from different initial kernel widths, and that its accuracy and learning time were better than those of kernel algorithms with a fixed kernel width.
Although these one-pass incremental learning techniques provided better performance in terms of accuracy and learning time, the relevant parameters were still updated for only one incoming datum at a time. In on-line or sequential learning, the order in which data are fed into a classifier strongly affects its classification accuracy. Di Mauro, Esposito, Ferilli, and Basile (2005) focused on how to avoid such order effects in incremental learning. One approach to mitigating the order sensitivity is to present multiple cases or samples to a classifier at a time.
In our study, we propose a practical method that achieves much better learning results on chunks of data than those of Jaiyen et al. (2010), by reconciling the structural complexity and the learning time complexity to alleviate the constraint of learning one incoming datum at a time. A chunk of multi-class data is allowed to enter the training process as a stream of data chunks instead. The one-pass-throw-away learning concept introduced in Jaiyen et al. (2010) is also adopted in our study, since this concept has been proven to minimize the time and space complexities. Furthermore, this study focuses on diminishing the order effect of data presentation in the learning process that occurs in the method of Jaiyen et al. (2010).
This paper is organized as follows. Section 2 formulates the studied problem. Section 3 briefly describes the structure of the VEBF neural network and how to compute the orthonormal basis for rotating the axes along the direction of a dataset. Section 4 presents the concept of Class-wise Incremental Learning (CIL) with VEBF neurons. Section 5 discusses the model evaluation, the experimental setting, and the experimental results on 13 real-world datasets of various sizes. Finally, Section 6 concludes the paper.
Studied problems and constraints
The objective of our studied problem can be stated as follows. Given a temporal sequence of data chunks of different sizes and classes, learn each data chunk i of size ni only once, with time complexity of O(ni) as well as space complexity of O(ni), and discard this chunk forever afterwards. The constraints imposed on this objective are the following (a minimal sketch of the per-chunk, class-wise handling is given after the list):
1. The concept of one-pass-throw-away learning must be deployed.
2. For each data chunk, the distribution probability of each data class in the data
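A minimal Python sketch of how an incoming chunk can be grouped by class before learning, under the stated complexity budget, is shown below; the helper name is ours. Grouping here uses a stable sort, which costs O(ni log ni); a hash-based pass would keep it strictly linear.

```python
import numpy as np

def split_chunk_by_class(X: np.ndarray, y: np.ndarray):
    """Yield (class_label, data) groups for one incoming chunk.

    Illustrative helper only: the formulation above merely requires that
    each chunk of size n_i be learned once and then discarded.
    """
    order = np.argsort(y, kind="stable")          # group equal labels together
    Xs, ys = X[order], y[order]
    classes, starts = np.unique(ys, return_index=True)
    bounds = list(starts) + [len(ys)]
    for c, lo, hi in zip(classes, bounds[:-1], bounds[1:]):
        yield c, Xs[lo:hi]
```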
Overview of versatile elliptic basis function (VEBF) neural network
The versatile elliptic basis function (VEBF) neural network was introduced by Jaiyen et al. (2010). The network is a supervised neural network comprising three layers, namely the input, one hidden, and output layers. In the input layer, the number of neurons is equal to the number of attributes of the training set. The hidden layer contains a number of hidden-neuron groups; each group is called a sub-hidden layer. The number of groups is set to the number of classes in the learning dataset. During the
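Based on the VEBF form in Jaiyen et al. (2010), each hidden neuron can be pictured as a hyper-ellipse with a center, orthonormal axes obtained from the local data direction, and per-axis widths. The Python sketch below shows the covering test only; the paper's width-initialization and update rules are omitted, and the class layout is our own.

```python
import numpy as np

class VEBFNeuron:
    """One hyper-elliptical hidden neuron: center c, orthonormal axes U,
    per-axis widths a. A point x is covered when psi(x) <= 0.

    Sketch of the VEBF form in Jaiyen et al. (2010); initialization and
    parameter-update rules from the paper are not reproduced here.
    """

    def __init__(self, center: np.ndarray, axes: np.ndarray, widths: np.ndarray):
        self.c = center   # (d,) ellipse center
        self.U = axes     # (d, d) orthonormal columns, e.g. from local PCA
        self.a = widths   # (d,) semi-axis lengths

    def psi(self, x: np.ndarray) -> float:
        z = self.U.T @ (x - self.c)            # rotate into the ellipse frame
        return float(np.sum((z / self.a) ** 2) - 1.0)

    def covers(self, x: np.ndarray) -> bool:
        return self.psi(x) <= 0.0
```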
Proposed concept and learning algorithm
Instead of following the parameter adjustment of VEBF as in Jaiyen et al. (2010), we propose a chunk-incremental learning that learns one class existing in the incoming data chunk at a time. The structure of the learning network is gradually expanded by augmenting new neurons to capture all sub-clusters in the considered class. The parameters of a VEBF are updated according to the data within each sub-cluster. In fact, only some relevant data in a sub-cluster must be used to update the
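A hedged sketch of this class-wise procedure is given below. The hooks network.neurons_of, network.spawn, and neuron.update are hypothetical placeholders; the paper's actual covering test, parameter updates, and neuron-merging rules would live behind them.

```python
import numpy as np

def learn_chunk_classwise(network, X, y, split_chunk_by_class):
    """Learn one incoming chunk one class at a time, then discard it.

    Hypothetical hooks: network.neurons_of(c) returns the sub-hidden
    layer for class c; network.spawn(c, data) creates new neurons for
    uncovered sub-clusters; neuron.update(x) adjusts its parameters.
    """
    for c, Xc in split_chunk_by_class(X, y):
        leftover = []
        for x in Xc:
            hit = next((n for n in network.neurons_of(c) if n.covers(x)), None)
            if hit is not None:
                hit.update(x)              # refine an existing ellipse
            else:
                leftover.append(x)         # no ellipse covers x yet
        if leftover:
            network.spawn(c, np.asarray(leftover))
    # the chunk is now thrown away and never revisited
```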
Experiments
The experiments were conducted in both static and streaming chunk-data scenarios. The proposed method was tested on 13 real-world datasets ranging from small to large, where the size of a dataset is defined as the product of the number of attributes and the number of instances. Twelve well-known datasets were obtained from the University of California at Irvine repository (Lichman, 2013), and one dataset contains the physical protein-protein interactions of the yeast Saccharomyces cerevisiae, freely available at //www.scucic.cn/Predict_PPI/index.htm
Conclusion
This paper proposed a new fast chunk-incremental learning algorithm called Class-wise Incremental Learning (CIL) based on the versatile elliptic basis function (VEBF). The most important concept for achieving the target learning time complexity is to learn one class at a time: once the data in any class are learned, they are thrown away and never learned again. By learning one class at a time, the time and space complexities can be easily controlled. Our approach is capable of
Acknowledgments
We are grateful to the Development and Promotion of Science and Technology Talents Project (DPST), the Institute for the Promotion of Teaching Science and Technology (IPST), Ministry of Science and Technology, Thailand, for financial support.
References (39)
- Abrahams, Coupey, Zhong, Barkhi, & Manasantivongs (2013). Audience targeting by B-to-B advertisement classification: A neural network approach. Expert Systems with Applications.
- Ciarelli & Oliveira (2015). Achieving a compromise between performance and complexity of structure: An incremental approach. Information Sciences.
- Ciarelli, Oliveira, & Salles (2012). An incremental neural network with a reduced architecture. Neural Networks.
- Duan, Shao, Hou, He, & Zeng (2009). An incremental learning algorithm for Lagrangian support vector machine. Pattern Recognition Letters.
- Fan, Song, & Shrestha (2016). Kernel online learning with adaptive kernel width. Neurocomputing.
- Khanmohammadi & Chou (2016). A Gaussian mixture model based discretization algorithm for associative classification of medical data. Expert Systems with Applications.
- Khunarsal, Lursinsap, & Raicharoen (2013). Very short time environmental sound classification based on spectrogram pattern matching. Information Sciences.
- Liu & Ban (2015). Clustering by growing incremental self-organizing neural network. Expert Systems with Applications.
- Melin & Castillo (2013). A review on the applications of type-2 fuzzy logic in classification and pattern recognition. Expert Systems with Applications.
- Nguyen, Khosravi, Creighton, & Nahavandi (2015). Classification of healthcare data using genetic fuzzy logic system and wavelets. Expert Systems with Applications.
- Serratosa, Alquézar, & Amézquita (2012). A probabilistic integrated object recognition and tracking framework. Expert Systems with Applications.
- Shen & Hasegawa (2008). A fast nearest neighbor classifier on self-organizing incremental neural network. Neural Networks.
- Verbiest, Derrac, Cornelis, García, & Herrera (2016). Evolutionary wrapper approaches for training set selection as preprocessing mechanism for support vector machines: Experimental evaluation and support vector analysis. Applied Soft Computing.
- Wilson & Martinez (2003). The general inefficiency of batch training for gradient descent learning. Neural Networks.
- Yi & Wu (2011). Incremental SVM based on reserved set for network intrusion detection. Expert Systems with Applications.
- Zhou, Zheng, Hu, Xu, & You (2016). One-pass online learning: A local approach. Pattern Recognition.
- Chen, Han, Wang, & Fan (2011). Face recognition using nearest feature space embedding. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Constantinopoulos & Likas (2006). An incremental training method for the probabilistic RBF network. IEEE Transactions on Neural Networks.
- Giraud-Carrier (2000). A note on the utility of incremental learning. AI Communications.