Evolving fuzzy pattern trees for binary classification on data streams

doi:10.1016/j.ins.2012.02.034

Information Sciences

Volume 220, 20 January 2013, Pages 34-45

https://doi.org/10.1016/j.ins.2012.02.034

Abstract

Fuzzy pattern trees (FPTs) have recently been introduced as a novel model class for machine learning. In this paper, we consider the problem of learning fuzzy pattern trees for binary classification from data streams. Apart from its practical relevance, this problem is also interesting from a methodological point of view. First, the aspect of efficiency plays an important role in the context of data streams, since learning has to be accomplished under hard time (and memory) constraints. Moreover, a learning algorithm should be adaptive in the sense that an up-to-date model is offered at any time, taking new data items into consideration as soon as they arrive and perhaps forgetting old ones that have become obsolete due to a change of the underlying data generating process. To meet these requirements, we develop an evolving version of fuzzy pattern tree learning, in which model adaptation is realized by anticipating possible local changes of the current model, and confirming these changes through statistical hypothesis testing. In experimental studies, we compare our method to a state-of-the-art tree-based classifier for learning from data streams, showing that evolving pattern trees are competitive in terms of performance while typically producing smaller and more compact models.

Introduction

Fuzzy pattern tree induction was recently introduced as a novel machine learning method for classification by Huang et al. [11]. Independently, the same type of model structure was proposed in [23] under the name “fuzzy operator tree”. An alternative to the original algorithm for learning pattern trees, as proposed in [11], was developed by Senge and Hüllermeier in [20]. Besides, an FPT variant for regression was introduced in [19].

Roughly speaking, a fuzzy pattern tree is a hierarchical, tree-like structure, whose inner nodes are marked with generalized (fuzzy) logical and arithmetic operators. It implements a recursive function that maps a combination of attribute values, entered in the leaf nodes, to a number in the unit interval, produced as an output by the root of the tree. The model class of fuzzy pattern trees is interesting for several reasons. Apart from some properties that make it appealing from a learning point of view (like a built-in feature selection mechanism and the possibility to guarantee monotonicity in certain attributes), FPTs are arguably attractive from an interpretation point of view. Generally, each tree can be considered as a kind of (generalized) logical description of a class. In this regard, pattern trees can be considered as a viable alternative to classical fuzzy rule models. Compared to such models, the hierarchical structure of pattern trees further allows for a more compact representation and for trading off accuracy against model simplicity in a seamless manner.
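To make the recursive mapping from attribute values to the unit interval more concrete, the following Python sketch evaluates a small, hand-built pattern tree. It is only an illustration under stated assumptions: the class names `Leaf` and `Inner`, the ramp-shaped membership functions, the attribute names, and the 0.5 decision threshold are all invented for this example, and the operators used (minimum and arithmetic mean) are merely common representatives of fuzzy logical and averaging operators, not necessarily the operator set used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass(frozen=True)
class Leaf:
    """Leaf node: a fuzzy predicate on a single input attribute (hypothetical name)."""
    attribute: str
    membership: Callable[[float], float]  # maps a raw value to [0, 1]


@dataclass(frozen=True)
class Inner:
    """Inner node: a binary operator applied to the outputs of two subtrees."""
    operator: Callable[[float, float], float]
    left: "Node"
    right: "Node"


Node = Union[Leaf, Inner]


def evaluate(node: Node, x: Dict[str, float]) -> float:
    """Propagate values from the leaves to the root and return a score in [0, 1]."""
    if isinstance(node, Leaf):
        return node.membership(x[node.attribute])
    return node.operator(evaluate(node.left, x), evaluate(node.right, x))


def ramp_up(lo: float, hi: float) -> Callable[[float], float]:
    """Assumed membership function: 0 below lo, 1 above hi, linear in between."""
    return lambda v: min(1.0, max(0.0, (v - lo) / (hi - lo)))


def ramp_down(lo: float, hi: float) -> Callable[[float], float]:
    """Complement of ramp_up: 1 below lo, 0 above hi."""
    up = ramp_up(lo, hi)
    return lambda v: 1.0 - up(v)


def mean(a: float, b: float) -> float:
    """Arithmetic mean, used here as a simple averaging operator."""
    return (a + b) / 2.0


# MEAN( MIN(temperature is high, humidity is low), pressure is high )
tree = Inner(
    mean,
    Inner(min,
          Leaf("temperature", ramp_up(0.0, 10.0)),
          Leaf("humidity", ramp_down(0.0, 100.0))),
    Leaf("pressure", ramp_up(0.0, 4.0)),
)

score = evaluate(tree, {"temperature": 5.0, "humidity": 25.0, "pressure": 4.0})
label = int(score >= 0.5)  # assumed threshold for a binary decision
```

Because each inner node only combines the outputs of its two children, attributes that never appear in a leaf have no influence on the result, which is one way to read the built-in feature selection mentioned above.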

In recent years, the idea of adaptive learning in dynamical environments has received considerable attention, especially under the slogan of “learning from data streams” [8]. Closely related to this, a special branch of data-driven fuzzy systems modeling has emerged under the notion of “evolving fuzzy systems” [2], [15], [1], [16]. Despite small differences regarding the basic assumptions and the technical setting, the emphasis of goals and performance criteria, or the focus on specific types of applications, the key motivation of these and related fields is the idea of a system that learns incrementally, and maybe even in real-time, on a continuous stream of data, and which is able to properly adapt itself to changes of environmental conditions or properties of the data-generating process.

Motivated by these developments, we propose an extended version of fuzzy pattern trees suitable for learning from data streams. More specifically, building on the (batch learning) algorithm for pattern tree induction as proposed in [20], we develop an evolving variant for the problem of binary classification. The rest of the paper is organized as follows. In Section 2, we start with a brief description of the data stream scenario and recall the special requirements it involves for learning. Fuzzy pattern trees are explained in Section 3, in which we also recall the basic algorithm for learning such trees in batch mode. An extension of this algorithm for learning from data streams is then proposed in Section 4. Finally, an empirical evaluation of this method is presented in Section 5, where evolving fuzzy pattern trees are compared with so-called Hoeffding trees [13] on different types of data streams, both in terms of performance and readability.

Section snippets

Learning from data streams

In recent years, so-called data streams have attracted considerable attention in different fields of computer science, including database systems, data mining, and distributed systems. As the notion suggests, a data stream can roughly be thought of as an ordered sequence of data items, where the input arrives more or less continuously as time progresses [10], [9], [8]. There are various applications in which streams of this type are produced, such as network monitoring, telecommunication
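The single-pass, bounded-memory character of stream processing can be illustrated with a small Python sketch. It is not taken from the paper: the function name `fading_mean` and the `decay` parameter are assumptions, and exponential forgetting is just one simple way of down-weighting old items. The point is only that each item is seen once, the state has constant size, and an up-to-date estimate is available after every item.

```python
from typing import Iterable, Iterator


def fading_mean(stream: Iterable[float], decay: float = 0.9) -> Iterator[float]:
    """Yield an exponentially weighted mean after each item, using O(1) memory.

    `decay` (assumed name) lies in (0, 1); smaller values forget old items faster.
    """
    weighted_sum = 0.0  # sum of values, each discounted by its age
    weight = 0.0        # sum of the discount factors themselves
    for value in stream:
        weighted_sum = decay * weighted_sum + value
        weight = decay * weight + 1.0
        yield weighted_sum / weight


# A stream whose level changes halfway through: the estimate follows the change.
estimates = list(fading_mean([0.0] * 50 + [1.0] * 50, decay=0.5))
```

In this toy stream the estimate stays at zero for the first half and moves towards one once the level changes, without any past items being stored.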

Fuzzy pattern trees

As already mentioned earlier, a fuzzy pattern tree is a hierarchical, tree-like structure. The inner nodes of an FPT are marked with generalized (fuzzy) operators, either logical or arithmetic, whereas the leaf nodes are associated with fuzzy predicates on input attributes. A pattern tree propagates information from the leaf to the root node: a node takes the values of its descendants as input, combines them using the respective operator, and submits the output to its predecessor. Thus, a

Evolving fuzzy pattern trees

The basic idea of our evolving version of fuzzy pattern tree learning (eFPT) is to maintain an ensemble of pattern trees, consisting of a current (active) model and a set of neighbor models. The current model is used to make predictions, while the neighbor models can be seen as anticipated adaptations: they are kept ready to replace the current model in case of a drop in performance, caused, for example, by a drift of the concept to be learned. More generally, the current model is replaced or,
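The interplay between a current model and its neighbors can be sketched in Python as follows. This is a simplified illustration, not the authors' algorithm: the class name `ModelPool`, the use of a sliding window of the last `n` examples, and the one-sided two-proportion z-test (which treats the two error rates as independent) are assumptions chosen for brevity. How neighbor trees are generated from the current tree is not covered; models are plain callables, and after a replacement the old model is simply kept as a neighbor.

```python
from collections import deque
from math import sqrt
from statistics import NormalDist


def significantly_better(err_new: float, err_cur: float, n: int, alpha: float) -> bool:
    """Assumed test: one-sided two-proportion z-test for err_new < err_cur."""
    pooled = (err_new + err_cur) / 2.0
    if pooled in (0.0, 1.0):
        return False  # no variance, hence no evidence of a difference
    std_err = sqrt(2.0 * pooled * (1.0 - pooled) / n)
    z = (err_cur - err_new) / std_err
    return z > NormalDist().inv_cdf(1.0 - alpha)


class ModelPool:
    """Hypothetical container for one active model and its candidate replacements."""

    def __init__(self, current, neighbors, n=100, alpha=0.01):
        self.n, self.alpha = n, alpha
        # Each entry pairs a model with its 0/1 errors on the last n examples.
        self.current = (current, deque(maxlen=n))
        self.neighbors = [(m, deque(maxlen=n)) for m in neighbors]

    def predict(self, x):
        """Only the current model is used for prediction."""
        return self.current[0](x)

    def update(self, x, y) -> bool:
        """Record errors on a labeled example; return True if the model was replaced."""
        for model, errors in [self.current, *self.neighbors]:
            errors.append(int(model(x) != y))
        if len(self.current[1]) < self.n or not self.neighbors:
            return False
        # Pick the neighbor with the fewest errors on the window.
        i = min(range(len(self.neighbors)), key=lambda k: sum(self.neighbors[k][1]))
        cur_rate = sum(self.current[1]) / self.n
        best_rate = sum(self.neighbors[i][1]) / self.n
        if significantly_better(best_rate, cur_rate, self.n, self.alpha):
            # Swap: the best neighbor becomes active, the old model becomes a neighbor.
            self.current, self.neighbors[i] = self.neighbors[i], self.current
            return True
        return False


# Toy drift: the target concept flips from "x > 0" to "x < 0" after 200 examples.
pool = ModelPool(current=lambda x: x > 0, neighbors=[lambda x: x < 0])
replacements = 0
for t in range(300):
    x = 1.0 if t % 2 == 0 else -1.0
    y = (x > 0) if t < 200 else (x < 0)
    replacements += pool.update(x, y)
```

Requiring a significant difference before switching keeps the active model stable under noise, at the price of a delay between the onset of a drift and the replacement.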

Empirical evaluation

In this section, we compare our evolving fuzzy pattern trees (eFPTs) with Hoeffding trees [13], a state-of-the-art approach for classification on data streams, in terms of performance, stability, and handling of concept drift. We use eFPT in its default setting (i.e., using default parameters n = 100, α = 0.01, p = 3). Experiments are not only conducted with real data sets, but also with synthetic data. As an important advantage of synthetic data, let us note that it allows for conducting experiments

Summary and conclusions

We have proposed an evolving version of the fuzzy pattern tree classifier that meets the increased requirements of incremental learning on data streams. The key idea of eFPT is to maintain, in addition to the current model, a set of neighbor trees that can replace the current model if the performance of the latter is no longer optimal. Thus, a modification of the current model is realized implicitly in the form of a replacement by an alternative tree. A replacement decision is made on the basis

References (23)

P.P. Angelov et al., Evolving fuzzy classifiers using different model architectures, Fuzzy Sets and Systems (2008)
P.P. Angelov et al., Evolving Intelligent Systems (2010)
S. Ben-David, J. Gehrke, D. Kifer, Detecting change in data streams, in: Proceedings of the 30th International...
A. Bifet, R. Kirkby, Massive Online Analysis Manual, 2009....
P. Domingos, G. Hulten, Catching up with the data: research issues in mining data streams, in: 2001 ACM SIGMOD Workshop...
A. Frank, A. Asuncion, UCI Machine Learning Repository, 2010....
M.M. Gaber et al., Mining data streams: a review, ACM SIGMOD Record (2005)
J. Gama et al., Learning from Data Streams (2007)
M. Garofalakis, J. Gehrke, R. Rastogi, Querying and mining data streams: you only get one look, in: Proceedings of the...
L. Golab et al., Issues in data stream management, SIGMOD Record (2003)
Z. Huang et al., Pattern trees induction: a new machine learning method, IEEE Transactions on Fuzzy Systems (2008)

