Discriminative structure learning of sum–product networks for data stream classification
Introduction
Sum–product networks (SPNs) are recently developed deep probabilistic models that admit exact inference in time linear in the size of the network (Poon & Domingos, 2011). This has attracted considerable interest because learning usually involves inference as a subroutine, which is expensive or even intractable in classical graphical models, except for models with low treewidth (Jordan et al., 1999, Wainwright and Jordan, 2008). SPNs have demonstrated their superiority in dealing with real-world data, in tasks such as image completion (Poon & Domingos, 2011), classification (Gens & Domingos, 2012), speech (Peharz, Kapeller, Mowlaee, & Pernkopf, 2014) and language processing (Cheng, Kok, Pham, Chieu, & Chai, 2014).
An SPN consists of a rooted directed acyclic graph with internal nodes corresponding to sums and products, and leaves corresponding to tractable distributions. Structure learning generates this graph along with its parameters, with the aim of capturing the latent interactions among observed variables. Most algorithms were designed for the batch setting (Dennis and Ventura, 2012, Gens and Domingos, 2013, Peharz et al., 2013), where the full dataset is available to be examined iteratively. With the rise of massive streaming data, there has been a recent focus on online structure learning for SPNs. The dynamic and evolving nature of data streams poses great challenges to structure learning algorithms, since it is hard to extract all the necessary information from data records in only one pass.
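The linear-time inference mentioned above comes from a single bottom-up pass over the graph: each leaf evaluates its distribution, each product multiplies its children, and each sum takes a weighted average. The toy node classes below are a minimal illustrative sketch (not the paper's implementation), using Gaussian leaves as in the continuous SPNs discussed later:

```python
import math

# Minimal SPN node hierarchy: leaves are univariate Gaussians, internal
# nodes are weighted sums (mixtures) or products (factorizations over
# disjoint scopes). Names and structure here are illustrative only.

class Gaussian:
    def __init__(self, var, mean, std):
        self.var, self.mean, self.std = var, mean, std
    def value(self, x):
        d = (x[self.var] - self.mean) / self.std
        return math.exp(-0.5 * d * d) / (self.std * math.sqrt(2 * math.pi))

class Product:
    def __init__(self, children):
        self.children = children
    def value(self, x):
        v = 1.0
        for c in self.children:   # children have disjoint scopes
            v *= c.value(x)
        return v

class Sum:
    def __init__(self, weighted_children):  # list of (weight, child)
        self.weighted_children = weighted_children
    def value(self, x):
        return sum(w * c.value(x) for w, c in self.weighted_children)

# Two-variable SPN: a 2-component mixture of factorized Gaussians.
spn = Sum([
    (0.3, Product([Gaussian(0, -1.0, 1.0), Gaussian(1, -1.0, 1.0)])),
    (0.7, Product([Gaussian(0,  2.0, 1.0), Gaussian(1,  2.0, 1.0)])),
])

density = spn.value({0: 0.5, 1: 0.5})  # one bottom-up pass: O(network size)
```

Each node is visited exactly once, so evaluating the joint density costs time proportional to the number of edges in the graph.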
Some online approaches have been proposed to refine the parameters of an SPN with a fixed structure (Jaini et al., 2016, Poon and Domingos, 2011, Rashwan et al., 2016). One straightforward way is to adapt an iterative parameter-optimization algorithm, such as gradient descent, exponentiated gradient or expectation maximization (EM), to the online mode by restricting parameter updating to a single iteration (Rashwan et al., 2016). These algorithms can be further sped up by replacing marginal inference with most probable explanation (MPE) inference and implementing hard training mechanisms (Gens and Domingos, 2012, Poon and Domingos, 2011). Instead of maximum likelihood, Rashwan et al. proposed a Bayesian moment matching (BMM) algorithm which lends itself to online learning without suffering from local optima. Jaini et al. extended this paradigm from SPNs over categorical data to SPNs over continuous data (Jaini et al., 2016). While these approaches have proven effective in achieving state-of-the-art results, they rely heavily on a pre-specified SPN structure, which is not trivial to obtain.
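For a single sum node, the hard (MPE-style) training mechanism referenced above can be sketched in a few lines: each new example increments a count only on the child with the highest weighted likelihood, and weights are the normalized counts. This is an illustrative sketch of the general idea, not any specific author's algorithm:

```python
# Hard, MPE-style online update for one sum node. `counts` are running
# per-child counts; `child_likelihoods` are the children's likelihoods
# p_c(x) for the current example. Illustrative sketch only.

def hard_em_update(counts, child_likelihoods):
    total = sum(counts)
    weights = [c / total for c in counts]
    # Winning child: the one maximizing weight * likelihood (MPE branch).
    winner = max(range(len(counts)),
                 key=lambda i: weights[i] * child_likelihoods[i])
    counts[winner] += 1
    total = sum(counts)
    return counts, [c / total for c in counts]

counts = [1.0, 1.0]                              # Laplace-style start
counts, w = hard_em_update(counts, [0.2, 0.9])   # second child wins
```

Because only the winning branch is touched, a full update visits one root-to-leaf path per example rather than the whole network.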
Some researchers have attempted automated structure learning for SPNs from massive and continuous data streams. In a first attempt, Lee, Heo, and Zhang (2013) built up clusters based on mini-batch samples, and performed training with a top-down structure learner over the newly generated clusters. In their model, new child nodes are hierarchically added onto the existing sum nodes, while product nodes do not change after they are created. A related but different approach was developed by Hsu, Kalra, and Poupart (2017), who considered the more general case of SPNs over continuous variables and proposed a bottom-up structure learner, which dynamically monitors the change of the correlation coefficients between pairs of variables and modifies the product nodes whenever correlation is detected. Since the product nodes need to maintain a covariance matrix, which is quadratic in the size of their scope, the algorithm is computationally expensive. These two online approaches learn the structure of SPNs generatively by maximizing the joint distribution of all the variables. However, such generative learning can lead to suboptimal prediction performance, due to the mismatch between the learning objective and the goal of classification.
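The correlation monitoring described above amounts to maintaining running sufficient statistics per variable pair and flagging a dependency when the empirical correlation crosses a threshold; with d variables in a scope, O(d²) such pairs must be tracked, which is the quadratic cost noted. The class and threshold below are an illustrative simplification, not Hsu et al.'s exact procedure:

```python
import math

# Running correlation tracking for one variable pair from a stream.
# Sufficient statistics are updated per example in O(1); the empirical
# Pearson correlation is recomputed on demand. Illustrative sketch only.

class PairStats:
    def __init__(self):
        self.n = self.sx = self.sy = 0.0
        self.sxx = self.syy = self.sxy = 0.0

    def update(self, x, y):
        self.n += 1
        self.sx += x; self.sy += y
        self.sxx += x * x; self.syy += y * y; self.sxy += x * y

    def corr(self):
        n = self.n
        cov = self.sxy / n - (self.sx / n) * (self.sy / n)
        vx = self.sxx / n - (self.sx / n) ** 2
        vy = self.syy / n - (self.sy / n) ** 2
        if vx <= 0 or vy <= 0:
            return 0.0
        return cov / math.sqrt(vx * vy)

stats = PairStats()
for t in range(100):
    x = t * 0.1
    stats.update(x, 2 * x + 1)      # perfectly correlated pair

# A detected dependency would trigger a product-node modification.
dependent = abs(stats.corr()) > 0.1
```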
In this paper, we propose an online approach for discriminatively learning both the structure and parameters of SPNs. The benefit of structure update is to improve the representation for streaming data, while parameter update is to improve prediction under drift. In particular, our formulation works with continuous SPNs that have Gaussian leaves. The basic idea is to keep track of informative and representative examples over time to capture the trend of time-changing class distributions. We incorporate a vigilance parameter balance between plasticity1 and stability2 during online discriminative learning. For each new incoming data point, we estimate the goodness of fit of the SPN structures learned so far, and by dynamically maintaining a certain amount of informative examples, we generate new sub-SPNs in a recursive and top-down manner for enriching the representation. Specifically, the sum nodes are obtained by dynamic clustering over the instances, while the product nodes are obtained by partitioning the variables into correlated subsets. To boost the discrimination capability between the genuine class and the closest rival class, an outlier-robust margin-based log-likelihood loss function is applied to each data point, and parameters of SPN are updated continuously using most probable explanation (MPE) inference. In other words, we simply consider the branching paths that traverse the winning child nodes, leading to a fast yet powerful optimization procedure. Empirical results on handwritten digit recognition and stream classification tasks demonstrate that the proposed approach promises appealing performance and efficiency over the well-developed SPNs. In addition, it achieves consistently lower classification errors compared to the state-of-the-art data stream classifiers.
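A margin loss of the flavor described above compares the genuine class's log-likelihood with that of its closest rival and bounds the contribution of extreme outliers. The logistic squashing and the scale `beta` in the sketch below are our assumptions for illustration, not the paper's exact loss function:

```python
import math

# Outlier-robust margin loss between the genuine class and its closest
# rival. The raw margin is the gap between the true class's
# log-likelihood and the best rival's; squashing it through a logistic
# keeps the loss bounded in (0, 1), so a single extreme outlier cannot
# dominate the updates. The logistic form and `beta` are assumptions.

def margin_loss(log_liks, y, beta=1.0):
    """log_liks: per-class log-likelihoods log p(x, c); y: true class."""
    rival = max(v for c, v in enumerate(log_liks) if c != y)
    margin = log_liks[y] - rival           # > 0: correct and confident
    return 1.0 / (1.0 + math.exp(beta * margin))

loss_good = margin_loss([-2.0, -9.0, -7.0], y=0)  # well separated: small
loss_bad  = margin_loss([-9.0, -2.0, -7.0], y=0)  # misclassified: large
```

Minimizing such a loss pushes the genuine class's likelihood up and the closest rival's down, which is the discriminative objective motivating the approach.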
The remainder of this paper is organized as follows. Section 2 reviews the related work. Section 3 provides the basics of SPNs. Section 4 presents the proposed online discriminative structure learning approach. Section 5 presents the experimental results. Section 6 draws concluding remarks.
Section snippets
Related work
Data stream classification is a challenging data mining task because of the dynamic and evolving nature of data. Existing data stream classification approaches can be categorized into three groups: single model classification, ensemble classification and instance-based classification.
Single model classification approaches strive to update the model by dynamically keeping track of a fixed or adaptive window of incoming instances. For example, the approaches in Bifet and Gavaldà (2009) and
Sum–product networks
We begin by introducing the notation used throughout this paper. We denote random variables by uppercase letters, e.g., X, Y and Z. We represent the set of values taken by X as val(X), and denote its values using the corresponding lowercase letter, e.g., x is an element of val(X). Sets of random variables are denoted by boldface letters, e.g., X and Y. For any random variable set X = {X1, …, Xn}, we define the set of its possible values as the Cartesian product val(X) = val(X1) × ⋯ × val(Xn) and use the
Proposed algorithm
Consider a data stream consisting of a continuous sequence of labeled instances (x_t, y_t) for t = 1, 2, …, where x_t denotes a new instance arriving at time t with d-dimensional features, and y_t represents its class label. It is assumed that the learner can access the true label y_t of instance x_t before the arrival of instance x_{t+1}. We present a discriminative structure learning algorithm for SPNs on streaming data. The algorithm incrementally builds up a collection of generative SPNs, one
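The stream protocol assumed above is the standard test-then-train (prequential) setting: the model first predicts on x_t, then receives the true label y_t and updates itself before x_{t+1} arrives. The loop below sketches this protocol with a trivial majority-class predictor standing in for the SPN learner; it is illustrative only:

```python
from collections import Counter

# Prequential (test-then-train) evaluation loop. Each instance is first
# used for prediction, then its revealed label updates the model. The
# majority-class "model" is a stand-in for an actual stream learner.

def prequential(stream):
    """stream: iterable of (x, y) pairs; returns the online error rate."""
    counts, errors, n = Counter(), 0, 0
    for x, y in stream:
        # 1) Test: predict using only what was learned from x_1..x_{t-1}.
        pred = counts.most_common(1)[0][0] if counts else None
        errors += int(pred != y)
        # 2) Train: the true label y_t is revealed; update the model.
        counts[y] += 1
        n += 1
    return errors / n

err = prequential([(0, 'a'), (1, 'a'), (2, 'b'), (3, 'a')])
```

Because every prediction is made before the label is seen, the resulting error rate is an unbiased online estimate, which is why prequential error is the usual metric in the stream-classification experiments that follow.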
Experiments
We evaluated the performance of the proposed algorithm on the SD3 and SD7 datasets in the NIST Special Database SD19 (Grother, 1995) and on three popular stream classification datasets: Spam, Electricity and Covtype. We aim to compare our algorithm with state-of-the-art online SPN solvers and several data stream classification methods, and to analyze the effects of the vigilance parameter and the cache size.
Conclusion
We proposed SPN-DSC, a novel SPN-based classification algorithm for concept-drifting data streams, with the ability of online structure and discriminative parameter learning. SPN-DSC keeps representative examples that characterize the time-changing class distributions to enrich the network representation. A vigilance parameter is used to trade off between the adaptation of the already learned structure (i.e., parameter update) and the generation of new sub-structures (i.e., structure update).
Acknowledgments
We wish to thank the anonymous referees for their careful reading and valuable comments. This research work was supported by the National Key R&D Program of China (No. 2017YFC0803700), the National Natural Science Foundation of China (Nos. 61876183, 61721004, 61772525 and U1636220) and the Natural Science Foundation of Beijing Municipality (No. 4172063).
References (40)
- et al. (2010). Regularized margin-based conditional log-likelihood loss for prototype learning. Pattern Recognition.
- Adel, T., Balduzzi, D., & Ghodsi, A. (2015). Learning the structure of sum-product networks via an SVD-based algorithm. ...
- et al. (2006). A framework for on-demand classification of evolving data streams. IEEE Transactions on Knowledge and Data Engineering.
- et al. (2000). Pattern recognition (Chinese Edition).
- Bifet, A., & Gavaldà, R. (2009). Adaptive learning from evolving data streams.
- Bifet, A., Holmes, G., Pfahringer, B., Kirkby, R., & Gavaldà, R. (2009). New ensemble methods for evolving data ...
- Cheng, W.-C., Kok, S., Pham, H. V., Chieu, H. L., & Chai, K. M. A. (2014). Language modeling with sum-product networks. ...
- Dennis, A., & Ventura, D. (2012). Learning the architecture of sum-product networks using clustering on variables.
- Gens, R., & Domingos, P. (2012). Discriminative learning of sum-product networks.
- Gens, R., & Domingos, P. (2013). Learning the structure of sum-product networks. In Proceedings of the 30th ...
- Grother, P. J. (1995). Handprinted forms and characters database, NIST Special Database 19. Technical Report and CDROM.
- Hsu, W., Kalra, A., & Poupart, P. (2017). Online structure learning for sum-product networks with Gaussian leaves.
- Jordan, M. I., et al. (1999). An introduction to variational methods for graphical models. Machine Learning.
- et al. (2018). Scikit-multiflow: A multi-output streaming framework. Journal of Machine Learning Research (JMLR).
- Peharz, R. Foundations of sum-product networks for probabilistic modeling.
- Peharz, R., et al. (2013). Greedy part-wise learning of sum-product networks.