Elsevier

Neural Networks

Volume 123, March 2020, Pages 163-175

Discriminative structure learning of sum–product networks for data stream classification

https://doi.org/10.1016/j.neunet.2019.12.002

Abstract

The sum–product network (SPN) is a deep probabilistic representation that allows for exact and tractable inference. There has been a trend of online SPN structure learning from massive and continuous data streams. However, online structure learning of SPNs has so far been introduced only for generative settings. In this paper, we present an online discriminative approach for SPNs that learns both the structure and the parameters. The basic idea is to keep track of informative and representative examples to capture the trend of time-changing class distributions. Specifically, by estimating the goodness of model fit of data points and dynamically maintaining a certain number of informative examples over time, we generate new sub-SPNs in a recursive and top-down manner. Meanwhile, an outlier-robust margin-based log-likelihood loss is applied locally to each data point, and the parameters of the SPN are updated continuously using most probable explanation (MPE) inference. This leads to a fast yet powerful optimization procedure and improved discrimination capability between the genuine class and rival classes. Empirical results show that the proposed approach achieves better prediction performance than the state-of-the-art online structure learner for SPNs, while promising an order-of-magnitude speedup. Comparison with state-of-the-art stream classifiers further demonstrates the superiority of our approach.

Introduction

Sum–product networks (SPNs) are recently developed deep probabilistic models that admit exact inference in time linear in the size of the network (Poon & Domingos, 2011). This has attracted considerable interest because learning usually involves inference as a subroutine, which is expensive or even intractable in classical graphical models, except for models with low treewidth (Jordan et al., 1999, Wainwright and Jordan, 2008). SPNs have manifested their superiority in dealing with real-world data, such as image completion (Poon & Domingos, 2011), classification (Gens & Domingos, 2012), speech (Peharz, Kapeller, Mowlaee, & Pernkopf, 2014) and language processing (Cheng, Kok, Pham, Chieu, & Chai, 2014).

An SPN consists of a rooted directed acyclic graph with internal nodes corresponding to sums and products, and leaves corresponding to tractable distributions. The structure learning process of an SPN generates this graph along with its parameters, with the aim of capturing the latent interactions among observed variables. Most algorithms were designed for the batch optimization scenario (Dennis and Ventura, 2012, Gens and Domingos, 2013, Peharz et al., 2013), where the full dataset is available to be examined iteratively. With the rise of massive streaming data, there has been a recent focus on online structure learning for SPNs. The dynamic and evolving nature of data streams poses great challenges to structure learning algorithms, since it is hard to extract all necessary information from data records in only one pass.
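As a concrete illustration of the structure just described, the following minimal sketch evaluates an SPN bottom-up in log-space over two binary variables. The Leaf/Product/Sum classes and the Bernoulli leaves are our own illustrative simplification (the experiments in this paper use Gaussian leaves), not the paper's implementation.

```python
import math

class Leaf:
    def __init__(self, var, p):            # Bernoulli leaf over one variable
        self.var, self.p = var, p
    def log_value(self, x):
        return math.log(self.p if x[self.var] else 1.0 - self.p)

class Product:
    def __init__(self, children):          # children must have disjoint scopes
        self.children = children
    def log_value(self, x):
        return sum(c.log_value(x) for c in self.children)

class Sum:
    def __init__(self, children, weights): # weights sum to 1 (normalized SPN)
        self.children, self.weights = children, weights
    def log_value(self, x):
        # log-sum-exp of the weighted children, for numerical stability
        terms = [math.log(w) + c.log_value(x)
                 for w, c in zip(self.weights, self.children)]
        m = max(terms)
        return m + math.log(sum(math.exp(t - m) for t in terms))

root = Sum([Product([Leaf(0, 0.9), Leaf(1, 0.2)]),
            Product([Leaf(0, 0.1), Leaf(1, 0.7)])], [0.6, 0.4])
# P(X0=1, X1=0) = 0.6*(0.9*0.8) + 0.4*(0.1*0.3) = 0.444
print(math.exp(root.log_value({0: 1, 1: 0})))
```

Because the graph is evaluated once from the leaves to the root, inference cost is linear in the number of edges, which is the tractability property the paragraph above refers to.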

Some online approaches have been proposed to refine the parameters of an SPN with fixed structure (Jaini et al., 2016, Poon and Domingos, 2011, Rashwan et al., 2016). One straightforward way is to adapt an iterative parameter optimization algorithm to the online mode by restricting parameter updates to a single iteration (Rashwan et al., 2016). Such algorithms include gradient descent, exponentiated gradient and expectation maximization (EM). They can be further sped up by replacing marginal inference with most probable explanation (MPE) inference and implementing hard training mechanisms (Gens and Domingos, 2012, Poon and Domingos, 2011). Instead of maximum likelihood, Rashwan et al. proposed a Bayesian moment matching (BMM) algorithm which lends itself to online learning without suffering from local optima. Jaini et al. extended this paradigm from SPNs over categorical data to SPNs over continuous data (Jaini et al., 2016). While these approaches have proven effective in achieving state-of-the-art results, they rely heavily on the pre-specification of the SPN structure, which is not trivial.
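The hard (MPE-based) training mechanism mentioned above can be sketched as follows: each sum node keeps per-child counts, MPE inference selects the winning child, and only that child's count is incremented before the weights are renormalized. The HardSum/Leaf classes and the Laplace-smoothed counts are illustrative assumptions, not the cited algorithms' code.

```python
import math

class Leaf:
    def __init__(self, var, p):            # Bernoulli leaf, for illustration
        self.var, self.p = var, p
    def log_value(self, x):
        return math.log(self.p if x[self.var] else 1.0 - self.p)

class HardSum:
    def __init__(self, children):
        self.children = children
        self.counts = [1.0] * len(children)  # Laplace-smoothed counts
    @property
    def weights(self):
        z = sum(self.counts)
        return [c / z for c in self.counts]
    def log_value(self, x):
        # hard (max) semantics: value of the single best weighted child
        return max(math.log(w) + c.log_value(x)
                   for w, c in zip(self.weights, self.children))
    def hard_update(self, x):
        # MPE inference: find the winning child, increment only its count,
        # then recurse into it if it is itself a sum node
        scores = [math.log(w) + c.log_value(x)
                  for w, c in zip(self.weights, self.children)]
        k = scores.index(max(scores))
        self.counts[k] += 1.0
        if isinstance(self.children[k], HardSum):
            self.children[k].hard_update(x)

s = HardSum([Leaf(0, 0.9), Leaf(0, 0.1)])
for _ in range(8):
    s.hard_update({0: 1})
print(s.weights)   # the child that explains the stream best gains weight
```

Each update touches only one root-to-leaf path rather than the whole network, which is why this style of training is well suited to a one-pass streaming setting.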

Some researchers have attempted automated structure learning for SPNs from massive and continuous data streams. In a first attempt, Lee, Heo, and Zhang (2013) built up clusters based on mini-batch samples, and performed training with a top-down structure learner over the newly generated clusters. In their model, new child nodes are hierarchically added onto the existing sum nodes, while product nodes do not change after they are created. A related but different approach was developed by Hsu, Kalra, and Poupart (2017), who considered the more general case of SPNs over continuous variables, and proposed a bottom-up structure learner, which dynamically monitors the change of the correlation coefficients between two variables, and modifies the product nodes whenever correlation is detected. Since the product nodes need to maintain a covariance matrix, which is quadratic in the size of their scope, the algorithm is computationally expensive. These two online approaches learn the structure of SPNs generatively by maximizing the joint distribution over all the variables. However, such generative learning can lead to suboptimal prediction performance, due to the mismatch between the learning objective and the goal of classification.

In this paper, we propose an online approach for discriminatively learning both the structure and parameters of SPNs. The benefit of structure update is to improve the representation for streaming data, while parameter update improves prediction under drift. In particular, our formulation works with continuous SPNs that have Gaussian leaves. The basic idea is to keep track of informative and representative examples over time to capture the trend of time-changing class distributions. We incorporate a vigilance parameter to balance plasticity and stability during online discriminative learning. For each new incoming data point, we estimate the goodness of fit of the SPN structures learned so far, and by dynamically maintaining a certain number of informative examples, we generate new sub-SPNs in a recursive and top-down manner to enrich the representation. Specifically, the sum nodes are obtained by dynamic clustering over the instances, while the product nodes are obtained by partitioning the variables into correlated subsets. To boost the discrimination capability between the genuine class and the closest rival class, an outlier-robust margin-based log-likelihood loss function is applied to each data point, and the parameters of the SPN are updated continuously using most probable explanation (MPE) inference. In other words, we simply consider the branching paths that traverse the winning child nodes, leading to a fast yet powerful optimization procedure. Empirical results on handwritten digit recognition and stream classification tasks demonstrate that the proposed approach promises appealing performance and efficiency over well-developed SPNs. In addition, it achieves consistently lower classification errors compared to the state-of-the-art data stream classifiers.
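One way to picture the margin-based objective: the margin of a data point is the gap between the genuine class's log-likelihood and that of its closest rival, and an outlier-robust loss saturates for points with large negative margins so that outliers cannot dominate the update. The sigmoid form and the scale parameter xi below are illustrative stand-ins; the paper's exact loss may differ.

```python
import math

def margin(log_liks, y):
    # gap between the genuine class and its closest rival in log-likelihood
    rival = max(v for c, v in log_liks.items() if c != y)
    return log_liks[y] - rival

def robust_loss(m, xi=1.0):
    # sigmoid-shaped loss: near 1 for large negative margins (misclassified),
    # near 0 for large positive margins; saturates so outliers contribute
    # a bounded penalty
    return 1.0 / (1.0 + math.exp(xi * m))

# hypothetical per-class log-likelihoods for one data point
log_liks = {0: -3.2, 1: -4.1, 2: -6.0}
m = margin(log_liks, y=0)   # genuine class 0 beats rival class 1 by 0.9 nats
print(robust_loss(m))
```

Because the loss is bounded, a grossly misfit point (a very large negative margin) produces a gradient near zero, which is the outlier-robustness property invoked above.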

The remainder of this paper is organized as follows. Section 2 reviews the related work. Section 3 provides the basics of SPNs. Section 4 presents the proposed online discriminative structure learning approach. Section 5 presents the experimental results. Section 6 draws concluding remarks.


Related work

Data stream classification is a challenging data mining task because of the dynamic and evolving nature of data. Existing data stream classification approaches can be categorized into three groups: single model classification, ensemble classification and instance-based classification.

  • Single model classification approaches strive to update the model by dynamically keeping track of a fixed or adaptive window of incoming instances. For example, the approaches in Bifet and Gavaldà (2009) and

Sum–product networks

We begin by introducing the notation used throughout this paper. We denote random variables as uppercase letters W, X and Y. We represent the set of values taken by X as val(X), and denote their values using the corresponding lowercase letters, e.g., x is an element of val(X) ⊆ ℝ. Sets of random variables are denoted by boldface letters W and X. Given any random variable set X = {X_1, …, X_D}, we define the set of its possible values as the Cartesian product val(X) = ×_{d=1}^{D} val(X_d) and use the

Proposed algorithm

Consider a data stream consisting of a continuous sequence of labeled instances (x_t, y_t) for t = 1, 2, …, T, where x_t ∈ ℝ^d denotes a new instance arriving at time t with d-dimensional features, and y_t ∈ {1, …, L} represents its class label. It is assumed that the learner can access the true label y_t of instance x_t before the arrival of instance x_{t+1}. We present a discriminative structure learning algorithm for SPNs on streaming data. The algorithm incrementally builds up a collection of generative SPNs, one
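The protocol just described is the standard prequential (test-then-train) setting: predict the label of x_t first, then reveal y_t and update. A minimal sketch, with a hypothetical learner interface and a trivial majority-class baseline standing in for the SPN learner:

```python
def prequential(stream, learner):
    """Test-then-train loop; returns the online error rate."""
    errors, n = 0, 0
    for x_t, y_t in stream:
        if learner.predict(x_t) != y_t:  # test before the label is revealed
            errors += 1
        learner.update(x_t, y_t)         # then train on the revealed label
        n += 1
    return errors / max(n, 1)

class MajorityClass:
    # trivial baseline: predicts the most frequent label seen so far
    def __init__(self):
        self.counts = {}
    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0
    def update(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

stream = [([0.1], 1), ([0.2], 1), ([0.3], 0), ([0.4], 1)]
print(prequential(stream, MajorityClass()))
```

Any learner exposing this predict/update interface plugs into the same loop, which is how the comparisons against other stream classifiers in Section 5 can be run on equal footing.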

Experiments

We evaluated the performance of the proposed algorithm on the SD3 and SD7 datasets in the NIST Special Database SD19 (Grother, 1995) and three popular stream classification datasets: Spam, Electricity and Covtype. We aim to compare our algorithm with the state-of-the-art online SPN solvers and several data stream classification methods, and to analyze the effects of the vigilance parameter ρ and cache size τ.

Conclusion

We proposed SPN-DSC, a novel SPN-based classification algorithm for concept-drifting data streams, with the ability of online structure and discriminative parameter learning. SPN-DSC keeps representative examples that characterize the time-changing class distributions to enrich the network representation. A vigilance parameter is used to trade off between the adaptation of already learned structure (i.e., parameter update) and the generation of new sub-structures (i.e., structure update).

Acknowledgments

We wish to thank the anonymous referees for their careful reading and valuable comments. This research work was supported by the National Key R&D Program of China (No. 2017YFC0803700), the National Natural Science Foundation of China (Nos. 61876183, 61721004, 61772525 and U1636220) and the Natural Science Foundation of Beijing Municipality (No. 4172063).

References (40)

  • Jin, X. B., et al. (2010). Regularized margin-based conditional log-likelihood loss for prototype learning. Pattern Recognition.
  • Adel, T., Balduzzi, D., & Ghodsi, A. (2015). Learning the structure of sum-product networks via an SVD-based algorithm. …
  • Aggarwal, C. C., et al. (2006). A framework for on-demand classification of evolving data streams. IEEE Transactions on Knowledge and Data Engineering.
  • Bian, Z., et al. (2000). Pattern recognition (Chinese Edition).
  • Bifet, A., et al. Adaptive learning from evolving data streams.
  • Bifet, A., Holmes, G., Pfahringer, B., Kirkby, R., & Gavaldà, R. (2009). New ensemble methods for evolving data …
  • Cheng, W.-C., Kok, S., Pham, H. V., Chieu, H. L., & Chai, K. M. A. (2014). Language modeling with sum-product networks. …
  • Dennis, A., et al. Learning the architecture of sum-product networks using clustering on variables.
  • Gens, R., et al. Discriminative learning of sum-product networks.
  • Gens, R., & Domingos, P. (2013). Learning the structure of sum-product networks. In Proceedings of the 30th …
  • Grother, P. (1995). Handprinted forms and character database, NIST Special Database 19. Technical report and CD-ROM.
  • Hsu, W., et al. (2017). Online structure learning for sum-product networks with Gaussian leaves.
  • Hulten, G., Spencer, L., & Domingos, P. (2001). Mining time-changing data streams. In Proceedings of the 7th ACM SIGKDD …
  • Jaini, P., Rashwan, A., Zhao, H., Liu, Y., Banijamali, E., Chen, Z., et al. (2016). Online algorithms for sum-product …
  • Jordan, M. I., et al. (1999). An introduction to variational methods for graphical models. Machine Learning.
  • Lee, S.-W., Heo, M.-O., & Zhang, B.-T. (2013). Online incremental structure learning of sum-product networks. In …
  • Molina, A., Vergari, A., Mauro, N. D., Natarajan, S., Esposito, F., & Kersting, K. (2018). Mixed sum-product networks: …
  • Montiel, J., et al. (2018). Scikit-multiflow: A multi-output streaming framework. Journal of Machine Learning Research (JMLR).
  • Peharz, R. (2015). Foundations of sum-product networks for probabilistic modeling.
  • Peharz, R., et al. Greedy part-wise learning of sum-product networks.