
Dynamic Hybrid Random Fields for the Probabilistic Graphical Modeling of Sequential Data: Definitions, Algorithms, and an Application to Bioinformatics

Neural Processing Letters

Abstract

The paper introduces a dynamic extension of the hybrid random field (HRF), called dynamic HRF (D-HRF). The D-HRF is aimed at the probabilistic graphical modeling of arbitrary-length sequences of sets of (time-dependent) discrete random variables under Markov assumptions. Maximum-likelihood algorithms for learning the parameters and the structure of the D-HRF are presented. The D-HRF inherits the computational efficiency and the modeling capabilities of HRFs, subsuming both dynamic Bayesian networks and Markov random fields. The behavior of the D-HRF is first evaluated empirically on synthetic data drawn from probability distributions of known form. Then, D-HRFs (combined with a recurrent autoencoder) are successfully applied to the prediction of the disulfide-bonding state of cysteines from the primary structure of proteins in the Protein Data Bank.


Notes

  1. Bearing in mind the sequential dynamics of the model, we will occasionally refer to the RNN as operating “over time”, such that it is fed with the t-th residue at “time t”.

  2. Available publicly at http://www.biocomp.unibo.it/savojard/PDBCYS.ssbonds.txt.

References

  1. Baldi P, Brunak S (1998) Bioinformatics: the machine learning approach. MIT Press, Cambridge

  2. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucleic Acids Res 28(1):235–242

  3. Besag J (1975) Statistical analysis of non-lattice data. The Statistician 24:179–195

  4. Bongini M, Laveglia V, Trentin E (2016) A hybrid recurrent neural network/dynamic probabilistic graphical model predictor of the disulfide bonding state of cysteines from the primary structure of proteins. In: Proceedings of the 7th IAPR TC 3 workshop on artificial neural networks in pattern recognition, pp 257–268. Springer

  5. Bongini M, Trentin E (2012) Towards a novel probabilistic graphical model of sequential data: a solution to the problem of structure learning and an empirical evaluation. In: Proceedings of the 5th INNS IAPR TC 3 GIRPR workshop on artificial neural networks in pattern recognition, pp. 82–92. Springer

  6. Cappé O, Moulines E, Ryden T (2005) Inference in hidden Markov models. Springer, New York

  7. Ceroni A, Passerini A, Vullo A, Frasconi P (2006) DISULFIND: a disulfide bonding state and cysteine connectivity prediction server. Nucleic Acids Res 34(Web Server issue):177–181

  8. Chen Y, Lin Y, Lin C, Hwang J (2004) Prediction of the bonding states of cysteines using the support vector machines based on multiple feature vectors and cysteine state sequences. Proteins 55(4):1036–1042

  9. Chung W, Yang C, Hor C (2009) An effective tuning method for cysteine state classification. In: Proceedings of the national computer symposium, workshop on algorithms and bioinformatics, Taipei, Taiwan, 27–28 November

  10. Cooper GF, Herskovits E (1992) A Bayesian method for the induction of probabilistic networks from data. Mach Learn 9:309–347

  11. Crick F (1970) Central dogma of molecular biology. Nature 227(5258):561–563

  12. Elidan G, Gould S (2009) Learning bounded treewidth Bayesian networks. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) Advances in neural information processing systems, vol 21. Curran Associates, Red Hook, pp 417–424

  13. Fariselli P, Riccobelli P, Casadio R (1999) Role of evolutionary information in predicting the disulfide-bonding state of cysteine in proteins. Proteins 36(3):340–346

  14. Frasconi P, Passerini A, Vullo A (2002) A two-stage SVM architecture for predicting the disulfide bonding state of cysteines. In: Proceedings of the IEEE workshop on neural networks for signal processing, pp 25–34

  15. Freno A, Trentin E (2011) Hybrid random fields: a scalable approach to structure and parameter learning in probabilistic graphical models. Springer, New York

  16. Freno A, Trentin E, Gori M (2009) A hybrid random field model for scalable statistical learning. Neural Netw 22:603–613

  17. Freno A, Trentin E, Gori M (2009) Scalable pseudo-likelihood estimation in hybrid random fields. In: Elder J, Fogelman-Souli F, Flach P, Zaki M (eds) Proceedings of the 15th ACM SIGKDD conference on knowledge discovery and data mining (KDD 2009), pp 319–327. ACM

  18. Freno A, Trentin E, Gori M (2009) Scalable statistical learning: a modular Bayesian/Markov network approach. In: Proceedings of the international joint conference on neural networks (IJCNN 2009), pp 890–897. IEEE

  19. Freno A, Trentin E, Gori M (2010) Kernel-based hybrid random fields for nonparametric density estimation. In: Proceedings of the 19th European conference on artificial intelligence (ECAI), pp 427–432. IOS Press, Amsterdam, The Netherlands

  20. Friedman N (1998) The Bayesian structural EM algorithm. In: Proceedings of the fourteenth conference on uncertainty in artificial intelligence, pp 129–138. Morgan Kaufmann, San Francisco, USA

  21. Ghahramani Z (1998) Learning dynamic Bayesian networks. In: Adaptive processing of sequences and data structures, pp 168–197. Springer

  22. Gilks WR, Richardson S, Spiegelhalter D (1996) Markov chain Monte Carlo in practice. CRC Press, Boca Raton

  23. Grünwald PD (2007) The minimum description length principle. The MIT Press, Cambridge

  24. Hertz J, Krogh A, Palmer RG (1991) Introduction to the theory of neural computation. Addison Wesley, Redwood

  25. Kindermann R, Snell JL (1980) Markov random fields and their applications. American Mathematical Society, Providence

  26. Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT Press, Cambridge

  27. Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of 18th international conference on machine learning, pp 282–289. Morgan Kaufmann, San Francisco, CA

  28. Lauritzen SL (1996) Graphical models. Oxford University Press, Oxford

  29. Lin C, Yang C, Hor C, Huang K (2010) Disulfide bonding state prediction with SVM based on protein types. In: Proceedings of the fifth IEEE international conference on bio-inspired computing: theories and applications, pp 1436–1442

  30. Mandic DP, Chambers J (2001) Recurrent neural networks for prediction: learning algorithms, architectures and stability. Wiley, New York

  31. Mitchell TM (1997) Machine learning. McGraw-Hill, New York

  32. Mozer MC (1989) A focused backpropagation algorithm for temporal pattern recognition. Complex Syst 3(1):349–381

  33. Neal R, Hinton GE (1999) A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Jordan MI (ed) Learning in graphical models. MIT Press, Cambridge, pp 355–368

  34. Neapolitan RE (2004) Learning Bayesian networks. Prentice Hall, Upper Saddle River

  35. Pearl J (1985) Bayesian networks: a model of self-activated memory for evidential reasoning. In: Proceedings of the 7th conference of the cognitive science society, University of California, Irvine, pp 329–334

  36. Pearl J (1988) Probabilistic reasoning in intelligent systems. Morgan Kaufmann, San Francisco

  37. Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286

  38. Rust R, Schmittlein D (1985) A Bayesian cross-validated likelihood method for comparing alternative specifications of quantitative models. Mark Sci 4(1):20–40

  39. Savojardo C, Fariselli P, Alhamdoosh M, Martelli PL, Pierleoni A, Casadio R (2011) Improving the prediction of disulfide bonds in eukaryotes with machine learning methods and protein subcellular localization. Bioinformatics 27(16):2224–2230

  40. Scanagatta M, de Campos CP, Corani G, Zaffalon M (2015) Learning Bayesian networks with thousands of variables. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R (eds) Advances in neural information processing systems, vol 28. Curran Associates, Red Hook, pp 1864–1872

  41. Scanagatta M, Corani G, de Campos CP, Zaffalon M (2016) Learning treewidth-bounded Bayesian networks with thousands of variables. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R (eds) Advances in neural information processing systems, vol 29. Curran Associates, Red Hook, pp 1462–1470

  42. Shoombuatong W, Traisathit P, Prasitwattanaseree S, Tayapiwatana C, Cutler RW, Chaijaruwanich J (2011) Prediction of the disulphide bonding state of cysteines in proteins using conditional random fields. IJDMB 5(4):449–464

  43. Singh R (2008) A review of algorithmic techniques for disulfide-bond determination. Brief Funct Genomic Proteomic 7(2):157–172

  44. Trentin E (2015) Maximum-likelihood normalization of features increases the robustness of neural-based spoken human–computer interaction. Pattern Recogn Lett 66(C):71–80

  45. Trentin E, Bongini M (2012) Towards a novel probabilistic graphical model of sequential data: fundamental notions and a solution to the problem of parameter learning. In: Proceedings of the 5th INNS IAPR TC 3 GIRPR workshop on artificial neural networks in pattern recognition, pp 72–81. Springer

  46. Trentin E, Cattoni R (1999) Learning perception for indoor robot navigation with a hybrid hidden Markov model/recurrent neural networks approach. Connect Sci 11(3–4):243–265

  47. Trentin E, Giuliani D (2001) A mixture of recurrent neural networks for speaker normalisation. Neural Comput Appl 10(2):120–135

  48. Witten IH, Frank E (2005) Data mining, 2nd edn. Morgan Kaufmann, San Francisco

Author information

Corresponding author

Correspondence to Edmondo Trentin.

Appendices

A Formal Definition of HRF and Related Notions

The material presented in this Appendix and in the next one is drawn from [15]. HRFs are aimed at representing joint probability distributions underlying sets of random variables. The definition of HRF involves a few preliminary notions concerning directed and undirected graphs. First, let us define the concept of (directed) union of directed graphs:

Definition 1

If \(\mathcal {G}_{1} = (\mathcal {V}_{1}, \mathcal {E}_{1})\) and \(\mathcal {G}_{2} = (\mathcal {V}_{2}, \mathcal {E}_{2})\) are directed graphs, then the (directed) union of \(\mathcal {G}_{1}\) and \(\mathcal {G}_{2}\) (denoted by \(\mathcal {G}_{1} \cup \mathcal {G}_{2}\)) is the directed graph \(\mathcal {G} = (\mathcal {V}, \mathcal {E})\) such that \(\mathcal {V} = \mathcal {V}_{1} \cup \mathcal {V}_{2}\) and \(\mathcal {E} = \mathcal {E}_{1} \cup \mathcal {E}_{2}\).

Given a set of directed graphs \(\mathcal {G}_{1}, \ldots , \mathcal {G}_{n}\), the directed union \(\mathcal {G} = \bigcup _{i=1}^{n} \mathcal {G}_{i}\) results from iterated application of the binary union operator. Similarly, we define the notion of (undirected) union for undirected graphs as follows:

Definition 2

If \(\mathcal {G}_{1} = (\mathcal {V}_{1}, \mathcal {E}_{1})\) and \(\mathcal {G}_{2} = (\mathcal {V}_{2}, \mathcal {E}_{2})\) are undirected graphs, then the (undirected) union of \(\mathcal {G}_{1}\) and \(\mathcal {G}_{2}\) (denoted by \(\mathcal {G}_{1} \cup \mathcal {G}_{2}\)) is the undirected graph \(\mathcal {G} = (\mathcal {V}, \mathcal {E})\) such that \(\mathcal {V} = \mathcal {V}_{1} \cup \mathcal {V}_{2}\) and \(\mathcal {E} = \mathcal {E}_{1} \cup \mathcal {E}_{2}\).

Now, if \(\mathcal {G} = (\mathcal {V}, \mathcal {E})\) is a directed graph, then we say that \(\mathcal {G}^{*} = (\mathcal {V}^{*}, \mathcal {E}^{*})\) is the undirected version of \(\mathcal {G}\) if \(\mathcal {V}^{*} = \mathcal {V}\) and \(\mathcal {E}^{*} = \{\{X_{i}, X_{j}\}: (X_{i}, X_{j}) \in \mathcal {E}\}\). Based on this notion, we also define the undirected union of a pair of directed graphs:

Definition 3

If \(\mathcal {G}_{1} = (\mathcal {V}_{1}, \mathcal {E}_{1})\) and \(\mathcal {G}_{2} = (\mathcal {V}_{2}, \mathcal {E}_{2})\) are directed graphs, then the undirected union of \(\mathcal {G}_{1}\) and \(\mathcal {G}_{2}\) (denoted by \(\mathcal {G}_{1} \Cup \mathcal {G}_{2}\)) is the (undirected) graph \(\mathcal {G}^{*} = \mathcal {G}_{1}^{*} \cup \mathcal {G}_{2}^{*}\) such that \(\mathcal {G}_{1}^{*}\) and \(\mathcal {G}_{2}^{*}\) are the undirected versions of \(\mathcal {G}_{1}\) and \(\mathcal {G}_{2}\) respectively.
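For illustration, the following Python sketch implements Definitions 1–3 under the assumption that a graph is represented simply as a pair (vertex set, edge set), with directed edges stored as ordered pairs and undirected edges as frozensets; this representation and the function names are ours, chosen for clarity, and are not taken from [15].

from typing import FrozenSet, Hashable, Set, Tuple

Node = Hashable
DiGraph = Tuple[Set[Node], Set[Tuple[Node, Node]]]   # (V, E), directed edges (X_i, X_j)
UGraph = Tuple[Set[Node], Set[FrozenSet[Node]]]      # (V, E), undirected edges {X_i, X_j}

def directed_union(g1: DiGraph, g2: DiGraph) -> DiGraph:
    # Definition 1: union of the vertex sets and of the directed edge sets.
    return (g1[0] | g2[0], g1[1] | g2[1])

def undirected_version(g: DiGraph) -> UGraph:
    # Drop edge orientation: the pair (X_i, X_j) becomes the set {X_i, X_j}.
    vertices, edges = g
    return (set(vertices), {frozenset(edge) for edge in edges})

def undirected_union(g1: DiGraph, g2: DiGraph) -> UGraph:
    # Definition 3: undirected union of two directed graphs.
    u1, u2 = undirected_version(g1), undirected_version(g2)
    return (u1[0] | u2[0], u1[1] | u2[1])

# Example: two two-node DAGs sharing the vertex X2.
g1 = ({"X1", "X2"}, {("X1", "X2")})
g2 = ({"X2", "X3"}, {("X3", "X2")})
print(directed_union(g1, g2))    # vertices {X1, X2, X3}, edges {(X1, X2), (X3, X2)}
print(undirected_union(g1, g2))  # vertices {X1, X2, X3}, edges {{X1, X2}, {X2, X3}}

The iterated unions \(\bigcup _{i=1}^{n} \mathcal {G}_{i}\) and \(\mathcal {G}_{1} \Cup \ldots \Cup \mathcal {G}_{n}\) then follow by folding these binary operators over the list of graphs.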

Finally, a HRF can be defined as follows:

Definition 4

Let \(\mathbf {X}\) be a set of random variables \(X_{1}, \ldots , X_{n}\). A hybrid random field for \(X_{1}, \ldots , X_{n}\) is a set of Bayesian networks \(BN_{1}, \ldots , BN_{n}\) (with DAGs \(\mathcal {G}_{1}, \ldots , \mathcal {G}_{n}\)) such that:

  1. Each \(BN_{i}\) contains \(X_{i}\) plus a subset \(\mathcal {R}(X_{i})\) of \(\mathbf {X} \setminus \{X_{i}\}\);

  2. If \(\mathcal {MB}_{i}(X_{i})\) denotes the Markov blanket [36] of \(X_{i}\) in \(BN_{i}\) (i.e. the set of the parents, the children, and the parents of the children of \(X_{i}\) in \(\mathcal {G}_{i}\)) and \(P(X_{i} \mid \mathcal {MB}_{i}(X_{i}))\) is the conditional distribution of \(X_{i}\) given \(\mathcal {MB}_{i}(X_{i})\), as derivable from \(BN_{i}\), then at least one of the following conditions is satisfied:

     (a) The directed graph \(\mathcal {G} = \bigcup _{i=1}^{n} \mathcal {G}_{i}\) is acyclic and there is a Bayesian network \(h_{\mathcal {G}}\) with DAG \(\mathcal {G}\) such that, for each \(X_{i}\) in \(\mathbf {X}\), \(P(X_{i} \mid \mathcal {MB}_{i}(X_{i})) = P(X_{i} \mid \mathcal {MB}_{\mathcal {G}}(X_{i}))\), where \(\mathcal {MB}_{\mathcal {G}}(X_{i})\) is the Markov blanket of \(X_{i}\) in \(h_{\mathcal {G}}\) and \(P(X_{i} \mid \mathcal {MB}_{\mathcal {G}}(X_{i}))\) is the conditional distribution of \(X_{i}\) given \(\mathcal {MB}_{\mathcal {G}}(X_{i})\), as entailed by \(h_{\mathcal {G}}\);

     (b) There is a Markov random field \(h_{\mathcal {G}^{*}}\) with graph \(\mathcal {G}^{*}\) such that \(\mathcal {G}^{*} = \mathcal {G}_{1} \Cup \ldots \Cup \mathcal {G}_{n}\) and, for each \(X_{i}\) in \(\mathbf {X}\), \(P(X_{i} \mid \mathcal {MB}_{i}(X_{i})) = P(X_{i} \mid \mathcal {MB}_{\mathcal {G}^{*}}(X_{i}))\), where \(\mathcal {MB}_{\mathcal {G}^{*}}(X_{i})\) is the Markov blanket of \(X_{i}\) in \(h_{\mathcal {G}^{*}}\) and \(P(X_{i} \mid \mathcal {MB}_{\mathcal {G}^{*}}(X_{i}))\) is the conditional distribution of \(X_{i}\) given \(\mathcal {MB}_{\mathcal {G}^{*}}(X_{i})\), as derived from \(h_{\mathcal {G}^{*}}\).

The elements of \(\mathcal {R}(X_{i})\) are called ‘relatives of \(X_{i}\)’. That is, the relatives of a node \(X_{i}\) in a HRF are the nodes appearing in graph \(\mathcal {G}_{i}\) (except for \(X_{i}\) itself). An example of HRF over the variables \(X_1, \ldots , X_4\) is shown in Fig. 10.

Fig. 10 The graphical components of a hybrid random field for the variables \(X_{1}, \ldots , X_{4}\). Since each node \(X_{i}\) has its own Bayesian network (where nodes in \(\mathcal {MB}_{i}(X_{i})\) are shaded), there are four different DAGs \({\mathcal {G}}_1, \ldots , {\mathcal {G}}_4\). Relatives of \(X_{i}\) that are not in \(\mathcal {MB}_{i}(X_{i})\) are dashed

Two fundamental properties of HRFs stem from the following Modularity Theorem, whose proof is given in [15].

Theorem 1

Suppose that h is a hybrid random field for the random vector \(\mathbf {X} = (X_{1}, \ldots , X_{n})\), and let h be made up of Bayesian networks \(BN_{1}, \ldots , BN_{n}\) (with DAGs \(\mathcal {G}_{1}, \ldots , \mathcal {G}_{n}\)). Then, if each conditional distribution \(P(X_{i} \mid \mathcal {MB}_{i}(X_{i}))\) is strictly positive (where \(1 \le i \le n\)), h has the following properties:

  1. An ordered Gibbs sampler [22] applied to h, i.e. an ordered Gibbs sampler which is applied to the conditional distributions \(P(X_{1} \mid \mathcal {MB}_{1}(X_{1})), \ldots , P(X_{n} \mid \mathcal {MB}_{n}(X_{n}))\), defines a joint probability distribution over \(\mathbf {X}\) via its (unique) stationary distribution \(P(\mathbf {X})\);

  2. For each variable \(X_{i}\) in \(\mathbf {X}\), \(P(\mathbf {X})\) is such that \(P(X_{i} \mid \mathbf {X} \setminus \{X_{i}\}) = P(X_{i} \mid \mathcal {MB}_{i}(X_{i}))\).

The first condition in Theorem 1 is exploited in D-HRFs in order to express the overall likelihood of the D-HRF over an input sequence in terms of local, state-specific joint probability distributions (namely, the emission probabilities) modeled via HRFs. In turn, we refer to the second condition in Theorem 1 as the modularity property. Based on the modularity property, the set \(\mathcal {MB}_{i}(X_{i})\) is a Markov blanket of \(X_{i}\) in \(\mathbf {X}\).
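As a concrete, purely illustrative sketch of the first property, the following Python code applies an ordered Gibbs sampler to the conditionals \(P(X_{i} \mid \mathcal {MB}_{i}(X_{i}))\). It assumes that each conditional is available as a callable mapping the current state of the blanket to a dictionary of value probabilities; this representation, like all identifiers below, is an assumption made for the example and is not the implementation used in the paper.

import random

def ordered_gibbs_sampler(variables, blankets, conditionals, init_state, n_sweeps, rng=None):
    # variables:    list of variable names X_1, ..., X_n, fixing the update order
    # blankets:     dict mapping X_i to the list of variables in MB_i(X_i)
    # conditionals: dict mapping X_i to a function f(mb_state) that returns a
    #               dict {value: probability} for X_i given the blanket state
    # init_state:   dict assigning an initial value to every variable
    # Yields the state after each full sweep; under the strict-positivity
    # assumption of Theorem 1, the chain has a unique stationary distribution P(X).
    rng = rng or random.Random()
    state = dict(init_state)
    for _ in range(n_sweeps):
        for x in variables:                      # one ordered sweep over all nodes
            mb_state = {m: state[m] for m in blankets[x]}
            dist = conditionals[x](mb_state)     # P(X_i = v | mb_i(X_i)) for each value v
            values, probs = zip(*dist.items())
            state[x] = rng.choices(values, weights=probs, k=1)[0]
        yield dict(state)

Collecting the states visited after a burn-in period and averaging indicator functions over them yields Monte Carlo estimates of marginal and joint probabilities under \(P(\mathbf {X})\).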

B Likelihood Versus Pseudo-Likelihood in HRFs

Let \({{\mathcal {X}}} = \{X_{1}, \ldots , X_{n}\}\) be any set of n discrete random variables of interest. In order to extract the joint probability \(P({{\mathcal {X}}})\) from a HRF, Gibbs sampling techniques need to be used [15]. Unfortunately, Gibbs sampling can be computationally intractable [22]. Therefore, in practice we can follow in the footsteps of [3] and resort to the pseudo-likelihood function, defined as:

$$\begin{aligned} P^{*}({{\mathcal {X}}}) = \prod _{i=1}^{n} P(X_{i} \mid {{\mathcal {X}}} \setminus \{X_{i}\}) \end{aligned}$$
(29)

A discussion of the theoretical properties of the pseudo-likelihood is offered by [26]. Due to the modularity property, Eq. (29) can be rewritten as:

$$\begin{aligned} P^{*}({{\mathcal {X}}}) = \prod _{i=1}^{n} P(X_{i} \mid \mathcal {MB}_{i}(X_{i})) \end{aligned}$$
(30)

Therefore, in order to measure the pseudo-likelihood \(P^{*}({{\mathcal {X}}})\) of the HRF given \({{\mathcal {X}}}\) we only need to be able to compute the conditional distribution of each node \(X_{i}\) given the state \(mb_{i}(X_{i})\) of \(\mathcal {MB}_{i}(X_{i})\). A discussion of how this can be done in a simple and efficient way is given in [15].
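Purely as an illustration of Eq. (30), the following Python sketch evaluates the log pseudo-likelihood of a complete assignment, reusing the assumed representation of the conditionals from the Gibbs sampling sketch above; it is not the implementation evaluated in the experiments.

import math

def log_pseudo_likelihood(assignment, blankets, conditionals):
    # assignment:   dict mapping each X_i to its observed value x_i
    # blankets:     dict mapping X_i to the list of variables in MB_i(X_i)
    # conditionals: dict mapping X_i to a function f(mb_state) returning
    #               a dict {value: probability} for X_i
    # Returns log P*(X) = sum_i log P(x_i | mb_i(X_i)); the strict positivity
    # required by Theorem 1 guarantees that every logarithm is defined.
    total = 0.0
    for x, value in assignment.items():
        mb_state = {m: assignment[m] for m in blankets[x]}
        total += math.log(conditionals[x](mb_state)[value])
    return total

Working in the log domain avoids numerical underflow when the number n of variables is large.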

Cite this article

Bongini, M., Freno, A., Laveglia, V. et al. Dynamic Hybrid Random Fields for the Probabilistic Graphical Modeling of Sequential Data: Definitions, Algorithms, and an Application to Bioinformatics. Neural Process Lett 48, 733–768 (2018). https://doi.org/10.1007/s11063-017-9730-3
