
Inferring Boolean functions via higher-order correlations

  • Original Paper
  • Journal: Computational Statistics

Abstract

Both the Walsh transform and a modified Pearson correlation coefficient can be used to infer the structure of a Boolean network from time series data. Unlike the correlation coefficient, the Walsh transform is also able to represent higher-order correlations. These correlations of several combined input variables with one output variable provide additional information about the dependency between variables, but are also more sensitive to noise. Furthermore, the computational complexity increases exponentially with the order. We first show that the Walsh transform of order 1 and the modified Pearson correlation coefficient are equivalent for the reconstruction of Boolean functions. Second, we investigate under which conditions (noise, number of samples, function classes) higher-order correlations can improve the reconstruction process. We present the merits, as well as the limitations, of higher-order correlations for the inference of Boolean networks.
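As a minimal illustration of the order-1 equivalence claimed above (this sketch is not from the paper; the helper names are our own, and a plain Pearson coefficient stands in for the modified one), the order-1 Walsh coefficient \(\hat{f}(\{i\}) = E[f(\varvec{x}) \, x_i]\) and the Pearson correlation between \(x_i\) and \(f(\varvec{x})\) can be compared by full enumeration of a small truth table:

```python
import itertools
import math

def walsh_order1(f, n, i):
    """Order-1 Walsh coefficient f_hat({i}) = E[f(x) * x_i] under the
    uniform distribution on {-1,+1}^n, computed by full enumeration."""
    xs = itertools.product((-1, 1), repeat=n)
    return sum(f(x) * x[i] for x in xs) / 2 ** n

def pearson(f, n, i):
    """Plain Pearson correlation between input x_i and output f(x) over
    the full truth table (a stand-in for the modified coefficient)."""
    xs = list(itertools.product((-1, 1), repeat=n))
    xi = [x[i] for x in xs]
    fx = [f(x) for x in xs]
    mx, mf = sum(xi) / len(xs), sum(fx) / len(xs)
    cov = sum((a - mx) * (b - mf) for a, b in zip(xi, fx)) / len(xs)
    sd = math.sqrt(sum((a - mx) ** 2 for a in xi) / len(xs)) * \
         math.sqrt(sum((b - mf) ** 2 for b in fx) / len(xs))
    return cov / sd if sd > 0 else 0.0

# 3-input majority function on {-1,+1}: both measures agree (0.5 per input).
majority = lambda x: 1 if sum(x) > 0 else -1
for i in range(3):
    print(walsh_order1(majority, 3, i), pearson(majority, 3, i))
```

Under the uniform distribution \(E[x_i] = 0\), so the covariance reduces to the Walsh coefficient; the two measures then differ only by the scaling with the standard deviations.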


Figs. 1–8 appear in the full article.

References

  • Akutsu T, Miyano S, Kuhara S (1999) Identification of genetic networks from a small number of gene expression patterns under the Boolean network model. Pac Symp Biocomput 4:17–28

  • Arpe J, Reischuk R (2007) Learning juntas in the presence of noise. Theor Comput Sci 384(1):2–21

  • Bahadur RR (1961) A representation of the joint distribution of responses to n dichotomous items. In: Solomon H (ed) Studies on item analysis and prediction, Stanford University Press, Stanford, no. 6 in Stanford mathematical, studies in the social sciences, pp 158–176

  • Bornholdt S (2005) Systems biology: less is more in modeling large genetic networks. Science 310:449–451

  • Bshouty N, Tamon C (1996) On the Fourier spectrum of monotone functions. J ACM (JACM) 43(4):747–770

  • Covert M, Knight E, Reed J, Herrgard M, Palsson B (2004) Integrating high-throughput and computational data elucidates bacterial networks. Nature 429(6987):92–96

  • Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27:861–874

  • Gotsman C, Linial N (1994) Spectral properties of threshold functions. Combinatorica 14(1):35–50

  • Harris SE, Sawhill BK, Wuensche A, Kauffman S (2002) A model of transcriptional regulatory networks based on biases in the observed regulation rules. Complexity 7(4):23–40

  • Kahn J, Kalai G, Linial N (1988) The influence of variables on Boolean functions. In: Proceedings of the 29th annual symposium on foundations of computer science. IEEE Computer Society, Los Alamitos, pp 68–80

  • Kauffman S (1969) Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol 22(3):437–467

  • Kauffman S, Peterson C, Samuelsson B, Troein C (2004) Genetic networks with canalyzing Boolean rules are always stable. PNAS 101(49):17102–17107

  • Kestler HA, Lausser L, Lindner W, Palm G (2011) On the fusion of threshold classifiers for categorization and dimensionality reduction. Comput Stat 26:321–340

  • Kim H, Lee JK, Park T (2007) Boolean networks using the chi-square test for inferring large-scale gene regulatory networks. BMC Bioinformatics 8(37)

  • Lähdesmäki H, Shmulevich I, Yli-Harja O (2003) On learning gene regulatory networks under the Boolean network model. Mach Learn 52(1–2):147–167

  • Liang S, Fuhrman S, Somogyi R (1998) REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pac Symp Biocomput 3:18–29

  • Lindner W, Köbler J (2006) Learning Boolean functions under the uniform distribution via the Fourier Transform. In: Toran J (ed) Bulletin of the European Association for Theoretical Computer Science. Number 89, pp 48–78

  • Maucher M, Kracher B, Kühl M, Kestler HA (2011) Inferring Boolean network structure via correlation. Bioinformatics 27(11):1529–1536

  • Mossel E, O’Donnell R, Servedio R (2003) Learning juntas. In: STOC ’03: Proceedings of the thirty-fifth annual ACM symposium on Theory of Computing, pp 206–212

  • Müssel C, Hopfensitz M, Kestler HA (2010) BoolNet—an R package for generation, reconstruction, and analysis of Boolean networks. Bioinformatics 26(10):1378–1380

  • R Development Core Team (2008) R: A language and environment for statistical computing. http://www.R-project.org

  • Schober S, Kracht D, Heckel R, Bossert M (2011) Detecting controlling nodes of Boolean regulatory networks. EURASIP J Bioinform Syst Biol 2011:6

  • Sundararajan D (2001) The discrete Fourier transform: theory, algorithms and applications. World Scientific Publishing, Singapore


Acknowledgments

This work was funded in part by the Graduate School of Mathematical Analysis of Evolution, Information and Complexity at Ulm University (to DVK), by the German Federal Ministry of Education and Research (BMBF) within the framework of the program of medical genome research (PaCa-Net; project ID PKB-01GS08) (to HAK), and within the framework GERONTOSYS (Forschungskern SyStaR, project ID 0315894A) (to HAK). The responsibility for the content lies exclusively with the authors.

Author information


Corresponding author

Correspondence to Hans A. Kestler.

Additional information

M. Maucher, D. V. Kracht: equal contribution.

Appendix

In this section, the relationship between functional analysis on Boolean functions and correlation is elaborated further. It has been shown that Pearson correlation and a Fourier expansion with basis functions of order 1 are sufficient to reconstruct the dependencies of a Boolean network if that network consists solely of monotone functions. We now formulate the influence of a variable in terms of partial discrete derivatives to highlight the connection between monotonicity and spectral coefficients. In this context, the modified Pearson correlation can also be interpreted as the marginal effect of a variable \(x_i\) on \(f\), as measured via a simple linear regression relating \(x_i\) to \(f\).

Influence of a variable: We define the \(i\)th partial discrete derivative of a Boolean function \(f\) at \(\varvec{x}\) with respect to \(x_i\), similar to, but more general than, that of Gotsman and Linial (1994), as

$$\begin{aligned} \partial _i(f,\varvec{x},x_i) = \frac{ f(\varvec{x}|_{i:x_i}) - f(\varvec{x}|_{i:\overline{x}_i}) }{x_i - \overline{x}_i}, \quad \mathrm{with} ~ \overline{x}_i = \fancyscript{B} {\setminus } x_i. \end{aligned}$$

Hence we can express the influence of a variable \(X_i\) as the mean absolute derivative of a Boolean function \(f\) with respect to \(X_i\):

$$\begin{aligned} \mathrm{I}_{i,\fancyscript{D}}(f) = E_{\fancyscript{D}} [ | \partial _i(f,\varvec{X},X_i) |]. \end{aligned}$$
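This definition can be read directly as an enumeration over the input domain. The sketch below assumes a uniform distribution \(\fancyscript{D}\) and inputs coded in \(\{-1,+1\}\); the helper names are illustrative, not from the paper:

```python
import itertools

def d_i(f, x, i):
    """i-th partial discrete derivative of f at x: difference of f with
    x_i set to +1 and to -1, divided by (x_i - x_bar_i) = 2."""
    x_pos = x[:i] + (1,) + x[i + 1:]
    x_neg = x[:i] + (-1,) + x[i + 1:]
    return (f(x_pos) - f(x_neg)) / 2

def influence(f, n, i):
    """I_i(f) = E[|d_i f|], here under the uniform distribution on {-1,+1}^n."""
    xs = list(itertools.product((-1, 1), repeat=n))
    return sum(abs(d_i(f, x, i)) for x in xs) / len(xs)

# AND of the first two inputs; the third input is irrelevant.
f_and = lambda x: 1 if x[0] == 1 and x[1] == 1 else -1
print([influence(f_and, 3, i) for i in range(3)])  # [0.5, 0.5, 0.0]
```

For a \(\{-1,+1\}\)-valued \(f\), the absolute derivative is either 0 or 1, so the influence is simply the probability that flipping \(x_i\) flips the function value; an irrelevant variable has influence 0.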

Utilizing the Fourier expansion (4), the influence can be written as

$$\begin{aligned} \mathrm{I}_{i,\fancyscript{D}}(f) = E_{\fancyscript{D}} \left[ \left|\sum \limits _{ S :i \in S} \hat{f}(S) \chi _{S \setminus i}(\varvec{X}) \right| \right]. \end{aligned}$$

If a function \(f\), projected onto dimension \(i\), exhibits a mean linear tendency, the relevant variable \(x_i \in \mathrm{rel}(f)\) can be detected via linear regression, as the expected value is non-zero. It may not be possible to simplify this mean analytically for all classes of Boolean functions, but for monotone functions we can show the following connection:

Linear regression and monotonicity: Any Boolean function, decomposed according to (4) as

$$\begin{aligned} f(\varvec{x})= g(\varvec{x}) + x_i \cdot h(\varvec{x}),\quad \mathrm{with} ~ \varvec{x} \in \{-1,+1\}^n \end{aligned}$$

is a multivariate polynomial over the reals. A function \(f\) is monotone in a variable \(x_i\) if either

$$\begin{aligned} f(\varvec{x}|_{i:-1}) \le f(\varvec{x}|_{i:+1}) \quad \mathrm{or} \quad f(\varvec{x}|_{i:-1}) \ge f(\varvec{x}|_{i:+1}) \end{aligned}$$

holds for all \(\varvec{x}\). For a Boolean function monotone increasing in variable \(i\), we have

$$\begin{aligned} f(\varvec{x}|_{i:-1}) = g(\varvec{x}) - h(\varvec{x}) \le f(\varvec{x}|_{i:+1}) = g(\varvec{x}) + h(\varvec{x}), \end{aligned}$$

which implies that the partial discrete derivative satisfies \(\partial _i(f,\varvec{x},1) \ge 0\). For functions monotone decreasing in dimension \(i\), the derivative satisfies \(\partial _i(f,\varvec{x},1) \le 0\). As the sign of \(\partial _i\) does not alternate, taking the absolute value and taking the mean can be interchanged, and the influence can be expressed as

$$\begin{aligned} \mathrm{I}_{i,\fancyscript{D}}(f) = | r_{\fancyscript{D}}(i,f) | = \left| \sum \limits _{ S :i \in S} \hat{f}(S) \prod \limits _{j \in S \setminus i} \mu _j \right| ,\quad \mathrm{with}~ \mu _j = E[x_j], \end{aligned}$$
(8)

namely the absolute regression coefficient of the linear Boolean model in variable \(i\). As stated in Sect. 2, monotonicity in a variable is sufficient to detect the relevance of that variable via linear regression, interpreted as influence. This holds for general product distributions.
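Under the uniform distribution all means \(\mu_j\) vanish, so (8) collapses to \(\mathrm{I}_{i}(f) = |\hat{f}(\{i\})|\) for functions monotone in \(i\). This can be checked numerically by enumeration (a sketch under these assumptions; the helper functions are illustrative, not from the paper):

```python
import itertools

def influence(f, n, i):
    """Mean absolute i-th discrete derivative under the uniform distribution."""
    xs = list(itertools.product((-1, 1), repeat=n))
    return sum(abs(f(x[:i] + (1,) + x[i + 1:]) - f(x[:i] + (-1,) + x[i + 1:])) / 2
               for x in xs) / len(xs)

def fhat_single(f, n, i):
    """Degree-1 Fourier coefficient f_hat({i}) = E[f(x) * x_i], uniform inputs."""
    xs = list(itertools.product((-1, 1), repeat=n))
    return sum(f(x) * x[i] for x in xs) / len(xs)

maj = lambda x: 1 if sum(x) > 0 else -1  # monotone increasing in every input
for i in range(3):
    # Equation (8) with mu_j = 0: influence equals |f_hat({i})| (both 0.5 here).
    assert influence(maj, 3, i) == abs(fhat_single(maj, 3, i)) == 0.5
```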

We recall some facts for Boolean functions on a uniformly distributed domain (subscript \(\fancyscript{U}\)), also mentioned in Gotsman and Linial (1994): the influence of a variable \(i\) is lower-bounded as follows:

$$\begin{aligned} \mathrm{I}_{i,\fancyscript{U}}(f) \ge \max \limits _{ S: i \in S } ~ | \hat{f} (S) |. \end{aligned}$$

For Boolean functions that are monotone in a variable \(i\), the definitions in Kahn et al. (1988) further imply \( \mathrm{I}_{i,\fancyscript{U}}(f) = | \hat{f} (\{i\}) |,\) meaning that the influence is attained exactly by the absolute value of the linear (order-1) coefficient of variable \(i\).

Limits on Pearson correlation: If monotonicity in a variable \(i\) is not given, the interpretation of the absolute correlation coefficient \(|r_{\fancyscript{D}}(i,f)|\) as the influence \(\mathrm{I}_{i,\fancyscript{D}}(f)\) is no longer valid (see (8)). Nevertheless, the relevance of the variable can still be detected via the modified Pearson correlation if the correlation coefficient differs from zero. A non-zero coefficient simply indicates a (linear) dependency (see the covariance in (5)) between variable \(i\) and the value of the Boolean function. The absolute value of the correlation is limited by the non-linearities of the Boolean function and by the probability distribution of the variables, more precisely by those variables which are not directly considered in the correlation. In other words, it is limited by the function's "structure": the Fourier coefficients \(\hat{f}\) under the uniform distribution and the means of the Fourier basis under a general product distribution \(\fancyscript{D}\).
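The classic example of this limitation is the parity (XOR) function: every order-1 coefficient, and hence the correlation with every single input, is exactly zero, while a higher-order Walsh coefficient still reveals the dependency. A minimal sketch (the `walsh` helper is illustrative, not from the paper):

```python
import itertools
import math

def walsh(f, n, S):
    """Walsh/Fourier coefficient f_hat(S) = E[f(x) * prod_{j in S} x_j]
    under the uniform distribution on {-1,+1}^n."""
    xs = list(itertools.product((-1, 1), repeat=n))
    return sum(f(x) * math.prod(x[j] for j in S) for x in xs) / len(xs)

xor = lambda x: x[0] * x[1]  # parity of two inputs, non-monotone
print(walsh(xor, 2, [0]), walsh(xor, 2, [1]))  # 0.0 0.0, order 1 is blind
print(walsh(xor, 2, [0, 1]))                   # 1.0, order 2 finds both inputs
```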


About this article

Cite this article

Maucher, M., Kracht, D.V., Schober, S. et al. Inferring Boolean functions via higher-order correlations. Comput Stat 29, 97–115 (2014). https://doi.org/10.1007/s00180-012-0385-2
