
Inferring Boolean functions via higher-order correlations

  • Original Paper
  • Journal: Computational Statistics

Abstract

Both the Walsh transform and a modified Pearson correlation coefficient can be used to infer the structure of a Boolean network from time series data. Unlike the correlation coefficient, the Walsh transform is also able to represent higher-order correlations. These correlations of several combined input variables with one output variable provide additional information about the dependency between variables, but are also more sensitive to noise. Furthermore, the computational complexity increases exponentially with the order. We first show that the Walsh transform of order 1 and the modified Pearson correlation coefficient are equivalent for the reconstruction of Boolean functions. Second, we investigate under which conditions (noise, number of samples, function classes) higher-order correlations can improve the reconstruction process. We present the merits, as well as the limitations, of higher-order correlations for the inference of Boolean networks.
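As a minimal illustration of the order-1 equivalence claimed above (this sketch is not from the paper; the helper names are our own, and a plain Pearson coefficient stands in for the modified one), the order-1 Walsh coefficient \(\hat{f}(\{i\}) = E[f(\varvec{x}) \, x_i]\) and the Pearson correlation between \(x_i\) and \(f(\varvec{x})\) can be compared by full enumeration of a small truth table:

```python
import itertools
import math

def walsh_order1(f, n, i):
    """Order-1 Walsh coefficient f_hat({i}) = E[f(x) * x_i] under the
    uniform distribution on {-1,+1}^n, computed by full enumeration."""
    xs = itertools.product((-1, 1), repeat=n)
    return sum(f(x) * x[i] for x in xs) / 2 ** n

def pearson(f, n, i):
    """Plain Pearson correlation between input x_i and output f(x) over
    the full truth table (a stand-in for the modified coefficient)."""
    xs = list(itertools.product((-1, 1), repeat=n))
    xi = [x[i] for x in xs]
    fx = [f(x) for x in xs]
    mx, mf = sum(xi) / len(xs), sum(fx) / len(xs)
    cov = sum((a - mx) * (b - mf) for a, b in zip(xi, fx)) / len(xs)
    sd = math.sqrt(sum((a - mx) ** 2 for a in xi) / len(xs)) * \
         math.sqrt(sum((b - mf) ** 2 for b in fx) / len(xs))
    return cov / sd if sd > 0 else 0.0

# 3-input majority function on {-1,+1}: both measures agree (0.5 per input).
majority = lambda x: 1 if sum(x) > 0 else -1
for i in range(3):
    print(walsh_order1(majority, 3, i), pearson(majority, 3, i))
```

Under the uniform distribution \(E[x_i] = 0\), so the covariance reduces to the Walsh coefficient; the two measures then differ only by the scaling with the standard deviations.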


Figs. 1–8 appear in the full article.

References

  • Akutsu T, Miyano S, Kuhara S (1999) Identification of genetic networks from a small number of gene expression patterns under the Boolean network model. Pac Symp Biocomput 4:17–28

  • Arpe J, Reischuk R (2007) Learning juntas in the presence of noise. Theor Comput Sci 384(1):2–21

  • Bahadur RR (1961) A representation of the joint distribution of responses to n dichotomous items. In: Solomon H (ed) Studies on item analysis and prediction, Stanford University Press, Stanford, no. 6 in Stanford mathematical, studies in the social sciences, pp 158–176

  • Bornholdt S (2005) Systems biology: less is more in modeling large genetic networks. Science 310:449–451

  • Bshouty N, Tamon C (1996) On the Fourier spectrum of monotone functions. J ACM (JACM) 43(4):747–770

  • Covert M, Knight E, Reed J, Herrgard M, Palsson B (2004) Integrating high-throughput and computational data elucidates bacterial networks. Nature 429(6987):92–96

  • Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27:861–874

  • Gotsman C, Linial N (1994) Spectral properties of threshold functions. Combinatorica 14(1):35–50

  • Harris SE, Sawhill BK, Wuensche A, Kauffman S (2002) A model of transcriptional regulatory networks based on biases in the observed regulation rules. Complexity 7(4):23–40

  • Kahn J, Kalai G, Linial N (1988) The influence of variables on Boolean functions. In: Proceedings of the 29th annual symposium on foundations of computer science. IEEE Computer Society, Los Alamitos, pp 68–80

  • Kauffman S (1969) Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol 22(3):437–467

  • Kauffman S, Peterson C, Samuelsson B, Troein C (2004) Genetic networks with canalyzing Boolean rules are always stable. PNAS 101(49):17102–17107

  • Kestler HA, Lausser L, Lindner W, Palm G (2011) On the fusion of threshold classifiers for categorization and dimensionality reduction. Comput Stat 26:321–340

  • Kim H, Lee JK, Park T (2007) Boolean networks using the chi-square test for inferring large-scale gene regulatory networks. BMC Bioinformatics 8(37)

  • Lähdesmäki H, Shmulevich I, Yli-Harja O (2003) On learning gene regulatory networks under the Boolean network model. Mach Learn 52(1–2):147–167

  • Liang S, Fuhrman S, Somogyi R (1998) REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pac Symp Biocomput 3:18–29

  • Lindner W, Köbler J (2006) Learning Boolean functions under the uniform distribution via the Fourier Transform. In: Toran J (ed) Bulletin of the European Association for Theoretical Computer Science. Number 89, pp 48–78

  • Maucher M, Kracher B, Kühl M, Kestler HA (2011) Inferring Boolean network structure via correlation. Bioinformatics 27(11):1529–1536

  • Mossel E, O’Donnell R, Servedio R (2003) Learning juntas. In: STOC ’03: Proceedings of the thirty-fifth annual ACM symposium on Theory of Computing, pp 206–212

  • Müssel C, Hopfensitz M, Kestler HA (2010) BoolNet—an R package for generation, reconstruction, and analysis of Boolean networks. Bioinformatics 26(10):1378–1380

  • R Development Core Team (2008) R: A language and environment for statistical computing. http://www.R-project.org

  • Schober S, Kracht D, Heckel R, Bossert M (2011) Detecting controlling nodes of Boolean regulatory networks. EURASIP J Bioinform Syst Biol 2011:6

  • Sundararajan D (2001) The discrete Fourier transform: theory, algorithms and applications. World Scientific Publishing, Singapore


Acknowledgments

This work was funded in part by the Graduate School of Mathematical Analysis of Evolution, Information and Complexity at Ulm University (to DVK), by the German Federal Ministry of Education and Research (BMBF) within the framework of the program of medical genome research (PaCa-Net; project ID PKB-01GS08) (to HAK), and within the framework GERONTOSYS (Forschungskern SyStaR, project ID 0315894A) (to HAK). The responsibility for the content lies exclusively with the authors.

Author information


Corresponding author

Correspondence to Hans A. Kestler.

Additional information

M. Maucher, D. V. Kracht: equal contribution.

Appendix

In this section, the relationship between functional analysis on Boolean functions and correlation is elaborated further. It has been shown that Pearson correlation and a Fourier expansion with basis functions of order 1 are sufficient to reconstruct the dependencies of a Boolean network if that network consists solely of monotone functions. We now formulate the influence of a variable in terms of partial discrete derivatives to highlight the connection between monotonicity and spectral coefficients. In this context, the modified Pearson correlation can also be interpreted as the marginal effect of a variable \(x_i\) on \(f\), as measured via a simple linear regression relating \(x_i\) to \(f\).

Influence of a variable: We define the \(i\)th partial discrete derivative of a Boolean function \(f\) at \(\varvec{x}\) with respect to \(x_i\), similar to, but more general than, that of Gotsman and Linial (1994), as

$$\begin{aligned} \partial _i(f,\varvec{x},x_i) = \frac{ f(\varvec{x}|_{i:x_i}) - f(\varvec{x}|_{i:\overline{x}_i}) }{x_i - \overline{x}_i}, \quad \mathrm{with} ~ \overline{x}_i = \fancyscript{B} {\setminus } x_i. \end{aligned}$$

Hence we can express the influence of a variable \(X_i\) as the mean absolute derivative of a Boolean function \(f\) with respect to \(X_i\):

$$\begin{aligned} \mathrm{I}_{i,\fancyscript{D}}(f) = E_{\fancyscript{D}} [ | \partial _i(f,\varvec{X},X_i) |]. \end{aligned}$$
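This definition can be read directly as an enumeration over the input domain. The sketch below assumes a uniform distribution \(\fancyscript{D}\) and inputs coded in \(\{-1,+1\}\); the helper names are illustrative, not from the paper:

```python
import itertools

def d_i(f, x, i):
    """i-th partial discrete derivative of f at x: difference of f with
    x_i set to +1 and to -1, divided by (x_i - x_bar_i) = 2."""
    x_pos = x[:i] + (1,) + x[i + 1:]
    x_neg = x[:i] + (-1,) + x[i + 1:]
    return (f(x_pos) - f(x_neg)) / 2

def influence(f, n, i):
    """I_i(f) = E[|d_i f|], here under the uniform distribution on {-1,+1}^n."""
    xs = list(itertools.product((-1, 1), repeat=n))
    return sum(abs(d_i(f, x, i)) for x in xs) / len(xs)

# AND of the first two inputs; the third input is irrelevant.
f_and = lambda x: 1 if x[0] == 1 and x[1] == 1 else -1
print([influence(f_and, 3, i) for i in range(3)])  # [0.5, 0.5, 0.0]
```

For a \(\{-1,+1\}\)-valued \(f\), the absolute derivative is either 0 or 1, so the influence is simply the probability that flipping \(x_i\) flips the function value; an irrelevant variable has influence 0.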

Utilizing the Fourier expansion (4), the influence can be written as

$$\begin{aligned} \mathrm{I}_{i,\fancyscript{D}}(f) = E_{\fancyscript{D}} \left[ \left|\sum \limits _{ S :i \in S} \hat{f}(S) \chi _{S \setminus i}(\varvec{X}) \right| \right]. \end{aligned}$$

If a function \(f\), projected onto dimension \(i\), exhibits a mean linear tendency, the relevant variable \(x_i \in \mathrm{rel}(f)\) can be detected via linear regression, as the expected value is non-zero. It may not be possible to simplify this mean analytically for all classes of Boolean functions, but for monotone functions we can show the following connection:

Linear regression and monotonicity: Any Boolean function, decomposed according to (4) as

$$\begin{aligned} f(\varvec{x})= g(\varvec{x}) + x_i \cdot h(\varvec{x}),\quad \mathrm{with} ~ \varvec{x} \in \{-1,+1\}^n \end{aligned}$$

is a multivariate polynomial over the reals. A function \(f\) is monotone in a variable \(x_i\) if either

$$\begin{aligned} f(\varvec{x}|_{i:-1}) \le f(\varvec{x}|_{i:+1}) \quad \mathrm{or} \quad f(\varvec{x}|_{i:-1}) \ge f(\varvec{x}|_{i:+1}) \end{aligned}$$

holds for all \(\varvec{x}\). For a Boolean function monotone increasing in variable \(i\), we have

$$\begin{aligned} f(\varvec{x}|_{i:-1}) = g(\varvec{x}) - h(\varvec{x}) \le f(\varvec{x}|_{i:+1}) = g(\varvec{x}) + h(\varvec{x}), \end{aligned}$$

which implies that the partial discrete derivative satisfies \(\partial _i(f,\varvec{x},1) \ge 0\). For functions monotone decreasing in dimension \(i\), the derivative satisfies \(\partial _i(f,\varvec{x},1) \le 0\). As the sign of \(\partial _i\) does not alternate, taking the absolute value and taking the mean can be interchanged, and the influence can be expressed as

$$\begin{aligned} \mathrm{I}_{i,\fancyscript{D}}(f) = | r_{\fancyscript{D}}(i,f) | = \left| \sum \limits _{ S :i \in S} \hat{f}(S) \prod \limits _{j \in S \setminus i} \mu _j \right| ,\quad \mathrm{with}~ \mu _j = E[x_j], \end{aligned}$$
(8)

namely the absolute regression coefficient of the linear Boolean model in variable \(i\). As stated in Sect. 2, monotonicity in a variable is sufficient to detect the relevance of that variable via linear regression, interpreted as influence. This holds for general product distributions.
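Under the uniform distribution all means \(\mu_j\) vanish, so (8) collapses to \(\mathrm{I}_{i}(f) = |\hat{f}(\{i\})|\) for functions monotone in \(i\). This can be checked numerically by enumeration (a sketch under these assumptions; the helper functions are illustrative, not from the paper):

```python
import itertools

def influence(f, n, i):
    """Mean absolute i-th discrete derivative under the uniform distribution."""
    xs = list(itertools.product((-1, 1), repeat=n))
    return sum(abs(f(x[:i] + (1,) + x[i + 1:]) - f(x[:i] + (-1,) + x[i + 1:])) / 2
               for x in xs) / len(xs)

def fhat_single(f, n, i):
    """Degree-1 Fourier coefficient f_hat({i}) = E[f(x) * x_i], uniform inputs."""
    xs = list(itertools.product((-1, 1), repeat=n))
    return sum(f(x) * x[i] for x in xs) / len(xs)

maj = lambda x: 1 if sum(x) > 0 else -1  # monotone increasing in every input
for i in range(3):
    # Equation (8) with mu_j = 0: influence equals |f_hat({i})| (both 0.5 here).
    assert influence(maj, 3, i) == abs(fhat_single(maj, 3, i)) == 0.5
```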

We recall some facts for Boolean functions on a uniformly distributed domain (subscript \(\fancyscript{U}\)), also mentioned in Gotsman and Linial (1994): the influence of a variable \(i\) is lower-bounded as follows:

$$\begin{aligned} \mathrm{I}_{i,\fancyscript{U}}(f) \ge \max \limits _{ S: i \in S } ~ | \hat{f} (S) |. \end{aligned}$$

For Boolean functions that are monotone in a variable \(i\), the definitions in Kahn et al. (1988) further imply \( \mathrm{I}_{i,\fancyscript{U}}(f) = | \hat{f} (\{i\}) |,\) meaning that the influence is attained exactly by the absolute value of the linear (order-1) coefficient of variable \(i\).

Limits on Pearson correlation: If monotonicity in a variable \(i\) is not given, the interpretation of the absolute correlation coefficient \(|r_{\fancyscript{D}}(i,f)|\) as the influence \(\mathrm{I}_{i,\fancyscript{D}}(f)\) is no longer valid (see (8)). Nevertheless, the relevance of the variable can still be detected via the modified Pearson correlation if the correlation coefficient differs from zero. A non-zero coefficient simply indicates a (linear) dependency (see the covariance in (5)) between variable \(i\) and the value of the Boolean function. The absolute value of the correlation is limited by the non-linearities of the Boolean function and by the probability distribution of the variables, more precisely by those variables which are not directly considered in the correlation. In other words, it is limited by the function's "structure": the Fourier coefficients \(\hat{f}\) under the uniform distribution and the means of the Fourier basis under a general product distribution \(\fancyscript{D}\).
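The classic example of this limitation is the parity (XOR) function: every order-1 coefficient, and hence the correlation with every single input, is exactly zero, while a higher-order Walsh coefficient still reveals the dependency. A minimal sketch (the `walsh` helper is illustrative, not from the paper):

```python
import itertools
import math

def walsh(f, n, S):
    """Walsh/Fourier coefficient f_hat(S) = E[f(x) * prod_{j in S} x_j]
    under the uniform distribution on {-1,+1}^n."""
    xs = list(itertools.product((-1, 1), repeat=n))
    return sum(f(x) * math.prod(x[j] for j in S) for x in xs) / len(xs)

xor = lambda x: x[0] * x[1]  # parity of two inputs, non-monotone
print(walsh(xor, 2, [0]), walsh(xor, 2, [1]))  # 0.0 0.0, order 1 is blind
print(walsh(xor, 2, [0, 1]))                   # 1.0, order 2 finds both inputs
```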


About this article

Cite this article

Maucher, M., Kracht, D.V., Schober, S. et al. Inferring Boolean functions via higher-order correlations. Comput Stat 29, 97–115 (2014). https://doi.org/10.1007/s00180-012-0385-2
