Comparison of coexpression measures: mutual information, correlation, and model based indices
 Lin Song^{1, 2},
 Peter Langfelder^{1} and
 Steve Horvath^{1, 2}
DOI: 10.1186/1471-2105-13-328
© Song et al.; licensee BioMed Central Ltd. 2012
Received: 14 March 2012
Accepted: 30 November 2012
Published: 9 December 2012
Abstract
Background
Coexpression measures are often used to define networks among genes. Mutual information (MI) is often used as a generalized correlation measure. It is not clear how much MI adds beyond standard (robust) correlation measures or regression model based association measures. Further, it is important to assess what transformations of these and other coexpression measures lead to biologically meaningful modules (clusters of genes).
Results
We provide a comprehensive comparison between mutual information and several correlation measures in 8 empirical data sets and in simulations. We also study different approaches for transforming an adjacency matrix, e.g. using the topological overlap measure. Overall, we confirm close relationships between MI and correlation in all data sets which reflects the fact that most gene pairs satisfy linear or monotonic relationships. We discuss rare situations when the two measures disagree. We also compare correlation and MI based approaches when it comes to defining coexpression network modules. We show that a robust measure of correlation (the biweight midcorrelation transformed via the topological overlap transformation) leads to modules that are superior to MI based modules and maximal information coefficient (MIC) based modules in terms of gene ontology enrichment. We present a function that relates correlation to mutual information which can be used to approximate the mutual information from the corresponding correlation coefficient. We propose the use of polynomial or spline regression models as an alternative to MI for capturing nonlinear relationships between quantitative variables.
Conclusion
The biweight midcorrelation outperforms MI in terms of elucidating gene pairwise relationships. Coupled with the topological overlap matrix transformation, it often leads to more significantly enriched coexpression modules. Spline and polynomial networks form attractive alternatives to MI in case of nonlinear relationships. Our results indicate that MI networks can safely be replaced by correlation networks when it comes to measuring coexpression relationships in stationary data.
Background
Coexpression methods are widely used for analyzing gene expression data and other high dimensional “omics” data. Most coexpression measures fall into one of two categories: correlation coefficients or mutual information measures. MI measures have attractive information-theoretic interpretations and can be used to measure nonlinear associations. Although MI is well defined for discrete or categorical variables, it is nontrivial to estimate the mutual information between quantitative variables, and corresponding permutation tests can be computationally intensive. In contrast, the correlation coefficient and other model based association measures are ideally suited for relating quantitative variables. Model based association measures have obvious statistical advantages including ease of calculation, straightforward statistical testing procedures, and the ability to include additional covariates into the analysis. Researchers trained in statistics often measure gene coexpression by the correlation coefficient. Computer scientists, trained in information theory, tend to use a mutual information (MI) based measure. Thus far, the majority of published articles use the correlation coefficient as a coexpression measure [1–5] but hundreds of articles have used the mutual information (MI) measure [6–12].
Several articles have used simulations and real data to compare the two coexpression measures when clustering gene expression data. Allen et al. have found that the correlation based network inference method WGCNA [5] and the mutual information based method ARACNE [9] both perform well in constructing global network structure [13]; Steuer et al. show that mutual information and the Pearson correlation have an almost one-to-one correspondence when measuring gene pairwise relationships within their investigated data set, justifying the application of Pearson correlation as a measure of similarity for gene-expression measurements [14]. In simulations, no evidence could be found that mutual information performs better than correlation for constructing coexpression networks [15]. However, MI continues to be used in recent publications. Some authors have argued that MI is more robust than Pearson correlation in terms of distinguishing various clustering solutions [10]. Given the debates, it remains an open question whether mutual information could be supplanted by standard model based association measures. We affirmatively answer this question by i) reviewing the close relationship between mutual information and the likelihood ratio test statistic in the case of categorical variables, ii) finding a close relationship between mutual information and correlation in simulations and empirical studies, and iii) proposing polynomial and spline regression models as alternatives to mutual information for modeling nonlinear relationships.
While previous comparisons involved the Pearson correlation, we provide a more comprehensive comparison that considers i) different types of correlation coefficients, e.g. the biweight midcorrelation (bicor), ii) different approaches for constructing MI based and correlation based networks, iii) different ways of transforming a network adjacency matrix (e.g. the topological overlap reviewed below [4, 16–18]), and iv) 8 diverse gene expression data sets from yeast, mouse and humans. Our unbiased comparison evaluates coexpression measures at the level of gene pair relationships and at the level of forming coexpression modules (clusters of genes).
This article presents the following results. First, probably the most comprehensive empirical comparison to date is used to evaluate which pairwise association measure leads to the biologically most meaningful network modules (clusters) when it comes to functional enrichment with gene ontology (GO) terms. Second, polynomial regression and spline regression methods are evaluated when it comes to defining nonlinear association measures between gene pairs. Third, simulation studies are used to validate a functional relationship (cor-MI function) between correlation and mutual information when the two variables satisfy a linear relationship. Our comprehensive empirical studies illustrate that the cor-MI function can be used to approximate the relationship between mutual information and correlation in real data sets, which indicates that in many situations the MI measure is not worth the trouble. Gene pairs where the two association measures disagree are investigated to determine whether technical artifacts lead to the incongruence.
Overall, we find that the bicor based coexpression measure is attractive, particularly when limited sample size does not permit the detection of nonlinear relationships. Our theoretical results, simulations, and 8 different gene expression data sets show that MI is often inferior to correlation based approaches in terms of elucidating gene pairwise relationships and identifying coexpression modules. A signed correlation network transformed via the topological overlap matrix transformation often leads to the most significant functional enrichment of modules. Polynomial and spline regression model based statistical approaches are promising alternatives to MI for measuring nonlinear relationships.
Association measure and network adjacency
An association measure is used to estimate the relationships between two random variables. For example, correlation is a commonly used association measure. There are different types of correlations. While the Pearson correlation, which measures the extent of a linear relationship, is the most widely used correlation measure, the following two more robust correlation measures are often used. First, the Spearman correlation is based on ranks, and measures the extent of a monotonic relationship between x and y. Second, “bicor” (refer to Materials and Methods for definition and details) is a median based correlation measure, and is more robust than the Pearson correlation but often more powerful than the Spearman correlation [19, 20]. All correlation coefficients take on values between −1 and 1 where negative values indicate an inverse relationship. A correlation coefficient is an attractive association measure since i) it can be easily calculated, ii) it affords several asymptotic statistical tests (regression models, Fisher transformation) for calculating significance levels (p-values), and iii) the sign of correlation allows one to distinguish between positive and negative relationships. Other association measures, such as mutual information, will be introduced in the next sections.
Additional details of correlation based adjacencies (unweighted or weighted, unsigned or signed) are described in Materials and Methods.
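The difference between a linear and a monotonic association measure can be illustrated with a small sketch. The example below is not taken from the article: the helper names `pearson` and `spearman` and the toy data are assumptions chosen for illustration, and the rank computation assumes there are no ties.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: extent of a linear relationship."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks.

    Assumes no tied values, so a double argsort yields valid ranks.
    """
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson(rank_x, rank_y)

# Toy data (hypothetical): a monotonic but strongly nonlinear relationship.
x = np.linspace(0.0, 5.0, 30)
y = np.exp(x)

r_pearson = pearson(x, y)    # clearly below 1: the relationship is not linear
r_spearman = spearman(x, y)  # equals 1: the relationship is perfectly monotonic
```

In this toy case the rank based measure reports a perfect association while the Pearson correlation does not, which is the distinction drawn in the text above.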
Network adjacency based on coexpression measures
When dealing with gene expression data, x_{ i } denotes the expression levels of the i-th gene (or probe) across multiple samples. In this article, we assume that the m components of x_{ i } correspond to random independent samples. Coexpression measures can be used to define coexpression networks in which the nodes correspond to genes. The adjacencies A_{ ij } encode the similarity between the expression profiles of genes i and j. In practice, transformations such as the topological overlap measure (TOM) [4, 16–18] are often used to turn an original network adjacency matrix into a new one. Details of the TOM transformation are reviewed in Materials and Methods.
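As a rough illustration of how an adjacency matrix can be turned into a new one, the sketch below implements the commonly used form of the topological overlap measure, TOM_ij = (Σ_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 − a_ij) with connectivity k_i = Σ_u a_iu. This formula is an assumption here, since the article's own definition is given in a section not reproduced in this text; the function name `topological_overlap` is hypothetical.

```python
import numpy as np

def topological_overlap(adjacency):
    """Topological overlap matrix for a symmetric adjacency with entries in [0, 1].

    Assumed formula (standard TOM, not copied from the article):
    TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij).
    """
    a = np.array(adjacency, dtype=float)
    np.fill_diagonal(a, 0.0)             # ignore self-connections
    shared = a @ a                       # sum over u of a_iu * a_uj (u != i, j)
    k = a.sum(axis=1)                    # connectivity of each node
    tom = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)           # by convention each node overlaps fully with itself
    return tom

# Toy unweighted chain 0 - 1 - 2: nodes 0 and 2 are not adjacent but share neighbour 1.
chain = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]])
tom_chain = topological_overlap(chain)
```

In the chain example the non-adjacent nodes 0 and 2 receive a nonzero overlap because they share a neighbour, which is the sense in which TOM adds information beyond the original adjacency.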
Mutual information networks based on categorical variables
This relationship between the mutual information and the likelihood ratio test statistic has many applications. First, it can be used to prove that the mutual information takes on nonnegative values. Second, it can be used to calculate an asymptotic p-value for the mutual information. Third, it points to a way for defining a mutual information measure that adjusts for additional conditioning variables z_{1},z_{2},…. Specifically, one can use a multivariate multinomial regression model for regressing dy on dx and the conditioning variables. Up to a scaling factor of 2m, the likelihood ratio test statistic can be interpreted as a (nonsymmetric) measure of mutual information between dx and dy that adjusts for conditioning variables. More detailed discussion of mutual information can be found in [14, 23, 24]. In Additional file 1, we describe association measures between categorical variables in detail, including the LRT statistic and MI.
One can easily prove that $0\le A_{ij}^{\text{MI,UniversalVersion1}}\le 1$. The term “universal” reflects the fact that the adjacency based dissimilarity $\text{dissMI}_{ij}^{\text{UniversalVersion1}}=1-A_{ij}^{\text{MI,UniversalVersion1}}$ turns out to be a universal distance function [25]. Roughly speaking, the universality of $\text{dissMI}_{ij}^{\text{UniversalVersion1}}$ implies that any other distance measure between dx_{ i } and dx_{ j } will be small if $\text{dissMI}_{ij}^{\text{UniversalVersion1}}$ is small. The term “distance” reflects the fact that dissMI^{UniversalVersion 1} satisfies the properties of a distance including the triangle inequality.
The name reflects the fact that dissMI^{UniversalVersion 2} = 1−A^{MI,UniversalVersion 2} is also a universal distance measure [25]. While A^{MI,UniversalVersion 1} and A^{MI,UniversalVersion 2} are in general different, we find very high Spearman correlations (r > 0.9) between their vectorized versions.
Many alternative approaches exist for defining MI based networks, e.g. ARACNE [9], CLR [26], MRNET [27] and RELNET [6, 28], which are described in Materials and Methods.
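The scaling factor of 2m can be checked numerically. The following sketch assumes the plug-in (maximum likelihood) estimate of MI with natural logarithms and the usual likelihood ratio (G) statistic for independence in a contingency table; the function names and the toy counts are hypothetical and not taken from the article.

```python
import numpy as np

def mutual_information(table):
    """Plug-in MI (natural log) from a contingency table of counts."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    px = p.sum(axis=1, keepdims=True)    # row marginals
    py = p.sum(axis=0, keepdims=True)    # column marginals
    nz = p > 0                           # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz])))

def lrt_statistic(table):
    """Likelihood ratio (G) statistic for independence: 2 * sum O * log(O / E)."""
    obs = np.asarray(table, dtype=float)
    expected = obs.sum(axis=1, keepdims=True) @ obs.sum(axis=0, keepdims=True) / obs.sum()
    nz = obs > 0
    return float(2.0 * np.sum(obs[nz] * np.log(obs[nz] / expected[nz])))

# Hypothetical 2 x 3 table of counts for two discretized variables dx and dy.
counts = np.array([[30, 10, 5],
                   [8, 25, 22]])
m = counts.sum()                         # number of observations
mi = mutual_information(counts)
g = lrt_statistic(counts)                # agrees with 2 * m * mi up to rounding
```

Under these assumptions the two quantities differ only by the factor 2m, so an asymptotic p-value for the MI could be obtained from the chi-square reference distribution of the LRT statistic.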
Mutual information networks based on discretized numeric variables
The number of bins, no.bins, is the only parameter of the equal-width discretization method.
In our subsequent studies, we calculate an MI-based adjacency matrix using the following three steps. First, numeric vectors of gene expression profiles are discretized according to the equal-width discretization method with the default number of bins given by $\text{no.bins}=\sqrt{m}$. Second, the mutual information MI_{ ij } = MI(discretize(x_{ i }),discretize(x_{ j })) is calculated between the discretized vectors based on Eq. 10 and the Miller-Madow entropy estimation method (detailed in Additional file 1). Third, the MI matrix is transformed into one of three possible MI-based adjacency matrices: A^{MI,SymmetricUncertainty} (Eq. 14), A^{MI,UniversalVersion 1} (Eq. 15), A^{MI,UniversalVersion 2} (Eq. 16).
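The three steps above can be sketched as follows. Several details are assumptions rather than the article's exact procedure: the square root of m is rounded to the nearest integer, MI is computed as H(x) + H(y) − H(x,y) with Miller-Madow corrected entropies (the plug-in entropy plus (K − 1)/(2m), where K is the number of non-empty bins), and the third step uses MI / max(H(x), H(y)) as a stand-in for A^{MI,UniversalVersion 2}, since Eqs. 10 and 14 to 16 are not reproduced in this text. All function names are hypothetical.

```python
import numpy as np

def discretize_equal_width(x, no_bins=None):
    """Step 1: equal-width discretization; default no_bins = round(sqrt(m)) (assumed rounding)."""
    x = np.asarray(x, dtype=float)
    if no_bins is None:
        no_bins = max(1, int(round(np.sqrt(len(x)))))
    edges = np.linspace(x.min(), x.max(), no_bins + 1)
    return np.digitize(x, edges[1:-1])   # bin labels 0 .. no_bins - 1

def entropy_miller_madow(labels):
    """Plug-in entropy plus the Miller-Madow correction (K - 1) / (2m)."""
    _, counts = np.unique(labels, axis=0, return_counts=True)
    m = counts.sum()
    p = counts / m
    return float(-np.sum(p * np.log(p)) + (len(counts) - 1) / (2.0 * m))

def mi_discretized(x, y):
    """Step 2: MI between discretized vectors as H(x) + H(y) - H(x, y)."""
    dx, dy = discretize_equal_width(x), discretize_equal_width(y)
    h_x, h_y = entropy_miller_madow(dx), entropy_miller_madow(dy)
    h_xy = entropy_miller_madow(np.column_stack([dx, dy]))
    return h_x + h_y - h_xy, h_x, h_y

def adjacency_uv2(x, y):
    """Step 3 (assumed form): MI / max(H(x), H(y)); undefined for constant vectors."""
    mi, h_x, h_y = mi_discretized(x, y)
    return mi / max(h_x, h_y)

rng = np.random.default_rng(0)
x = rng.normal(size=100)                 # m = 100 samples, so 10 bins
labels = discretize_equal_width(x)
self_adjacency = adjacency_uv2(x, x)     # a vector compared with itself gives 1
```

The hidden dependence on the number of bins is visible here: changing `no_bins` changes every downstream quantity, which is one of the parameter choices the article contrasts with correlation.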
Results
An equation relating MI(discretize(x),discretize(y)) to cor(x,y)
Eq. 18 was stated in terms of the Pearson correlation, but it also applies for bicor as can be seen from our simulation studies.
Simulations where x and y represent samples from a bivariate normal distribution
Empirical studies involving 8 gene expression data sets
Our simulation results show that both the robust biweight midcorrelation and the Pearson correlation can be used as input of F^{cor−MI} for predicting A^{MI,UniversalVersion 2} when the underlying variables satisfy pairwise bivariate normal relationships. However, it is not clear whether F^{cor−MI} can also be used to relate correlation and mutual information in real data applications. In this section, we report 8 empirical studies of the relationship between MI and the robust correlation measure bicor. To focus the analysis on genes that are likely to reflect biological variation and to reduce computational burden, we selected the 3000 genes with highest variance across the microarray samples for each data set. A description of the data sets can be found in Materials and Methods.
In summary, bicor usually detects linear relationships between gene pairs accurately while mutual information is susceptible to outliers, and sometimes identifies pairs that exhibit patterns unlikely to be of biological origin or that exhibit no clear dependency at all. We note that MI results tend to be more meaningful when dealing with a large number of observations (say m > 300). Although we only consider 3000 genes with highest variances, our results are highly robust with respect to the number of genes. For example, in Additional file 2, we report results when considering all 23568 genes in the mouse adipose data set or considering 10000 randomly selected genes (rather than with high variance) in the ND data set. These results demonstrate that our findings do not depend on the number of genes.
Gene ontology enrichment analysis of coexpression modules defined by different networks
Gene coexpression networks typically exhibit modular structure in the sense that genes can be grouped into modules (clusters) comprised of highly interconnected genes (i.e., within-module adjacencies are high). The network modules often have a biological interpretation in the sense that the modules are highly enriched in genes with a common functional annotation (gene ontology categories, cell type markers, etc.) [3, 30, 31]. In this section, we assess association measures (and network construction methods) by the gene ontology (GO) enrichment of their resulting modules in the 8 empirical data sets.
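Table 1 Types of networks and characteristics

| Network type | Used here | Examples | Variable types | Ease of estimation | Utility for modeling: GRN | Utility for modeling: Reduce | Utility for modeling: Direct | Utility for modeling: Time | Utility for modeling: Nonlin. | Utility for modeling: Sign | Adjacencies discussed in this article | Used in GO enrichment analysis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Correlation network | Yes | WGCNA [5] | Numeric | Easy | Yes | Yes | No | Maybe | No | Yes | unsignedA | Yes |
| | | | | | | | | | | | signedA | Yes |
| | | | | | | | | | | | TOM | Yes |
| Polynomial or spline regression network | Yes | WGCNA [5] | Numeric | Moderate | Yes | Yes | No | Maybe | Yes | No | poly R^{2} | No |
| | | | | | | | | | | | spline R^{2} | No |
| Mutual information network | Yes | ARACNE [9], RELNET [26], MRNET | Discretized numeric, categorical | Moderate | Yes | Not clear | No | Maybe | Yes | No | ASU | No |
| | | | | | | | | | | | AUV1 | No |
| | | | | | | | | | | | AUV2 | Yes |
| | | | | | | | | | | | ARACNE | Yes |
| | | | | | | | | | | | ARACNE0.2 | Yes |
| | | | | | | | | | | | ARACNE0.5 | Yes |
| | | | | | | | | | | | CLR | Yes |
| | | | | | | | | | | | MRNET | Yes |
| | | | | | | | | | | | RELNET | Yes |
| | | | | | | | | | | | MIC | Yes |
| Boolean network | No | Boolean network [71] | Dichotomized numeric | Moderate | Yes | Not clear | Yes | Yes | NA | NA | No | No |
| Probabilistic network | No | | Any | Hard | Yes | Not clear | Yes | Yes | Yes | Yes | No | No |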
Overall, these unbiased comparisons show that signed correlation networks coupled with the topological overlap transformation outperform the commonly used mutual information based algorithms when it comes to GO enrichment of modules.
Polynomial and spline regression models as alternatives to mutual information
A widely noted advantage of mutual information is that it can detect general, possibly nonlinear, dependence relationships. However, estimation of mutual information poses multiple challenges ranging from computational complexity to dependency on parameters and difficulties with small sample sizes. Standard polynomial and spline regression models can also detect nonlinear relationships between variables. While perhaps less general than MI, relatively simple polynomial and spline regression models avoid many of the challenges of estimating MI while adequately modeling a broad range of nonlinear relationships. In addition to being computationally simpler and faster, regression models also make available standard statistical tests and model fitting indices. Thus, in this section we examine polynomial and spline regression as alternatives to MI for capturing nonlinear relationships between gene expression profiles. We define association measures based on polynomial and spline regression models and study their performance.
Networks based on polynomial and spline regression models
The model fitting index R^{2}(x,y) (described in Materials and Methods) can be used to evaluate the fit of the model. One can then reverse the roles of x and y to arrive at a model fitting index R^{2}(y,x). In general, R^{2}(x,y) ≠ R^{2}(y,x).
Now consider a set of n variables x_{1},…,x_{ n }. One can then calculate pairwise model fitting indices $R_{ij}^{2}=R^{2}(x_{i},x_{j})$ which can be interpreted as the elements of an n × n association matrix $(R_{ij}^{2})$. This matrix is in general nonsymmetric and takes on values in [0,1], with diagonal values equal to 1. A large value indicates a close relationship between variables x_{ i } and x_{ j }. To define an adjacency matrix, we symmetrize $(R_{ij}^{2})$ through Eqs. 3, 4 or 5.
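A minimal sketch of this construction for polynomial regression is given below. It rests on assumptions that are not stated in this text: R^{2}(x,y) is taken to be the ordinary coefficient of determination from regressing y on x, the polynomial degree defaults to 3, and the matrix is symmetrized with the element-wise maximum (one plausible choice; Eqs. 3 to 5 are not reproduced here). The function names are hypothetical.

```python
import numpy as np

def r2_poly(x, y, degree=3):
    """R^2 of a polynomial regression of y on x (assumed direction and degree)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coefs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coefs, x)
    ss_total = np.sum((y - y.mean()) ** 2)
    return float(1.0 - np.sum(residuals ** 2) / ss_total)

def r2_adjacency(data, degree=3):
    """Symmetrized R^2 matrix for data with n variables (rows) and m samples (columns)."""
    n = data.shape[0]
    r2 = np.ones((n, n))                 # diagonal values equal to 1
    for i in range(n):
        for j in range(n):
            if i != j:
                r2[i, j] = r2_poly(data[i], data[j], degree)
    return np.maximum(r2, r2.T)          # assumed symmetrization: element-wise maximum

# Toy example: y is a quadratic function of x, so the two directions disagree.
x = np.linspace(-1.0, 1.0, 50)
y = x ** 2
r2_xy = r2_poly(x, y)                    # close to 1: y is well predicted from x
r2_yx = r2_poly(y, x)                    # close to 0: x cannot be predicted from y
adj = r2_adjacency(np.vstack([x, y]))
```

The quadratic toy case shows why the raw matrix is nonsymmetric and why a symmetrization step is needed before the result can serve as a network adjacency.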
Spline regression models are also known as local polynomial regression models [36]. Local refers to the fact that these models amount to fitting models on subintervals of the range of x. The boundaries of subintervals are referred to as knots. In analogy to polynomial models, we build a natural cubic spline model for all pairs of x_{ i },x_{ j }. We use the following rule of thumb for the number of knots: if m > 100 use 5 knots, if m < 30 use 3 knots, otherwise use 4 knots. We then calculate model fitting indices and create corresponding network adjacencies. (Details of spline model construction can be found in Materials and Methods.)
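The rule of thumb for the number of knots translates directly into a small helper. The function name `number_of_knots` is hypothetical; the thresholds are the ones stated above, and the sample sizes in the usage lines are those reported for the data sets in Materials and Methods.

```python
def number_of_knots(m):
    """Rule of thumb from the text: 5 knots if m > 100, 3 if m < 30, otherwise 4."""
    if m > 100:
        return 5
    if m < 30:
        return 3
    return 4

# Sample sizes of some data sets described in Materials and Methods.
knots_brain_cancer = number_of_knots(55)   # 4
knots_safhs = number_of_knots(1084)        # 5
knots_yeast = number_of_knots(44)          # 4
```

Note that the boundary cases m = 30 and m = 100 both fall into the "otherwise" branch and receive 4 knots.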
Compared to spline regression, polynomial regression models have a potential shortcoming: the model fit can be adversely affected by outlying observations. A single outlying observation (x_{ u },y_{ u }) can “bend” the fitting curve in the wrong direction, i.e. adversely affect the estimates of the β coefficients. Spline regression alleviates this problem by fitting the model on subintervals of the range of x.
Relationship between regression and MI based networks
Previously, we discussed the relationship between correlation and mutual information based adjacencies in simulations where x and y represent samples from a bivariate normal distribution. Here, we consider the performance of the polynomial and spline association measures in the same scenario (Additional file 4). With all x,y pairs following linear relationships, both regression models reduce to simple linear models, and perform almost identically to correlation based measures (panels (A) and (C)). We find that the cor-MI function introduced previously also allows us to relate spline and polynomial regression based networks to the MI based network (panels (B) and (D)), e.g. $\text{AUV2}_{ij}\approx F^{\text{cor-MI}}\left(\sqrt{\max\left(R^{2}(x_{i},x_{j}),R^{2}(x_{j},x_{i})\right)}\right)$. Note that different symmetrization methods (Eq. 3) applied to R^{2} result in similar adjacencies in our applications (refer to Additional file 5), so it is valid to use any of them.
In addition, our empirical data show that regression models and the mutual information adjacency A^{MI,UniversalVersion 2} are highly correlated, and the relationship is stronger than that between bicor and A^{MI,UniversalVersion 2} (Figure 8 C-F). This indicates that A^{MI,UniversalVersion 2} and regression models discover some common gene pairwise nonlinear relations that cannot be identified by correlations. The Neurological Disease (ND) and mouse muscle sets are shown in Figure 8 as representatives. A detailed analysis of all data sets can be found in Additional file 5.
Simulations for module identification in data with nonlinear relationships
Overview of network methods and alternatives
A thorough review of network methods is beyond our scope and we point the reader to the many review articles [37–40]. Table 1 describes not only the methods used in this article but also alternative approaches, as well as the kind of biological insights that can be gained from these network methods. As a rule, association networks (based on correlation or MI) are ill suited for causal analysis and for encoding directional information. While association networks such as WGCNA or ARACNE have been successfully used for gene regulatory networks (GRNs) [13], a host of alternatives are available. For example, the DREAM (Dialogue for Reverse Engineering Assessments and Methods) project has repeatedly tackled this problem [41–43]. A limitation of our study is that we are focusing on undirected (as opposed to directed, causal) models. Structural equation models, Bayesian networks, and other probabilistic graphical models are widely used for studying causal relationships. Many authors have proposed to use Bayesian networks for analyzing gene expression data [44–47] and for generating causal networks from observational data [48] or genetic data [49, 50].
While it is beyond our scope to evaluate network inference methods for time series data (reviewed in [51]), we briefly mention several approaches. A (probabilistic) Boolean network [52] is a special case of a discrete state space model that characterizes a system using dichotomized data. A Bayesian network is a graph-based model of joint multivariate probability distributions that captures properties of conditional independence between variables [45]. Such models are attractive for their ability to describe complex stochastic processes and for modeling causal relationships. Several articles describe the relationship between Boolean networks and dynamic Bayesian networks when it comes to models of gene regulatory relationships [47, 53]. Finally, we mention that correlation network methodology can be adapted to model time series data, e.g. many authors have proposed to use a time-lagged correlation measure for inferring gene regulatory networks [54].
A large part of GRN research focuses on the accurate assessment of individual network edges, e.g. [55–58] so many of these methods are not designed as data reduction methods. In contrast, correlation network methods, such as WGCNA, are highly effective at reducing high dimensional genomic data since modules can be represented by their first singular vector (referred to as module eigengene) [21, 59].
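The data reduction step can be sketched with a singular value decomposition. This is an illustrative sketch rather than the WGCNA implementation: the function name `module_eigengene`, the per-gene standardization, and the sign convention (orienting the eigengene along the average standardized profile) are assumptions.

```python
import numpy as np

def module_eigengene(expr):
    """First right singular vector of a standardized module expression matrix.

    expr: genes (rows) x samples (columns) for the genes of one module.
    """
    expr = np.asarray(expr, dtype=float)
    centered = expr - expr.mean(axis=1, keepdims=True)
    z = centered / expr.std(axis=1, ddof=1, keepdims=True)   # scale each gene
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = vt[0]
    # The sign of a singular vector is arbitrary; align it with the mean profile.
    if np.dot(eigengene, z.mean(axis=0)) < 0:
        eigengene = -eigengene
    return eigengene

# Toy module (hypothetical): four genes that follow one shared profile plus small noise.
rng = np.random.default_rng(1)
profile = rng.normal(size=40)
loadings = np.array([1.0, 2.0, 0.5, 3.0])
expr = np.outer(loadings, profile) + 0.05 * rng.normal(size=(4, 40))
eigengene = module_eigengene(expr)
r_with_profile = float(np.corrcoef(eigengene, profile)[0, 1])
```

In the toy module a single vector of length m recovers the shared profile almost perfectly, which illustrates how a module of many genes can be summarized by one representative.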
Discussion
This article presents the following theoretical and methodological results: i) it reviews the relationship between the MI and a likelihood ratio test statistic in case of two categorical variables, ii) it presents a novel empirical formula for relating correlation to MI when the two variables satisfy a linear relationship, and iii) it describes how to use polynomial and spline regression models for defining pairwise coexpression measures that can detect nonlinear relationships.
Mutual information has several appealing information theoretic properties. A widely recognized advantage of mutual information over correlation is that it allows one to detect nonlinear relationships. This can be attractive in particular when dealing with time series data [60]. But mutual information is not unique in being able to detect nonlinear relationships. Standard regression models such as polynomial and spline models can also capture nonlinear relationships. An advantage of these models is that well established likelihood based statistical estimation and testing procedures are available. Regression models allow one to calculate model fitting indices that can be used to define network adjacencies as well as flag possible outlying observations by analyzing residuals.
For categorical variables, mutual information is (asymptotically) equivalent to other widely used statistical association measures such as the likelihood ratio statistic or the Pearson chi-square test. In this case, all of these measures (including MI) are arguably optimal association measures. Interpreting MI as a likelihood ratio test statistic facilitates a straightforward approach for adjusting the association measure for additional covariates.
We and others [14] have found close relationships between mutual information and correlation based coexpression networks. Our comprehensive empirical studies show that mutual information is often highly related to the absolute value of the correlation coefficient. We observe that when robust correlation and mutual information disagree, the robust correlation findings appear to be more plausible statistically and biologically. We found that network modules defined using robust correlation exhibit on average higher enrichment in GO categories than modules defined using mutual information. Since our empirical studies involved expression data measured on a variety of platforms and normalized in different ways, we expect that our findings are broadly applicable.
The correlation coefficient is an attractive alternative to the MI for the following reasons. First, the correlation can be accurately estimated with relatively few observations and it does not require the estimation of the (joint) frequency distribution. Estimating the joint density needed for calculating MI typically requires larger sample sizes. Second, the correlation does not depend on hidden parameter choices. In contrast, MI estimation methods involve (hidden) parameter choices, e.g. the number of bins when a discretization method is being used. Third, the correlation allows one to quickly calculate p-values and false discovery rates since asymptotic tests are available (Additional file 1). In contrast, it is computationally challenging to calculate a permutation test p-value for the mutual information between two discretized vectors. Fourth, the sign of the correlation allows one to distinguish positive from negative relationships. Signed correlation networks have been found useful in biological applications [22] and our results show that the resulting modules tend to be more significantly enriched with GO terms than those of networks that ignore the sign information. Fifth, modules comprised of highly correlated vectors can be effectively summarized by the module eigennode (the first principal component of scaled vectors). Sixth, the correlation allows for a straightforward angular interpretation, which facilitates a geometric interpretation of network methods and concepts [59]. For example, intramodular connectivity can be interpreted as module eigennode based connectivity.
Our empirical studies show that a signed weighted correlation network transformed via the topological overlap matrix transformation often leads to the most significant functional enrichment of modules. The recently developed maximal information coefficient [35] has clear theoretical advantages when it comes to measuring general dependence patterns between variables but our results show that the biweight midcorrelation coupled with the topological overlap measure outperforms the MIC when it comes to the GO enrichment of resulting coexpression modules.
While defining mutual information for categorical variables is relatively straightforward, no consensus seems to exist in the literature on how to define mutual information for continuous variables. A major limitation of our study is that we only studied MI measures based on discretized continuous variables. For example, the cor-MI function for relating correlation to MI only applies when an equal-width discretization method is used with $\text{no.bins}=\sqrt{m}$.
A second limitation concerns our gene ontology analysis of modules identified in networks based on various association measures in which we found that the correlation based topological overlap measure (TOM) leads to coexpression modules that are more highly enriched with GO terms than those of alternative approaches. A potential problem with our approach is that the enrichment p-values often strongly depend on (increase with) module sizes, and TOM tends to lead to larger modules. To address this concern, in Additional file 6 we show the enrichment p-values as a function of module size for modules identified by TOM and by AUV2. It turns out that in most studies, the enrichment of modules defined by TOM is better than that of comparably sized modules defined by AUV2.
A third limitation concerns our use of the bicor correlation measure as opposed to alternatives (e.g. Pearson or Spearman correlation). In our study we find that all 3 correlation measures lead to very similar findings (Additional file 7).
Conclusions
Our simulation and empirical studies suggest that mutual information can safely be replaced by linear regression based association measures (e.g. bicor) in case of stationary gene expression measures (which are represented by quantitative variables). To capture general monotonic relationships between such variables, one can use the Spearman correlation. To capture more complicated dependencies, one can use symmetrized model fitting statistics from a polynomial or spline regression model. Regression based association measures have the advantage of allowing one to include covariates (conditioning variables). In case of categorical variables, mutual information is an appropriate choice since it is equivalent to an association measure (likelihood ratio test statistic) of a generalized linear regression model but categorical variables rarely occur in the context of modeling relationships between gene products.
Materials and Methods
Empirical gene expression data sets description
Brain cancer data set. This data set was composed of 55 microarray samples of glioblastoma (brain cancer) patients. Gene expression profiling was performed with Affymetrix high-density oligonucleotide microarrays. A detailed description can be found in [61].
SAFHS data set. This data set [62] was derived from blood lymphocytes of randomly ascertained participants enrolled independent of phenotype in the San Antonio Family Heart Study. Gene expression profiles of 1084 samples were measured by Illumina Sentrix Human Whole Genome (WG6) Series I BeadChips.
ND data set. This blood lymphocyte data set consisted of 346 samples from patients with neurological diseases. Illumina HumanRef8 v3.0 Expression BeadChips were used to measure their gene expression profiles.
Yeast data set. The yeast microarray data set was composed of 44 samples from the Saccharomyces Genome Database (http://db.yeastgenome.org/cgibin/SGD/expression/expressionConnection.pl). Original experiments were designed to study the cell cycle [63]. A detailed description of the data set can be found in [64].
Tissue-specific mouse data sets. This study uses four tissue-specific gene expression data sets from a large F_{2} mouse intercross (B × H) previously described in [65, 66]. Specifically, the surveyed tissues include adipose (239 samples), whole brain (221 samples), liver (272 samples) and muscle (252 samples).
Definition of Biweight Midcorrelation
A modified version of biweight midcorrelation is implemented as function bicor in the WGCNA R package [5, 20]. One major argument of the function is “maxPOutliers”, which caps the maximum proportion of outliers with weight w_{ i } = 0. In practice, we find that maxPOutliers = 0.02 detects outliers efficiently while preserving most data. Therefore, 0.02 is the value we use in this study.
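To make the role of the weights w_{ i } concrete, the following is a minimal Python sketch of the standard (unmodified) biweight midcorrelation. It is not the WGCNA implementation: the helper names are hypothetical, the scaling constant 9 and the use of the raw median absolute deviation follow the usual textbook definition and are assumptions here, and the maxPOutliers cap is deliberately not implemented.

```python
from math import sqrt
from statistics import median


def bicor_weights(x):
    """Return (deviations from the median, biweight weights) for sample x."""
    med = median(x)
    mad = median(abs(v - med) for v in x)  # raw median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; bicor is undefined for this sample")
    deviations = [v - med for v in x]
    weights = []
    for d in deviations:
        u = d / (9.0 * mad)  # assumed scaling constant 9
        # Observations with |u| >= 1 are treated as outliers (weight 0).
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return deviations, weights


def bicor(x, y):
    """Biweight midcorrelation of two equally long numeric samples."""
    dx, wx = bicor_weights(x)
    dy, wy = bicor_weights(y)
    ax = [d * w for d, w in zip(dx, wx)]
    ay = [d * w for d, w in zip(dy, wy)]
    numerator = sum(p * q for p, q in zip(ax, ay))
    denominator = sqrt(sum(p * p for p in ax)) * sqrt(sum(q * q for q in ay))
    return numerator / denominator


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, -100]  # last observation is a gross outlier
print(bicor(x, x), bicor(x, y))
```

In this toy example the outlying observation in y receives weight 0, so it does not contribute to the numerator and the association between the remaining observations stays positive.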
Types of correlation based gene coexpression networks
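An unweighted network adjacency can be defined by hard thresholding the absolute correlation: A_{ ij } = 1 if |cor(x_{ i },x_{ j })| ≥ τ and A_{ ij } = 0 otherwise, where τ is the ‘hard’ threshold parameter. Hard thresholding of the correlation leads to simple network concepts (e.g., the gene connectivity equals the number of direct neighbors) but it may lead to a loss of information.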
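A weighted network adjacency can be defined by raising the absolute correlation to a power, A_{ ij } = |cor(x_{ i },x_{ j })|^{β}, with β ≥ 1. This soft thresholding approach emphasizes strong correlations, punishes weak correlations, and leads to a weighted gene coexpression network.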
By default, β is set to 6 for unsigned adjacency and 12 for signed adjacency. The choice of signed vs. unsigned networks depends on the application; both signed [22] and unsigned [30, 61, 65] weighted gene networks have been successfully used in gene expression analysis.
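The two thresholding schemes can be illustrated with a short Python sketch. The function names are hypothetical, and the signed form (0.5 + 0.5 cor)^β is the usual WGCNA convention, stated here as an assumption; only the default powers 6 and 12 are taken from the text.

```python
def hard_threshold(cor, tau):
    """Unweighted adjacency: 1 if |cor| reaches the threshold tau, else 0."""
    return 1 if abs(cor) >= tau else 0


def soft_threshold(cor, beta=None, signed=False):
    """Weighted adjacency obtained by raising a similarity to the power beta.

    Unsigned similarity: |cor|; signed similarity (assumed): (1 + cor) / 2.
    """
    if beta is None:
        beta = 12 if signed else 6  # default powers quoted in the text
    if beta < 1:
        raise ValueError("beta must be >= 1")
    similarity = (1.0 + cor) / 2.0 if signed else abs(cor)
    return similarity ** beta


for cor in (-0.9, -0.5, 0.0, 0.5, 0.9):
    print(cor, hard_threshold(cor, 0.7),
          soft_threshold(cor), soft_threshold(cor, signed=True))
```

Note that the unsigned version treats strongly negative and strongly positive correlations alike, whereas the signed version assigns adjacencies near 0 to strongly negative correlations.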
Adjacency function based on topological overlap
The TOM based adjacency function A_{ TOM } is particularly useful when the entries of A^{ original } are sparse (many zeroes) or susceptible to noise. It replaces the original adjacencies by a measure of interconnectedness that is based on shared neighbors. The topological overlap measure can serve as a filter that decreases the effect of spurious or weak connections, and it can lead to more robust networks [17, 18, 68].
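As an illustration of how shared neighbors enter the transformation, here is a minimal Python sketch of the commonly used topological overlap formula, TOM_{ ij } = (Σ_{ u ≠ i,j } a_{ iu }a_{ uj } + a_{ ij }) / (min(k_{ i },k_{ j }) + 1 − a_{ ij }), where k_{ i } is the connectivity of node i. The formula is quoted from the general TOM literature as an assumption, and the function name is hypothetical.

```python
def topological_overlap(adj):
    """Topological overlap matrix of a symmetric adjacency matrix.

    adj is a list of lists with entries in [0, 1]; diagonal entries are ignored.
    """
    n = len(adj)
    # Connectivity k_i: sum of adjacencies to all other nodes.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            denominator = min(k[i], k[j]) + 1.0 - adj[i][j]
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / denominator
    return tom


# Nodes 1 and 2 are not directly connected but share neighbor 0.
adjacency = [[1, 1, 1],
             [1, 1, 0],
             [1, 0, 1]]
for row in topological_overlap(adjacency):
    print(row)
```

In this small network the pair (1, 2) has adjacency 0 but a nonzero topological overlap because of the shared neighbor, which is the sense in which TOM can fill in sparse adjacencies.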
Mutual-information based network inference methods
There are 4 commonly used mutual-information based network inference methods: RELNET, CLR, MRNET and ARACNE. In order to identify pairwise interactions between numeric variables x_{ i },x_{ j }, all methods start by estimating the mutual information MI(x_{ i },x_{ j }).
RELNET
The relevance network (RELNET) approach [6, 28] thresholds the pairwise mutual information values at a threshold τ. However, this method has a significant limitation: vectors separated by one or more intermediaries (indirect relationships) may have high mutual information without implying a direct interaction.
CLR
z_{ j } can be defined analogously. In terms of z_{ i } and z_{ j }, the score used in the CLR algorithm can be expressed as $z_{ij}=\sqrt{z_{i}^{2}+z_{j}^{2}}$.
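The CLR score can be sketched in Python as follows. The sketch assumes the usual CLR definition z_{ i } = max(0, (MI(x_{ i },x_{ j }) − μ_{ i })/σ_{ i }), where μ_{ i } and σ_{ i } are the mean and standard deviation of the MI values involving x_{ i }. Excluding the diagonal, using the population standard deviation and setting z to 0 when σ_{ i } = 0 are choices made for this illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def clr_scores(mi):
    """CLR scores from a symmetric matrix of pairwise mutual information."""
    n = len(mi)
    mu, sd = [], []
    for i in range(n):
        row = [mi[i][k] for k in range(n) if k != i]  # diagonal excluded
        mu.append(mean(row))
        sd.append(pstdev(row))

    def z(i, j):
        # Standardized MI of the pair (i, j) relative to the background of i.
        if sd[i] == 0:
            return 0.0
        return max(0.0, (mi[i][j] - mu[i]) / sd[i])

    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            scores[i][j] = scores[j][i] = sqrt(z(i, j) ** 2 + z(j, i) ** 2)
    return scores


mi = [[0.0, 0.9, 0.1],
      [0.9, 0.0, 0.1],
      [0.1, 0.1, 0.0]]
for row in clr_scores(mi):
    print(row)
```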
MRNET
The score of each pair x_{ i } and x_{ j } is the maximum of the two scores computed when x_{ i } is the target and when x_{ j } is the target.
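A rough Python sketch of this scoring rule is given below. It assumes the usual maximum relevance/minimum redundancy (MRMR) selection step, in which a candidate's score is its MI with the target minus its average MI with the variables already selected, and selection stops once the best score is no longer positive. The function name and the stopping rule are assumptions of this sketch, not a description of any particular package.

```python
def mrnet_scores(mi):
    """Pairwise MRNET-style scores from a symmetric MI matrix."""
    n = len(mi)
    score = [[0.0] * n for _ in range(n)]
    for target in range(n):
        selected = []
        candidates = [j for j in range(n) if j != target]
        while candidates:
            def mrmr(j):
                # Relevance to the target minus mean redundancy with selected.
                redundancy = 0.0
                if selected:
                    redundancy = sum(mi[j][k] for k in selected) / len(selected)
                return mi[j][target] - redundancy

            best = max(candidates, key=mrmr)
            best_score = mrmr(best)
            if best_score <= 0:
                break
            selected.append(best)
            candidates.remove(best)
            # Keep the maximum over the two possible choices of target.
            new_score = max(score[target][best], best_score)
            score[target][best] = score[best][target] = new_score
    return score


mi = [[0.0, 0.8, 0.5],
      [0.8, 0.0, 0.6],
      [0.5, 0.6, 0.0]]
for row in mrnet_scores(mi):
    print(row)
```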
ARACNE
The tolerance threshold ε could be chosen to reflect the variance of the MI estimator and should decrease with increasing sample size m. Using a nonzero tolerance ε > 0 can lead to the persistence of some 3-vector loops.
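The effect of the tolerance can be illustrated with a small Python sketch of a data processing inequality filter. It assumes an additive tolerance, i.e. the edge between x_{ i } and x_{ j } is removed if MI(x_{ i },x_{ j }) < min(MI(x_{ i },x_{ k }), MI(x_{ j },x_{ k })) − ε for some third vector x_{ k }; other formulations use a multiplicative tolerance. The function name is hypothetical.

```python
def aracne_filter(mi, eps=0.0):
    """Zero out the weakest edge of each triplet (additive tolerance eps)."""
    n = len(mi)
    to_remove = set()
    # All decisions are based on the original matrix, so the result does not
    # depend on the order in which triplets are visited.
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(n):
                if k == i or k == j:
                    continue
                if mi[i][j] < min(mi[i][k], mi[j][k]) - eps:
                    to_remove.add((i, j))
                    break
    filtered = [row[:] for row in mi]
    for i, j in to_remove:
        filtered[i][j] = filtered[j][i] = 0.0
    return filtered


mi = [[0.0, 0.8, 0.5],
      [0.8, 0.0, 0.6],
      [0.5, 0.6, 0.0]]
print(aracne_filter(mi))           # weakest edge (0, 2) is removed
print(aracne_filter(mi, eps=0.2))  # tolerance keeps the 3-vector loop
```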
The outputs from RELNET, CLR, MRNET or ARACNE are association matrices. They can be transformed into corresponding adjacencies based on the algorithm discussed in the Introduction.
MIC
Another mutual information based method is the recently proposed maximal information coefficient (MIC) [35]. The MIC is a type of maximal information-based nonparametric exploration (MINE) statistic [35]. In our empirical evaluations, we calculate the MIC using the minerva R package [69].
Fitting indices of polynomial regression models
where ^{−} denotes the (pseudo) inverse, and ^{ τ } denotes the transpose of a matrix.
In the context of a regression model, R^{2} is also known as the proportion of variation of y explained by the model.
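The following Python sketch shows how R^{2} from a polynomial regression of y on x can be computed by ordinary least squares. It is a plain illustration rather than the WGCNA implementation: the helper names are hypothetical, and the normal equations are solved by Gaussian elimination instead of a pseudo inverse, so the design matrix is assumed to have full column rank.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for q in range(c, n + 1):
                m[r][q] -= factor * m[c][q]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][q] * x[q] for q in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def poly_r2(x, y, degree=3):
    """R^2 of the least squares fit of y on 1, x, ..., x^degree."""
    cols = [[v ** d for v in x] for d in range(degree + 1)]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v ** d for d, b in enumerate(beta)) for v in x]
    y_mean = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    ss_tot = sum((obs - y_mean) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot


x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]  # non-monotonic, perfectly quadratic relationship
print(poly_r2(x, y, degree=2), poly_r2(y, x, degree=2))
```

The example also shows why a symmetrization step is needed: regressing y on x explains essentially all of the variation, while regressing x on y explains essentially none, so the two directed R^{2} values must be combined (for example by taking their minimum, mean or maximum) to obtain a symmetric association measure.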
Spline regression model construction
To investigate the relationship between variables x and y, one can use another textbook method from the arsenal of statisticians: spline regression models. Here knots are used to define the boundaries of the subintervals. They are typically prespecified, e.g. based on quantiles of x. The choice of the knots will affect the model fit. It turns out that the placement of the knots is not as important as the number of knots. We use the following rule of thumb for the number of knots: if m > 100 use 5 knots, if m < 30 use 3 knots, otherwise use 4 knots.
The model makes use of the hinge function (s)_{+}, which equals s if s ≥ 0 and 0 otherwise. This function can also be applied to the components of a vector, e.g. (x)_{+} denotes a vector whose negative components have been set to zero. So (x − knot_{1})_{+} is a vector whose uth component equals x[u] − knot_{1} if x[u] − knot_{1} ≥ 0 and 0 otherwise.
The knot parameters (numbers) knot_{1}, knot_{2}, … are chosen before estimating the parameter values. Analogous to polynomial regression, R^{2} can be calculated as the association measure between x and y. This method guarantees the smoothness of the regression line and restricts the influence of each observation to its local subinterval.
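The ingredients of the spline model can be sketched in Python as follows. The rule of thumb for the number of knots is taken from the text; the quantile-based knot placement by linear interpolation, the linear truncated-power basis and all function names are assumptions of this illustration (a cubic or natural spline basis could be used instead).

```python
def num_knots(m):
    """Rule of thumb for the number of knots given the sample size m."""
    if m > 100:
        return 5
    if m < 30:
        return 3
    return 4


def hinge(s):
    """(s)_+ : s if s >= 0, and 0 otherwise."""
    return s if s >= 0 else 0.0


def quantile_knots(x, n_knots):
    """Place knots at equally spaced quantiles of x (linear interpolation)."""
    xs = sorted(x)
    knots = []
    for q in range(1, n_knots + 1):
        pos = q / (n_knots + 1) * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        knots.append(xs[lo] + (pos - lo) * (xs[hi] - xs[lo]))
    return knots


def linear_spline_basis(x, knots):
    """Design matrix with columns 1, x, (x - knot_1)_+, (x - knot_2)_+, ..."""
    return [[1.0, v] + [hinge(v - k) for k in knots] for v in x]


x = list(range(11))  # toy sample with m = 11 observations
knots = quantile_knots(x, num_knots(len(x)))
print(knots)
print(linear_spline_basis([6], knots))
```

The resulting design matrix can be passed to any least squares routine, and R^{2} is then computed exactly as for polynomial regression.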
Other networks
Boolean networks [71] and probabilistic networks [72, 73] are briefly mentioned in Table 1.
Availability of software
Project name: Adjacency matrix for nonlinear relationships
Project home page: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA
Operating system(s): Platform independent
Programming language: R
Licence: GNU GPL 3
The following functions described in this article have been implemented in the WGCNA R package [5]. Functions adjacency.polyReg and adjacency.splineReg calculate polynomial and spline regression R^{2} based adjacencies. Users can specify the R^{2} symmetrization method. Function mutualInfoAdjacency calculates the mutual information based adjacencies A^{MI,SymmetricUncertainty} (Eq. 14), A^{MI,UniversalVersion 1} (Eq. 15) and A^{MI,UniversalVersion 2} (Eq. 16). Function AFcorMI implements the F^{cor−MI} prediction function (Eq. 18) for relating correlation to mutual information.
Abbreviations
MI: Mutual information
Bicor: Biweight midcorrelation
MIC: Maximal information coefficient
ARACNE: Algorithm for the reconstruction of accurate cellular networks
GO: Gene ontology
LRT: Likelihood ratio test
TOM: Topological overlap matrix
WGCNA: Weighted correlation network analysis
Declarations
Acknowledgements
We acknowledge grant support from 1R01 DA03091301, P50CA092131, R01NS058980, and the UCLA CTSI.
Authors’ Affiliations
References
1. Eisen M, Spellman P, Brown P, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 1998, 95(25):14863–14868. 10.1073/pnas.95.25.14863
2. Zhou X, Kao M, Wong W: Transitive Functional Annotation By Shortest Path Analysis of Gene Expression Data. Proc Natl Acad Sci U S A 2002, 99(20):12783–12788. 10.1073/pnas.192159399
3. Stuart JM, Segal E, Koller D, Kim SK: A Gene-Coexpression Network for Global Discovery of Conserved Genetic Modules. Science 2003, 302(5643):249–255. 10.1126/science.1087447
4. Zhang B, Horvath S: General framework for weighted gene coexpression analysis. Stat Appl Genet Mol Biol 2005, 4: 17.
5. Langfelder P, Horvath S: WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008, 9: 559. 10.1186/1471-2105-9-559
6. Butte A, Tamayo P, Slonim D, Golub T, Kohane I: Discovering Functional Relationships Between RNA Expression and Chemotherapeutic Susceptibility Using Relevance Networks. Proc Natl Acad Sci U S A 2000, 97: 12182–12186. 10.1073/pnas.220392197
7. Daub C, Steuer R, Selbig J, Kloska S: Estimating mutual information using B-spline functions - an improved similarity measure for analysing gene expression data. BMC Bioinformatics 2004, 5: 118. 10.1186/1471-2105-5-118
8. Basso K, Margolin A, Stolovitzky G, Klein U, Dalla-Favera R, Califano A: Reverse engineering of regulatory networks in human B cells. Nat Genet 2005, 37(4):382–390. 10.1038/ng1532
9. Margolin A, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera R, Califano A: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 2006, 7(Suppl 1):S7. 10.1186/1471-2105-7-S1-S7
10. Priness I, Maimon O, Ben-Gal I: Evaluation of gene-expression clustering via mutual information distance measure. BMC Bioinformatics 2007, 8: 111. [http://www.biomedcentral.com/1471-2105/8/111] 10.1186/1471-2105-8-111
11. Meyer P, Lafitte F, Bontempi G: minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information. BMC Bioinformatics 2008, 9: 461. 10.1186/1471-2105-9-461
12. Cadeiras M, Bayern MV, Sinha A, Shahzad K, Lim WK, Grenett H, Tabak E, Klingler T, Califano A, Deng MC: Drawing networks of rejection - a systems biological approach to the identification of candidate genes in heart transplantation. J Cell Mol Med 2010, 15(4):949–956.
13. Allen JD, Xie Y, Chen M, Girard L, Xiao G: Comparing Statistical Methods for Constructing Large Scale Gene Networks. PLoS ONE 2012, 7: e29348. [http://dx.doi.org/10.1371] 10.1371/journal.pone.0029348
14. Steuer R, Kurths J, Daub CO, Weise J, Selbig J: The mutual information: Detecting and evaluating dependencies between variables. Bioinformatics 2002, 18(Suppl 2):S231–S240. 10.1093/bioinformatics/18.suppl_2.S231
15. Lindlof A, Lubovac Z: Simulations of simple artificial genetic networks reveal features in the use of Relevance Networks. In Silico Biology 2005, 5(3):239–250.
16. Ravasz E, Somera A, Mongru D, Oltvai Z, Barabasi A: Hierarchical organization of modularity in metabolic networks. Science 2002, 297(5586):1551–1555. 10.1126/science.1073374
17. Yip A, Horvath S: Gene Network Interconnectedness and the Generalized Topological Overlap Measure. BMC Bioinformatics 2007, 8(8):22.
18. Li A, Horvath S: Network neighborhood analysis with the multinode topological overlap measure. Bioinformatics 2007, 23(2):222–231. 10.1093/bioinformatics/btl581
19. Hardin J, Mitani A, Hicks L, VanKoten B: A robust measure of correlation between two genes on a microarray. BMC Bioinformatics 2007, 8: 220. 10.1186/1471-2105-8-220
20. Langfelder P, Horvath S: Fast R Functions For Robust Correlations And Hierarchical Clustering. J Stat Softw 2012, 46(i11):1–17.
21. Horvath S: Weighted Network Analysis. Applications in Genomics and Systems Biology. New York: Springer; 2011.
22. Mason M, Fan G, Plath K, Zhou Q, Horvath S: Signed weighted gene coexpression network analysis of transcriptional regulation in murine embryonic stem cells. BMC Genomics 2009, 10: 327. 10.1186/1471-2164-10-327
23. Cover T, Thomas J: Elements of information theory. New York: John Wiley & Sons; 1991.
24. Paninski L: Estimation of entropy and mutual information. Neural Computation 2003, 15(6):1191–1253. 10.1162/089976603321780272
25. Kraskov A, Stögbauer H, Andrzejak R, Grassberger P: Hierarchical Clustering Using Mutual Information. EPL (Europhysics Letters) 2007, 70(2):278.
26. Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS: Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles. PLoS Biol 2007, 5: e8. [http://dx.doi.org/10.1371] 10.1371/journal.pbio.0050008
27. Meyer PE, Kontos K, Lafitte F, Bontempi G: Information-Theoretic Inference of Large Transcriptional Regulatory Networks. EURASIP J Bioinforma Syst Biol 2007, 2007: 79879.
28. Butte A, Kohane I: Mutual Information Relevance Networks: Functional Genomic Clustering Using Pairwise Entropy Measurements. Pac Symp Biocomput 2000, 418–429.
29. Moon YI, Rajagopalan B, Lall U: Estimation of mutual information using kernel density estimators. Phys Rev E 1995, 52(3):2318–2321. 10.1103/PhysRevE.52.2318
30. Oldham M, Konopka G, Iwamoto K, Langfelder P, Kato T, Horvath S, Geschwind D: Functional organization of the transcriptome in human brain. Nat Neurosci 2008, 11(11):1271–1282. 10.1038/nn.2207
31. Wolfe C, Kohane I, Butte A: Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 2005, 6: 227. 10.1186/1471-2105-6-227
32. Langfelder P, Zhang B, Horvath S: Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut library for R. Bioinformatics 2007, 24(5):719–720.
33. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene Ontology: tool for the unification of biology. Nature Genetics 2000, 25: 25–29. 10.1038/75556
34. Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini A, Sawitzki G, Smith C, Smyth G, Tierney L, Yang Y, Zhang J: Bioconductor: Open software development for computational biology and bioinformatics. Genome Biol 2004, 5: R80. 10.1186/gb-2004-5-10-r80
35. Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC: Detecting Novel Associations in Large Data Sets. Science 2011, 334(6062):1518–1524. [http://www.sciencemag.org/content/334/6062/1518.abstract] 10.1126/science.1205438
36. Faraway J: Practical Regression and Anova using R. 2002. http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf
37. D’Haeseleer P, Liang S, Somogyi R: Genetic network inference: from coexpression clustering to reverse engineering. Bioinformatics 2000, 16(8):707–726. [http://dx.doi.org/10.1093/bioinformatics/16.8.707] 10.1093/bioinformatics/16.8.707
38. Markowetz F, Spang R: Inferring cellular networks - a review. BMC Bioinformatics 2007, 8(Suppl 6):S5. [http://dx.doi.org/10.1186/1471-2105-8-S6-S5]
39. Bansal M, Belcastro V, Ambesi-Impiombato A, di Bernardo D: How to infer gene networks from expression profiles. Molecular Systems Biology 2007, 3: 78. [http://dx.doi.org/10.1038/msb4100120]
40. De Smet R, Marchal K: Advantages and limitations of current network inference methods. Nat Rev Micro 2010, 8(10):717–729. [http://dx.doi.org/10.1038/nrmicro2419]
41. Stolovitzky G, Monroe D, Califano A: Dialogue on Reverse-Engineering Assessment and Methods. Ann NY Acad Sci 2007, 1115(1):1–22. 10.1196/annals.1407.021
42. Stolovitzky G, Prill RJ, Califano A: Lessons from the DREAM2 Challenges. Ann NY Acad Sci 2009, 1158: 159–195. 10.1111/j.1749-6632.2009.04497.x
43. Prill RJ, Marbach D, Saez-Rodriguez J, Sorger PK, Alexopoulos LG, Xue X, Clarke ND, Altan-Bonnet G, Stolovitzky G: Towards a Rigorous Assessment of Systems Biology Models: The DREAM3 Challenges. PLoS ONE 2010, 5(2):e9202. 10.1371/journal.pone.0009202
44. Friedman N, Linial M, Nachman I, Pe’er D: Using Bayesian networks to analyze expression data. J Comput Biol 2000, 7(3):601–620. 10.1089/106652700750050961
45. Perrin B, Ralaivola L: Gene networks inference using dynamic Bayesian networks. Bioinformatics 2003, 19(Suppl 2):II138–II148. 10.1093/bioinformatics/btg1071
46. Friedman N: Inferring cellular networks using probabilistic graphical models. Science 2004, 303(5659):799–805. 10.1126/science.1094068
47. Li P, Zhang C, Perkins E, Gong P, Deng Y: Comparison of probabilistic Boolean network and dynamic Bayesian network approaches for inferring gene regulatory networks. BMC Bioinformatics 2007, 8(Suppl 7):S13. [http://www.biomedcentral.com/1471-2105/8/S7/S13] 10.1186/1471-2105-8-S7-S13
48. Yu J, Smith VA, Wang PP, Hartemink AJ, Jarvis ED: Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics 2004, 20(18):3594–3603. [http://bioinformatics.oxfordjournals.org/content/20/18/3594.abstract] 10.1093/bioinformatics/bth448
49. Zhu J, Lum P, Lamb J, GuhaThakurta D, Edwards S, Thieringer R, Berger J, Wu M, Thompson J, Sachs A, Schadt E: An integrative genomics approach to the reconstruction of gene networks in segregating populations. Cytogenet Genome Res 2004, 105: 363–374. 10.1159/000078209
50. Schadt E, Lamb J, Yang X, Zhu J, Edwards J, GuhaThakurta D, Sieberts S, Monks S, Reitman M, Zhang C, Lum P, Leonardson A, Thieringer R, Metzger J, Yang L, Castle J, Zhu H, Kash S, Drake T, Sachs A, Lusis A: An integrative genomics approach to infer causal associations between gene expression and disease. Nature Genetics 2005, 37(7):710–717. 10.1038/ng1589
51. Sima C, Hua J, Jung S: Inference of Gene Regulatory Networks Using Time-Series Data: A Survey. Curr Genomics 2009, 10(6):416–429. 10.2174/138920209789177610
52. Shmulevich I, Dougherty ER, Kim S, Zhang W: Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics 2002, 18(2):261–274. [http://bioinformatics.oxfordjournals.org/content/18/2/261.abstract] 10.1093/bioinformatics/18.2.261
53. Lähdesmäki H, Hautaniemi S, Shmulevich I, Yli-Harja O: Relationships between probabilistic Boolean networks and dynamic Bayesian networks as models of gene regulatory networks. Signal Processing 2006, 86(4):814–834. 10.1016/j.sigpro.2005.06.008
54. Schmitt WA, Raab RM, Stephanopoulos G: Elucidation of Gene Interaction Networks Through Time-Lagged Correlation Analysis of Transcriptional Data. Genome Research 2004, 14(8):1654–1663. [http://genome.cshlp.org/content/14/8/1654.abstract] 10.1101/gr.2439804
55. Fernandes JS, Sternberg PW: The tailless Ortholog nhr-67 Regulates Patterning of Gene Expression and Morphogenesis in the C. elegans Vulva. PLoS Genet 2007, 3(4):e69. [http://dx.plos.org/10.1371] 10.1371/journal.pgen.0030069
56. Yan J, Wang H, Liu Y, Shao C: Analysis of Gene Regulatory Networks in the Mammalian Circadian Rhythm. PLoS Comput Biol 2008, 4(10):e1000193. [http://dx.doi.org/10.1371] 10.1371/journal.pcbi.1000193
57. Altay G, Emmert-Streib F: Revealing differences in gene network inference algorithms on the network-level by ensemble methods. Bioinformatics 2010, 26(14):1738–1744. 10.1093/bioinformatics/btq259
58. Chaitankar V, Ghosh P, Perkins E, Gong P, Zhang C: Time lagged information theoretic approaches to the reverse engineering of gene regulatory networks. BMC Bioinformatics 2010, 11(Suppl 6):S19. 10.1186/1471-2105-11-S6-S19
59. Horvath S, Dong J: Geometric interpretation of Gene Coexpression Network Analysis. PLoS Comput Biol 2008, 4(8):e1000117. 10.1371/journal.pcbi.1000117
60. Wiggins C, Nemenman I: Process pathway inference via time series analysis. Experimental Mechanics 2003, 43(3):361–370. 10.1007/BF02410536
61. Horvath S, Zhang B, Carlson M, Lu K, Zhu S, Felciano R, Laurance M, Zhao W, Shu Q, Lee Y, Scheck A, Liau L, Wu H, Geschwind D, Febbo P, Kornblum H, Cloughesy TF, Nelson S, Mischel P: Analysis of Oncogenic Signaling Networks in Glioblastoma Identifies ASPM as a Novel Molecular Target. Proc Natl Acad Sci U S A 2006, 103(46):17402–7. 10.1073/pnas.0608396103
62. Goring HHH, Curran JE, Johnson MP, Dyer TD, Charlesworth J, Cole SA, Jowett JBM, Abraham LJ, Rainwater DL, Comuzzie AG, Mahaney MC, Almasy L, MacCluer JW, Kissebah AH, Collier GR, Moses EK, Blangero J: Discovery of expression QTLs using large-scale transcriptional profiling in human lymphocytes. Nat Genet 2007, 39: 1208–1216. 10.1038/ng2119
63. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization. Mol Biol Cell 1998, 9(12):3273–3297.
64. Carlson M, Zhang B, Fang Z, Mischel P, Horvath S, Nelson SF: Gene Connectivity, Function, and Sequence Conservation: Predictions from Modular Yeast Coexpression Networks. BMC Genomics 2006, 7(7):40.
65. Ghazalpour A, Doss S, Zhang B, Plaisier C, Wang S, Schadt E, Thomas A, Drake T, Lusis A, Horvath S: Integrating Genetics and Network Analysis to Characterize Genes Related to Mouse Weight. PLoS Genetics 2006, 2(2):8. 10.1371/journal.pgen.0020008
66. Fuller T, Ghazalpour A, Aten J, Drake T, Lusis A, Horvath S: Weighted gene coexpression network analysis strategies applied to mouse weight. Mamm Genome 2007, 18(6–7):463–472. 10.1007/s00335-007-9043-3
67. Wilcox R: Introduction to Robust Estimation and Hypothesis Testing. San Diego: Academic Press; 1997.
68. Dong J, Horvath S: Understanding Network Concepts in Modules. BMC Syst Biol 2007, 1: 24. 10.1186/1752-0509-1-24
69. Albanese D, Filosi M, Visintainer R, Riccadonna S, Jurman G, Furlanello C: cmine, minerva and minepy: a C engine for the MINE suite and its R and Python wrappers. ArXiv e-prints 2012.
70. Li H, Zhan M: Unraveling transcriptional regulatory programs by integrative analysis of microarray and transcription factor binding data. Bioinformatics 2008, 24(17):1874–1880. 10.1093/bioinformatics/btn332
71. Kauffman S: Metabolic stability and epigenesis in randomly connected nets. J Theoret Biol 1969, 22: 437–467. 10.1016/0022-5193(69)90015-0
72. Chen X, Chen M, Ning K: BNArray: an R package for constructing gene regulatory networks from microarray data by using Bayesian network. Bioinformatics 2006. [http://view.ncbi.nlm.nih.gov/pubmed/17005537]
73. Werhli AV, Grzegorczyk M, Husmeier D: Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical gaussian models and bayesian networks. Bioinformatics 2006, 22(20):2523–2531. [http://dx.doi.org/10.1093/bioinformatics/btl391] 10.1093/bioinformatics/btl391
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.