### Empirical gene expression data sets description

**Brain cancer data set.** This data set was composed of 55 microarray samples from glioblastoma (brain cancer) patients. Gene expression profiling was performed with Affymetrix high-density oligonucleotide microarrays. A detailed description can be found in [61].

**SAFHS data set.** This data set [62] was derived from blood lymphocytes of randomly ascertained participants who were enrolled in the San Antonio Family Heart Study independently of phenotype. Gene expression profiles of 1084 samples were measured by Illumina Sentrix Human Whole Genome (WG-6) Series I BeadChips.

**ND data set.** This blood lymphocyte data set consisted of 346 samples from patients with neurological diseases. Illumina HumanRef-8 v3.0 Expression BeadChips were used to measure their gene expression profiles.

**Yeast data set.** The yeast microarray data set was composed of 44 samples from the Saccharomyces Genome Database (http://db.yeastgenome.org/cgi-bin/SGD/expression/expressionConnection.pl). Original experiments were designed to study the cell cycle [63]. A detailed description of the data set can be found in [64].

**Tissue-specific mouse data sets.** This study uses four tissue-specific gene expression data sets from a large *F*_{2} mouse intercross (B × H) previously described in [65, 66]. Specifically, the surveyed tissues include adipose (239 samples), whole brain (221 samples), liver (272 samples) and muscle (252 samples).

### Definition of Biweight Midcorrelation

Biweight midcorrelation (bicor) is considered a good alternative to the Pearson correlation since it is more robust to outliers [67]. To define the biweight midcorrelation of two numeric vectors *x* = (*x*_{1},…,*x*_{m}) and *y* = (*y*_{1},…,*y*_{m}), one first defines *u*_{i} and *v*_{i} with *i* = 1,…,*m*:

\begin{array}{c}{u}_{i}=\frac{{x}_{i}-\mathit{\text{med}}\left(x\right)}{9\mathit{\text{mad}}\left(x\right)}\\ {v}_{i}=\frac{{y}_{i}-\mathit{\text{med}}\left(y\right)}{9\mathit{\text{mad}}\left(y\right)}\end{array}

(23)

where *med*(*x*) is the median of *x*, and *mad*(*x*) is the median absolute deviation of *x*. This leads us to the definition of the weight *w*_{i} for *x*_{i}, which is

{w}_{i}^{\left(x\right)}={\left(1-{u}_{i}^{2}\right)}^{2}I\left(1-|{u}_{i}|\right)

(24)

where the indicator *I*(1−|*u*_{i}|) takes the value 1 if 1−|*u*_{i}| > 0 and 0 otherwise. Therefore, *w*_{i}^{(x)} ranges from 0 to 1. It decreases as *x*_{i} moves away from *med*(*x*), and stays at 0 when *x*_{i} differs from *med*(*x*) by more than 9*mad*(*x*). An analogous weight *w*_{i}^{(y)} can be defined for *y*_{i}. Given the weights, we can define the biweight midcorrelation of *x* and *y* as:

\mathit{\text{bicor}}(x,y)=\frac{\sum _{i=1}^{m}({x}_{i}-\mathit{\text{med}}(x)){w}_{i}^{\left(x\right)}({y}_{i}-\mathit{\text{med}}(y)){w}_{i}^{\left(y\right)}}{\sqrt{\sum _{j=1}^{m}{\left[({x}_{j}-\mathit{\text{med}}(x)){w}_{j}^{\left(x\right)}\right]}^{2}}\sqrt{\sum _{k=1}^{m}{\left[({y}_{k}-\mathit{\text{med}}(y)){w}_{k}^{\left(y\right)}\right]}^{2}}}.

(25)

A modified version of biweight midcorrelation is implemented as the function *bicor* in the WGCNA R package [5, 20]. One major argument of the function is “maxPOutliers”, which caps the maximum proportion of outliers that receive weight *w*_{i} = 0. In practice, we find that *maxPOutliers* = 0.02 detects outliers efficiently while preserving most of the data; therefore we use the value 0.02 throughout this study.
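The weighting scheme of Eqs. (23)–(25) is compact enough to sketch directly. Below is a minimal Python illustration (an assumption on our part for exposition; the *bicor* function in the WGCNA R package additionally handles missing data and the *maxPOutliers* cap):

```python
import numpy as np

def bicor(x, y):
    """Biweight midcorrelation of two 1-D arrays (Eqs. 23-25).

    Simplified sketch: no missing-data handling, no maxPOutliers cap,
    and mad(x) is assumed to be non-zero.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def weighted_deviation(v):
        med = np.median(v)
        mad = np.median(np.abs(v - med))          # median absolute deviation
        u = (v - med) / (9.0 * mad)               # Eq. (23)
        w = (1.0 - u**2) ** 2 * (np.abs(u) < 1)   # Eq. (24): weight is 0 beyond 9*mad
        return (v - med) * w

    a = weighted_deviation(x)
    b = weighted_deviation(y)
    return np.sum(a * b) / (np.sqrt(np.sum(a**2)) * np.sqrt(np.sum(b**2)))
```

Because the weights vanish for points more than 9*mad*(*x*) from the median, a single gross outlier has no influence on the estimate, unlike the Pearson correlation.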

### Types of correlation based gene co-expression networks

Given the expression profile *x*, the co-expression similarity *s*_{ij} between genes *i* and *j* can be defined as:

{s}_{\mathit{\text{ij}}}=|\mathit{\text{cor}}({x}_{i},{x}_{j})|.

An **unweighted network adjacency** *A*_{ij} between gene expression profiles *x*_{i} and *x*_{j} can be defined by hard thresholding the co-expression similarity *s*_{ij} as follows:

{A}_{\mathit{\text{ij}}}=\left\{\begin{array}{ll}1& \text{if}\phantom{\rule{0.50em}{0ex}}{s}_{\mathit{\text{ij}}}\ge \tau \\ 0& \text{otherwise,}\end{array}\right.

(26)

where *τ* is the ‘hard’ threshold parameter. Hard thresholding of the correlation leads to simple network concepts (e.g., the gene connectivity equals the number of direct neighbors) but it may lead to a loss of information.

To preserve the continuous nature of the co-expression information, we define the **weighted network adjacency** between 2 genes as a power of the absolute value of the correlation coefficient [4, 61]:

{A}_{\mathit{\text{ij}}}={s}_{\mathit{\text{ij}}}^{\beta},

(27)

with *β* ≥ 1. This soft thresholding approach emphasizes strong correlations, punishes weak correlations, and leads to a weighted gene co-expression network.

An important choice in the construction of a correlation network concerns the treatment of strong negative correlations. In **signed networks** negatively correlated nodes are considered unconnected. In contrast, in **unsigned networks** nodes with high negative correlations are considered connected (with the same strength as nodes with high positive correlations). As detailed in [4, 22], a signed weighted adjacency matrix can be defined as follows

{A}_{\mathit{\text{ij}}}={(0.5+0.5\text{cor}({x}_{i},{x}_{j}))}^{\beta}

(28)

and an unsigned adjacency by

{A}_{\mathit{\text{ij}}}=\left|\mathit{\text{cor}}\right({x}_{i},{x}_{j}){|}^{\beta}\phantom{\rule{0.3em}{0ex}}.

(29)

The default is *β* = 6 for unsigned adjacencies and *β* = 12 for signed adjacencies. The choice of signed vs. unsigned networks depends on the application; both signed [22] and unsigned [30, 61, 65] weighted gene networks have been successfully used in gene expression analysis.
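As an illustration, Eqs. (27)–(29) amount to an element-wise power of the correlation matrix. A minimal Python sketch (the function name is ours; the paper's own implementation lives in the WGCNA R package, and bicor could be substituted for the Pearson correlation used here):

```python
import numpy as np

def soft_adjacency(expr, beta=None, signed=False):
    """Weighted co-expression adjacency from a (samples x genes) matrix.

    Sketch of Eqs. (27)-(29): signed uses (0.5 + 0.5*cor)^beta,
    unsigned uses |cor|^beta.  Default beta follows the text:
    12 for signed and 6 for unsigned networks.
    """
    if beta is None:
        beta = 12 if signed else 6
    cor = np.corrcoef(expr, rowvar=False)   # Pearson; bicor could be substituted
    if signed:
        return (0.5 + 0.5 * cor) ** beta    # Eq. (28)
    return np.abs(cor) ** beta              # Eq. (29)
```

Raising the similarity to the power *β* leaves strong correlations nearly unchanged while shrinking weak ones toward zero, which is what "soft thresholding" means in practice.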

### Adjacency function based on topological overlap

The topological overlap matrix (TOM) based adjacency function *A*_{TOM} maps an original adjacency matrix *A*^{original} to the corresponding topological overlap matrix, i.e.

\begin{array}{c}{A}_{\mathit{\text{TOM}}}{\left({A}^{\mathit{\text{original}}}\right)}_{\mathit{\text{ij}}}\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}\frac{\sum _{l\ne i,j}{A}_{\mathit{\text{il}}}^{\mathit{\text{original}}}{A}_{\mathit{\text{lj}}}^{\mathit{\text{original}}}\phantom{\rule{0.3em}{0ex}}+\phantom{\rule{0.3em}{0ex}}{A}_{\mathit{\text{ij}}}^{\mathit{\text{original}}}}{\mathit{\text{min}}(\sum _{l\ne i}{A}_{\mathit{\text{il}}}^{\mathit{\text{original}}},\sum _{l\ne j}{A}_{\mathit{\text{jl}}}^{\mathit{\text{original}}})\phantom{\rule{0.3em}{0ex}}-\phantom{\rule{0.3em}{0ex}}{A}_{\mathit{\text{ij}}}^{\mathit{\text{original}}}\phantom{\rule{0.3em}{0ex}}+\phantom{\rule{0.3em}{0ex}}1}.\end{array}

(30)

The TOM based adjacency function *A*_{TOM} is particularly useful when the entries of *A*^{original} are sparse (many zeroes) or susceptible to noise. It replaces the original adjacencies by a measure of interconnectedness that is based on shared neighbors. The topological overlap measure can serve as a filter that decreases the effect of spurious or weak connections, and it can lead to more robust networks [17, 18, 68].

### Mutual-information based network inference methods

There are four commonly used mutual-information based network inference methods: RELNET, CLR, MRNET and ARACNE. To identify pairwise interactions between numeric variables *x*_{i} and *x*_{j}, all methods start by estimating the mutual information *MI*(*x*_{i},*x*_{j}).

#### RELNET

The relevance network (RELNET) approach [6, 28] thresholds the pairwise mutual information values at a threshold *τ*. However, this method suffers from a significant limitation: vectors separated by one or more intermediaries (indirect relationships) may have high mutual information without implying a direct interaction.
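In code, RELNET reduces to a single thresholding step. A Python sketch (the helper name is ours, and a precomputed symmetric MI matrix is assumed):

```python
import numpy as np

def relnet(mi, tau):
    """Relevance network: connect pairs whose mutual information reaches tau."""
    A = (np.asarray(mi, dtype=float) >= tau).astype(float)
    np.fill_diagonal(A, 0.0)   # no self-edges
    return A
```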

#### CLR

The CLR algorithm [26] is based on the empirical distribution of MI. It first defines a score *z*_{i} given the mutual information *MI*(*x*_{i},*x*_{j}) and the sample mean *μ*_{i} and standard deviation *σ*_{i} of the empirical distribution of the mutual information values *MI*(*x*_{i},*x*_{k}), *k* = 1,…,*n*:

{z}_{i}=max\left(0,\frac{\mathit{\text{MI}}({x}_{i},{x}_{j})-{\mu}_{i}}{{\sigma}_{i}}\right).

(31)

*z*_{j} can be defined analogously. In terms of *z*_{i} and *z*_{j}, the score used in the CLR algorithm can be expressed as {z}_{\mathit{\text{ij}}}=\sqrt{{z}_{i}^{2}+{z}_{j}^{2}}.
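Over a whole MI matrix, the score of Eq. (31) can be vectorized. A Python sketch (for simplicity the row mean and standard deviation here include the diagonal self-information term, a detail that implementations may treat differently):

```python
import numpy as np

def clr_scores(mi):
    """CLR scores z_ij = sqrt(z_i^2 + z_j^2) from a symmetric MI matrix (Eq. 31)."""
    mi = np.asarray(mi, dtype=float)
    mu = mi.mean(axis=1)      # row-wise empirical mean of MI(x_i, .)
    sigma = mi.std(axis=1)    # row-wise empirical standard deviation
    z = np.maximum(0.0, (mi - mu[:, None]) / sigma[:, None])
    return np.sqrt(z**2 + z.T**2)
```

Combining *z*_{i} and *z*_{j} symmetrically guarantees that the resulting score matrix is symmetric even though each row is standardized separately.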

#### MRNET

MRNET [27] infers a network by repeating the maximum relevance/minimum redundancy (MRMR) feature selection method for all variables. The MRMR method starts by selecting the variable *x*_{i} having the highest mutual information with the target *y*. Next, given a set *S* of selected variables, the criterion updates *S* by choosing the variable *x*_{k} that maximizes *u*_{j}−*r*_{j}, where *u*_{j} is a relevance term and *r*_{j} is a redundancy term. In particular,

\begin{array}{l}{u}_{j}=\mathit{\text{MI}}({x}_{k},y)\end{array}

(32)

\begin{array}{l}{r}_{j}=\frac{1}{\left|S\right|}\sum _{{x}_{i}\in S}\mathit{\text{MI}}({x}_{k},{x}_{i})\end{array}

(33)

The score of each pair *x*_{i} and *x*_{j} is the maximum of the score computed when *x*_{i} is the target and the score computed when *x*_{j} is the target.

#### ARACNE

ARACNE [9] (Algorithm for the Reconstruction of Accurate Cellular Networks), developed by Andrea Califano’s group, is an extension of RELNET. To address the limitation of RELNET, ARACNE removes the vast majority of indirect candidate interactions using a well-known information-theoretic property, the **data processing inequality** (DPI). Applied to association networks, the DPI states that if variables *x*_{i} and *x*_{j} interact only through a third variable *x*_{k}, then

\mathit{\text{MI}}({x}_{i},{x}_{j})\le \mathit{\text{min}}\left(\mathit{\text{MI}}({x}_{i},{x}_{k}),\mathit{\text{MI}}({x}_{k},{x}_{j})\right)

(34)

ARACNE starts with a network graph where each pair of nodes with *MI*_{ij} > *τ* is connected by an edge. The weakest edge of each triplet, e.g. the edge between *i* and *j*, is interpreted as an indirect interaction and is removed if the difference between *min*(*MI*(*x*_{i},*x*_{k}), *MI*(*x*_{k},*x*_{j})) and *MI*(*x*_{i},*x*_{j}) lies above a threshold *ε*, i.e. the edge is removed if

\mathit{\text{MI}}({x}_{i},{x}_{j})\le \mathit{\text{min}}\left(\mathit{\text{MI}}({x}_{i},{x}_{k}),\mathit{\text{MI}}({x}_{k},{x}_{j})\right)-\epsilon .

(35)

The tolerance threshold *ε* can be chosen to reflect the variance of the MI estimator and should decrease with increasing sample size *m*. Using a non-zero tolerance *ε* > 0 can lead to the persistence of some three-node loops.
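The DPI pruning of Eq. (35) can be sketched as a loop over triplets. A Python illustration (assuming a precomputed symmetric MI matrix; real implementations are considerably more optimized):

```python
import numpy as np

def aracne(mi, tau, eps=0.0):
    """ARACNE sketch: threshold MI at tau, then prune edges by the DPI (Eq. 35)."""
    mi = np.asarray(mi, dtype=float)
    n = mi.shape[0]
    A = (mi > tau).astype(float)
    np.fill_diagonal(A, 0.0)
    keep = A.copy()
    for i in range(n):
        for j in range(i + 1, n):
            if not A[i, j]:
                continue
            for k in range(n):
                # only consider triplets in which both remaining edges exist
                if k in (i, j) or not (A[i, k] and A[k, j]):
                    continue
                if mi[i, j] <= min(mi[i, k], mi[k, j]) - eps:
                    keep[i, j] = keep[j, i] = 0.0   # weakest edge: indirect
                    break
    return keep
```

In a chain *x*_{0} → *x*_{1} → *x*_{2}, the indirect pair (0, 2) typically has the smallest MI of the triplet and is the edge removed by the DPI.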

The outputs of RELNET, CLR, MRNET and ARACNE are association matrices. They can be transformed into corresponding adjacencies based on the algorithms discussed in the Introduction.

#### MIC

Another mutual information based method is the recently proposed maximal information coefficient (MIC) [35]. The MIC is one of the maximal information-based nonparametric exploration (MINE) statistics [35]. In our empirical evaluations, we calculate the MIC using the *minerva* R package [69].

### Fitting indices of polynomial regression models

While networks based on the Pearson correlation can only capture linear co-expression patterns, there is clear evidence for non-linear co-expression relationships in transcriptional regulatory networks [70]. The following classical regression based approaches can be used for studying non-linear relationships. The polynomial regression model is given by:

\begin{array}{l}E\left(y\right)={\beta}_{0}1+{\beta}_{1}x+{\beta}_{2}{x}^{2}+\dots +{\beta}_{d}{x}^{d}\\ \phantom{\rule{2em}{0ex}}\phantom{\rule{1pt}{0ex}}=M\beta \phantom{\rule{0.3em}{0ex}},\end{array}

(36)

where

M=[1,x,\dots ,{x}^{d}]\phantom{\rule{0.3em}{0ex}}.

(37)

One can show that the least squares estimate \widehat{\beta} of the parameter vector is

\widehat{\beta}={\left({M}^{\tau}M\right)}^{-}{M}^{\tau}\phantom{\rule{1em}{0ex}}y,

where ^{-} denotes the (pseudo) inverse and ^{τ} denotes the transpose of a matrix.

Given \widehat{\beta}, we can calculate the fitting index *R*^{2} as:

{R}^{2}=\mathit{\text{cor}}{(y,\widehat{y})}^{2}=\mathit{\text{cor}}{(y,M\widehat{\beta})}^{2}

(38)

In the context of a regression model, *R*^{2} is also known as the proportion of the variation of *y* explained by the model.
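A Python sketch of Eqs. (36)–(38), using the pseudo-inverse for the least squares estimate (the function name is ours; the paper's implementation is *adjacency.polyReg* in the WGCNA R package):

```python
import numpy as np

def poly_r2(x, y, d=3):
    """Fitting index R^2 of a degree-d polynomial regression (Eqs. 36-38)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    M = np.vander(x, d + 1, increasing=True)      # design matrix [1, x, ..., x^d]
    beta_hat = np.linalg.pinv(M.T @ M) @ M.T @ y  # (M' M)^- M' y
    y_hat = M @ beta_hat
    return np.corrcoef(y, y_hat)[0, 1] ** 2       # Eq. (38)
```

If the true relationship is itself a polynomial of degree at most *d*, the fit is exact and *R*^{2} = 1.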

### Spline regression model construction

To investigate the relationship between variables *x* and *y*, one can use another textbook method from the arsenal of statisticians: spline regression models. Here, knots are used to define the boundaries of the sub-intervals. They are typically pre-specified, e.g. based on quantiles of *x*. The choice of the knots affects the model fit, but it turns out that the values of the knots (i.e. their placement) are not as important as their number. We use the following rule of thumb for the number of knots: if *m* > 100 use 5 knots, if *m* < 30 use 3 knots, and otherwise use 4 knots.

To ensure that the fitted relationship between *y* and *x* is continuous, we use the **hockey stick function** ( )_{+} to transform *x*:

{\left(s\right)}_{+}=\left\{\begin{array}{ll}s& \text{if}\phantom{\rule{1em}{0ex}}\text{s}\ge 0\\ 0& \text{if}\phantom{\rule{1em}{0ex}}\text{s}<0.\end{array}\right.

(39)

This function can also be applied to the components of a vector, e.g. (*x*)_{+} denotes a vector whose negative components have been set to zero. Thus (*x*−*knot*_{1})_{+} is a vector whose *u*-th component equals *x*[*u*]−*knot*_{1} if *x*[*u*]−*knot*_{1} ≥ 0 and 0 otherwise.

We are now ready to describe the **cubic spline regression model**, which fits a polynomial of degree 3 to each sub-interval. The general form of a cubic spline with 2 knots is as follows:

\begin{array}{l}E\left(y\right)={\beta}_{0}1+{\beta}_{1}x+{\beta}_{2}{x}^{2}+{\beta}_{3}{x}^{3}+{\beta}_{4}{(x-\mathit{\text{kno}}{t}_{1})}_{+}^{3}\\ \phantom{\rule{3em}{0ex}}+{\beta}_{5}{(x-\mathit{\text{kno}}{t}_{2})}_{+}^{3}.\end{array}

(40)

The knot parameters (numbers) *knot*_{1}, *knot*_{2},… are chosen before the regression parameters are estimated. Analogously to polynomial regression, *R*^{2} can be calculated as the association measure between *x* and *y*. This method guarantees the smoothness of the regression line and restricts the influence of each observation to its local sub-interval.
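Putting Eqs. (39) and (40) together, a Python sketch of the cubic spline fitting index (the function name is ours and the knots are assumed to be given, e.g. chosen from quantiles of *x* following the rule of thumb above; the paper's implementation is *adjacency.splineReg* in the WGCNA R package):

```python
import numpy as np

def cubic_spline_r2(x, y, knots):
    """R^2 of a cubic spline regression with fixed knots (Eqs. 39-40)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    columns = [np.ones_like(x), x, x**2, x**3]
    for t in knots:
        columns.append(np.maximum(x - t, 0.0) ** 3)  # hockey stick term, Eq. (39)
    M = np.column_stack(columns)
    beta_hat, *_ = np.linalg.lstsq(M, y, rcond=None)
    y_hat = M @ beta_hat
    return np.corrcoef(y, y_hat)[0, 1] ** 2
```

Because the basis already contains 1, *x*, *x*^{2} and *x*^{3}, any true cubic relationship is fitted exactly regardless of the knot placement.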

### Other networks

Boolean networks [71] and probabilistic networks [72, 73] are briefly mentioned in Table 1.

### Availability of software

**Project name:** Adjacency matrix for non-linear relationships

**Project home page:** http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA

**Operating system(s):** Platform independent

**Programming language:** R

**Licence:** GNU GPL 3

The following functions described in this article have been implemented in the WGCNA R package [5]. The functions *adjacency.polyReg* and *adjacency.splineReg* calculate polynomial and spline regression *R*^{2} based adjacencies; users can specify the *R*^{2} symmetrization method. The function *mutualInfoAdjacency* calculates the mutual information based adjacencies *A*^{MI,SymmetricUncertainty} (Eq. 14), *A*^{MI,UniversalVersion 1} (Eq. 15) and *A*^{MI,UniversalVersion 2} (Eq. 16). The function *AFcorMI* implements the *F*^{cor−MI} prediction function (Eq. 18) for relating correlation with mutual information.