Mutual information networks are a subcategory of network inference methods. The rationale of this family of methods is to infer an edge between a pair of nodes if that pair receives a high score based on mutual information [9].

Mutual information network inference proceeds in two steps. The first step is the computation of the mutual information matrix (MIM), a square matrix whose *i, j*-th element

*MIM*_{ij} = *I*(*X*_{i}; *X*_{j}) (1)

is the mutual information between *X*_{i} and *X*_{j}, where *X*_{i}, *i* = 1,...,*n*, is a discrete random variable denoting the expression level of the *i*th gene. The second step is the computation of an edge score for each pair of nodes by an inference algorithm that takes the MIM as input.
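The two-step procedure can be sketched in Python (illustrative only; the *minet* package itself is an R implementation). The function names `mutual_information` and `build_mim` and the plug-in (empirical) MI estimator are assumptions of this sketch; the discretization of the expression data is taken as given.

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Plug-in (empirical) estimate of I(X; Y) for two discrete sequences, in bits."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * np.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def build_mim(data):
    """Step 1: MIM[i, j] = I(X_i; X_j) over the columns (genes) of a
    discretized expression matrix (rows = samples)."""
    n_genes = data.shape[1]
    mim = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            mim[i, j] = mim[j, i] = mutual_information(data[:, i], data[:, j])
    return mim
```

Step 2 (the inference algorithm applied to the MIM) differs per method and is sketched in the corresponding sections below.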

The adoption of mutual information in network inference tasks can be traced back to Chow and Liu's tree algorithm [13, 14]. Mutual information provides a natural generalization of correlation since it is a non-linear measure of dependency. Hence, with mutual information, generalized correlation networks (relevance networks [7]) and also conditional independence graphs (e.g. ARACNE [8]) can be built. An advantage of these methods is their ability to deal with up to several thousands of variables even in the presence of a limited number of samples. This is made possible by the fact that the MIM computation requires only the estimation of bivariate mutual information terms, one per pair of variables. Since each bivariate estimation can be computed quickly and has low variance even for a small number of samples, this family of methods is well suited to microarray data. Note that since mutual information is a symmetric measure, it is not possible to derive the direction of an edge using a mutual information network inference technique. Nevertheless, the orientation of the edges can be obtained by using algorithms like IC, which are well known in the graphical modelling community [15].

### 1.1 Relevance Network

The relevance network approach [7] was introduced for gene clustering and was successfully applied to infer relationships between RNA expression and chemotherapeutic susceptibility [6]. The approach consists in inferring a genetic network where a pair of genes {*X*_{i}, *X*_{j}} is linked by an edge if the mutual information *I*(*X*_{i}; *X*_{j}) is larger than a given threshold *I*_{0}. The complexity of the method is *O*(*n*^{2}) since all pairwise interactions are considered.
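Given a precomputed MIM, the thresholding step reduces to one comparison per pair. A minimal sketch (the function name `relevance_network` is an assumption, not minet's API):

```python
import numpy as np

def relevance_network(mim, i0):
    """Keep an edge {i, j} whenever MIM[i, j] exceeds the threshold I_0."""
    adj = mim > i0          # boolean adjacency matrix
    np.fill_diagonal(adj, False)
    return adj
```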

Note that this method does not eliminate all the indirect interactions between genes. For example, if gene *X*_{1} regulates both gene *X*_{2} and gene *X*_{3}, this would cause a high mutual information between the pairs {*X*_{1}, *X*_{2}}, {*X*_{1}, *X*_{3}} and {*X*_{2}, *X*_{3}}. As a consequence, the algorithm will set an edge between *X*_{2} and *X*_{3} although these two genes interact only through gene *X*_{1}.

### 1.2 CLR Algorithm

The CLR algorithm [4] is an extension of the relevance network approach. This algorithm computes the mutual information for each pair of genes and derives a score related to the empirical distribution of the MI values. In particular, instead of considering the information *I*(*X*_{i}; *X*_{j}) between genes *X*_{i} and *X*_{j}, it takes into account the score

*z*_{ij} = sqrt((*z*_{i})^{2} + (*z*_{j})^{2}), (2)

where

*z*_{i} = max(0, (*I*(*X*_{i}; *X*_{j}) - *μ*_{i}) / *σ*_{i}) (3)

and *μ*_{i} and *σ*_{i} are respectively the sample mean and standard deviation of the empirical distribution of the values *I*(*X*_{i}, *X*_{k}), *k* = 1,...,*n*. The CLR algorithm was successfully applied to decipher the *E. coli* transcriptional regulatory network (TRN) [4]. CLR has a complexity in *O*(*n*^{2}) once the MIM is computed.
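A vectorized sketch of this scoring step, assuming the common CLR formulation in which each z-score is computed against the row-wise background distribution of the MIM (the function name `clr` mirrors minet's command but this Python code is only an illustration):

```python
import numpy as np

def clr(mim):
    """CLR scores from a precomputed MIM, assuming the formulation
    z_ij = sqrt(z_i^2 + z_j^2) with z_i = max(0, (I(X_i; X_j) - mu_i) / sigma_i)."""
    mu = mim.mean(axis=1, keepdims=True)          # row-wise background mean
    sigma = mim.std(axis=1, keepdims=True)        # row-wise background std
    sigma = np.where(sigma > 0, sigma, 1.0)       # guard against constant rows
    z = np.maximum(0.0, (mim - mu) / sigma)       # z[i, j] plays the role of z_i
    return np.sqrt(z ** 2 + z.T ** 2)
```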

### 1.3 ARACNE

The Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) [8] is based on the Data Processing Inequality [16]. This inequality states that, if gene *X*_{1} interacts with gene *X*_{3} through gene *X*_{2}, then

*I*(*X*_{1}; *X*_{3}) ≤ min(*I*(*X*_{1}; *X*_{2}), *I*(*X*_{2}; *X*_{3})).

ARACNE starts by assigning to each pair of nodes a weight equal to their mutual information. Then, as in relevance networks, all edges for which *I*(*X*_{i}; *X*_{j}) < *I*_{0} are removed, with *I*_{0} a given threshold. Eventually, the weakest edge of each triplet is interpreted as an indirect interaction and is removed if the difference between the two lowest weights is above a threshold *W*_{0}. Note that increasing *I*_{0} decreases the number of inferred edges, while increasing *W*_{0} has the opposite effect.
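A sketch of this pruning step in Python. Note this is a simplified sequential variant written for illustration: the published ARACNE marks all DPI violations before removing any edge, whereas here edges are removed as triplets are visited.

```python
import numpy as np
from itertools import combinations

def aracne(mim, i0=0.0, w0=0.0):
    """Simplified ARACNE sketch: threshold the MIM, then prune the weakest
    edge of each triplet via the Data Processing Inequality."""
    w = mim.copy()
    w[w < i0] = 0.0                                  # relevance-network threshold
    n = w.shape[0]
    for i, j, k in combinations(range(n), 3):
        edges = sorted([(w[i, j], i, j), (w[i, k], i, k), (w[j, k], j, k)])
        # drop the weakest edge if the gap to the second-weakest exceeds W_0
        if edges[1][0] - edges[0][0] > w0:
            _, a, b = edges[0]
            w[a, b] = w[b, a] = 0.0
    return w
```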

If the network is a tree and only pairwise interactions are present, the method guarantees the reconstruction of the original network, provided the exact MIM is available. ARACNE's complexity is *O*(*n*^{3}) since the algorithm considers all triplets of genes. In [8] the method was able to recover components of the TRN in mammalian cells and outperformed Bayesian networks and relevance networks on several inference tasks.

### 1.4 MRNET

MRNET [9] infers a network using the maximum relevance/minimum redundancy (MRMR) feature selection method [17, 18]. The idea consists in performing a series of supervised MRMR gene selection procedures where each gene in turn plays the role of the target output.

The MRMR method has been introduced in [17, 18] together with a best-first search strategy for performing filter selection in supervised learning problems. Consider a supervised learning task where the output is denoted by *Y* and *V* is the set of input variables. The method ranks the set *V* of inputs according to a score that is the difference between the mutual information with the output variable *Y* (maximum relevance) and the average mutual information with all the previously ranked variables (minimum redundancy). The rationale is that direct interactions (i.e. the variables most informative about the target *Y*) should be well ranked, whereas indirect interactions (i.e. those carrying information about *Y* that is redundant with the direct ones) should be badly ranked. The greedy search starts by selecting the variable *X*_{i} having the highest mutual information with the target *Y*. The second selected variable *X*_{j} will be the one with a high information *I*(*X*_{j}; *Y*) with the target and at the same time a low information *I*(*X*_{j}; *X*_{i}) with the previously selected variable. In the following steps, given a set *S* of selected variables, the criterion updates *S* by choosing the variable *X*_{j} ∈ *V* \ *S* that maximizes the score

*s*_{j} = *u*_{j} - *r*_{j}, (4)

where *u*_{j} is a relevance term and *r*_{j} is a redundancy term. More precisely,

*u*_{j} = *I*(*X*_{j}; *Y*)

is the mutual information of *X*_{j} with the target variable *Y*, and

*r*_{j} = (1 / |*S*|) Σ_{*X*_{k} ∈ *S*} *I*(*X*_{j}; *X*_{k})

measures the average redundancy of *X*_{j} to each already selected variable *X*_{k} ∈ *S*. At each step of the algorithm, the selected variable is expected to allow an efficient trade-off between relevance and redundancy. It has been shown in [19] that the MRMR criterion is an optimal "pairwise" approximation of the conditional mutual information between any two genes *X*_{i} and *X*_{j} given the set *S* of selected variables, *I*(*X*_{i}; *X*_{j} | *S*).
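The greedy MRMR selection can be sketched directly on the MIM, since both the relevance and redundancy terms are entries of that matrix. A minimal Python illustration (the function name `mrmr` and its signature are assumptions of this sketch):

```python
import numpy as np

def mrmr(mim, target, n_features):
    """Greedy MRMR ranking of predictors of one target gene, scored
    on the MIM: s_j = u_j - r_j (relevance minus average redundancy)."""
    candidates = set(range(mim.shape[0])) - {target}
    selected, scores = [], {}
    while candidates and len(selected) < n_features:
        def score(j):
            u = mim[j, target]                                   # relevance u_j
            r = np.mean([mim[j, k] for k in selected]) if selected else 0.0
            return u - r                                         # s_j = u_j - r_j
        best = max(candidates, key=score)
        s = score(best)
        if s < 0:            # in practice, selection stops at a negative score
            break
        scores[best] = s
        selected.append(best)
        candidates.remove(best)
    return selected, scores
```

On a MIM where gene 2 is highly redundant with an already selected gene 1, the redundancy penalty pushes it behind a weaker but non-redundant candidate.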

The MRNET approach consists in repeating this selection procedure for each target gene by setting *Y* = *X*_{i} and *V* = *X* \ {*X*_{i}}, *i* = 1,...,*n*, where *X* is the set of the expression levels of all genes. For each pair {*X*_{i}, *X*_{j}}, MRMR returns two (not necessarily equal) scores *s*_{i} and *s*_{j} according to (4). The score of the pair {*X*_{i}, *X*_{j}} is then computed by taking the maximum of *s*_{i} and *s*_{j}. A specific network can then be inferred by deleting all the edges whose score lies below a given threshold *I*_{0} (as in relevance networks, CLR and ARACNE). Thus, the algorithm infers an edge between *X*_{i} and *X*_{j} either when *X*_{i} is a well-ranked predictor of *X*_{j} (*s*_{i} > *I*_{0}) or when *X*_{j} is a well-ranked predictor of *X*_{i} (*s*_{j} > *I*_{0}).

An effective implementation of the best-first search for quadratic problems is available in [20]. This implementation demands an *O*(*f* × *n*) complexity for selecting *f* features using a best-first search strategy. It follows that MRNET has an *O*(*f* × *n*^{2}) complexity since the feature selection step is repeated for each of the *n* genes. In other words, the complexity ranges between *O*(*n*^{2}) and *O*(*n*^{3}) depending on the value of *f*. In practice, the selection of features stops once a variable obtains a negative score.
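Putting the pieces together, the whole MRNET procedure can be sketched as one MRMR pass per target gene followed by a max-combination of the pairwise scores. The code below is a self-contained illustration (function names `mrmr_scores` and `mrnet` are assumptions; `mrnet` mirrors minet's command name but this is not its R implementation):

```python
import numpy as np

def mrmr_scores(mim, target):
    """One greedy MRMR pass: scores s_j for predictors of `target`,
    stopping once the best score turns negative."""
    candidates = set(range(mim.shape[0])) - {target}
    selected, scores = [], {}
    while candidates:
        def score(j):
            r = np.mean([mim[j, k] for k in selected]) if selected else 0.0
            return mim[j, target] - r
        best = max(candidates, key=score)
        s = score(best)
        if s < 0:
            break
        scores[best] = s
        selected.append(best)
        candidates.remove(best)
    return scores

def mrnet(mim):
    """MRNET sketch: run MRMR with each gene as target; each pair {i, j}
    is scored by the maximum of its two (possibly different) MRMR scores."""
    n = mim.shape[0]
    net = np.zeros((n, n))
    for target in range(n):
        for j, s in mrmr_scores(mim, target).items():
            net[target, j] = net[j, target] = max(net[target, j], s)
    return net
```

Thresholding the resulting weighted matrix at *I*_{0} then yields the inferred network, as described above.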

#### Implementation of the inference algorithms in *minet*

All the algorithms discussed above are available in the *minet* package. The RELNET (relevance network) algorithm is implemented by simply running the command build.mim, which returns the MIM; this matrix can itself be considered as a weighted adjacency matrix of the network. CLR, ARACNE and MRNET are implemented by the commands clr(mim), aracne(mim) and mrnet(mim) respectively, each of which returns a weighted adjacency matrix of the network.

It should be noted that the modularity of the *minet* package makes it possible to assess network inference methods on similarity matrices other than the MIM [21].