The GGM [21] is a class of models for multivariate Normal distributions. In general, let x = (x_1, . . . , x_p)' be a p-dimensional Normal random vector with mean µ and covariance matrix Σ. In our application, p = 3 corresponds to the {gene, microRNA, clinical outcome} triplet detailed in the previous section. For simplicity, we assume throughout that µ = 0 (i.e., the data are centered). Let Ω = Σ^{-1} denote the inverse covariance, also known as the concentration matrix, with elements ω_{ij}. Then the partial correlation between x_i and x_j given all the other variables is ρ_{ij} = -ω_{ij} / √(ω_{ii} ω_{jj}). Thus ω_{ij} = 0 if and only if x_i and x_j are conditionally independent given all other variables.
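The link between the concentration matrix and the partial correlations can be illustrated with a short numerical sketch. This is generic illustrative code, not part of the paper's implementation; the 3 × 3 covariance matrix below is made up.

```python
import math

# Made-up 3x3 covariance for a (gene, microRNA, outcome) triplet;
# purely illustrative values.
Sigma = [[1.0, -0.5, 0.3],
         [-0.5, 1.0, -0.2],
         [0.3, -0.2, 1.0]]

def inv3(S):
    """Invert a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = S
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]

# Concentration matrix Omega = Sigma^{-1}.
Omega = inv3(Sigma)

# Partial correlation: rho_ij = -omega_ij / sqrt(omega_ii * omega_jj).
rho = [[-Omega[i][j] / math.sqrt(Omega[i][i] * Omega[j][j])
        for j in range(3)] for i in range(3)]

# rho[0][1] is the gene-microRNA partial correlation given the outcome;
# a value of 0 would mean conditional independence (no edge in the graph).
print(rho[0][1])
```

Setting any off-diagonal ω_{ij} to zero makes the corresponding partial correlation zero, which is exactly the edge-deletion operation in the graph.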
The GGM can be represented by an undirected graph G = (V, E), where V is a set of vertices representing the variables and E is a set of undirected edges indicating the relationships among the variables. The graph represents the model by drawing an edge between vertices i and j whenever ω_{ij} ≠ 0. A graph is complete if (i, j) ∈ E for every pair i, j ∈ V. A subgraph C is called a clique if it is a maximal complete subgraph. A subgraph S is called a separator if it is the overlap of two cliques. We denote the sets of cliques and separators of a graph by C and S, respectively (for details see [22]). For example, in Figure 2, the cliques for the independence model are {C, M} and {C, G}, and the separator is {C}.
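For graphs this small, the cliques and separators can be enumerated by brute force. The sketch below is illustrative code (not from the paper) that recovers the cliques and the separator of the independence model from Figure 2; with only two cliques, the single pairwise overlap is the separator.

```python
from itertools import combinations

# Independence model from Figure 2: edges C-M and C-G (no G-M edge).
vertices = ['C', 'M', 'G']
edges = {frozenset(('C', 'M')), frozenset(('C', 'G'))}

def is_complete(subset):
    """True if every pair of vertices in the subset is joined by an edge."""
    return all(frozenset(p) in edges for p in combinations(subset, 2))

# Enumerate all complete vertex sets, then keep the maximal ones (cliques).
complete_sets = [set(s) for r in range(1, len(vertices) + 1)
                 for s in combinations(vertices, r) if is_complete(s)]
cliques = [c for c in complete_sets
           if not any(c < d for d in complete_sets)]

# With two cliques, the separator is simply their overlap.
separators = [a & b for a, b in combinations(cliques, 2) if a & b]

print(cliques)     # the two cliques {C, M} and {C, G}
print(separators)  # the separator {C}
```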
There are various approaches to perform model selection in GGMs. For example, Whittaker proposed the traditional stepwise forward-selection or backward-deletion approach for small-sized GGMs [23]. Bayesian shrinkage approaches for large-scale gene networks are described in [24–26]. Here, we follow a Bayesian approach for two reasons: 1) the proposed Bayesian approach provides an automatic objective prior for model selection, and 2) the resulting Bayes factors have a closed form, so the algorithm is computationally efficient.
Suppose we observe n samples (x_1, . . . , x_n) of p-dimensional vectors from an unknown decomposable graph G, where each x_i ~ N(0, Σ) with unknown covariance matrix Σ. Let X be the n × p matrix of observed data. The posterior distribution of the graph G given X can be expressed as

π(G | X) ∝ π(G) ∫ p(X | Σ, G) π(Σ | G) dΣ,    (1)

where π(G) is the prior probability of the graph G, and π(Σ | G) is the prior for Σ given G.
One major problem in model selection is that the integral in equation (1) is very sensitive to the choice of prior, and depends on the sample size [22, 27]. Hence, in model selection, it is critical to choose an appropriate π(Σ | G) with at least the following two properties: a) π(Σ | G) should be a conjugate prior for practical reasons (e.g., computational efficiency); b) neither improper priors nor vague proper priors can be used, since they may confound graph selection.
A conjugate prior was first proposed as π(Σ | G) ~ HIW_G(σ, τI) for a decomposable graph G [27], where HIW denotes the hyper-inverse Wishart distribution [28] and the scale matrix τI is proportional to the identity matrix. For this family of priors, the scale parameter τ needs to be chosen carefully, close to the real scale of the data: too small a scale results in a lack of discrimination among the graphs being compared, while too large a scale overpowers the likelihood.
An alternative construction of this prior, π(Σ | G) ~ HIW_G(gn, gX'X), was suggested in [29], where g is between 0 and 1. This prior is referred to as the HIW g-prior, by analogy with Zellner's g-prior in linear regression [30]. It has been shown that this prior corresponds to the implied fractional prior for selecting a graph using fractional Bayes factors [29].
The fractional Bayes factor was motivated by the partial Bayes factor, which uses part of the sample to train noninformative priors into proper priors, while the remaining data are employed to perform model comparisons [31]. It is usually applied when prior information is weak. The parameter g in the HIW g-prior can be viewed as the fraction of the likelihood used for training the noninformative prior. However, if we interpret the HIW g-prior through fractional Bayes factors, we cannot impose a hyperprior on g, because g is no longer a model parameter; instead, it represents the fractional power of the likelihood used for training the noninformative prior. Intuitively, we want to save as much of the data as possible for choosing between models. Hence, we simply choose g = 1/n, equivalent to setting the training sample size equal to 1. In summary, there are several advantages to using the HIW g-prior: a) the conjugate property of the prior produces closed-form Bayes factors, so that the selection of thousands of models becomes feasible; b) compared with the conventional prior (whose results depend largely on the arbitrary choice of a constant), the HIW g-prior, which corresponds to the implied fractional prior for π(Σ | G), automatically provides an objective Bayesian approach for model selection; c) the Bayes factors based on the HIW g-prior are information consistent [29].
Bayes factor calculation: Let G_0 denote the null graph having no edges, and let G_A denote the graph to be compared with the null. The Bayes factor for comparing these two models is

BF(G_A : G_0) = p(X | G_A) / p(X | G_0),    (2)

where, for a decomposable graph, the marginal likelihood factorizes over the cliques and separators, p(X | G_A) = ∏_{C ∈ C_A} p(X_C) / ∏_{S ∈ S_A} p(X_S), with C_A and S_A the cliques and separators of G_A, and each factor available in closed form under the HIW g-prior.
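The clique–separator factorization makes the Bayes factor straightforward to compute once per-clique marginal likelihoods are available. The sketch below is a generic illustration, not the paper's implementation: it simulates a made-up (C, M, G) triplet and approximates each log p(X_C) with a BIC-penalized Gaussian likelihood instead of the exact closed-form HIW g-prior term.

```python
import math
import random

# Simulate n samples of a made-up (C, M, G) triplet; illustrative only.
random.seed(0)
n = 200
data = []
for _ in range(n):
    c = random.gauss(0, 1)
    m = -0.6 * c + random.gauss(0, 0.8)   # M depends on C
    g = 0.7 * c + random.gauss(0, 0.8)    # G depends on C
    data.append({'C': c, 'M': m, 'G': g})

def cov(u, v):
    """MLE covariance between two named columns (mean assumed 0)."""
    return sum(row[u] * row[v] for row in data) / n

def log_marglik(subset):
    """BIC-style approximation to log p(X_A) for a saturated Gaussian on
    the variables in `subset` (1 or 2 variables here); this stands in for
    the exact HIW g-prior closed form used in the paper."""
    a = sorted(subset)
    if len(a) == 1:
        logdet = math.log(cov(a[0], a[0]))
    else:
        s00, s11 = cov(a[0], a[0]), cov(a[1], a[1])
        s01 = cov(a[0], a[1])
        logdet = math.log(s00 * s11 - s01 * s01)
    k = len(a)
    loglik = -0.5 * n * (k * math.log(2 * math.pi) + logdet + k)
    n_params = k * (k + 1) / 2
    return loglik - 0.5 * n_params * math.log(n)

def log_evidence(cliques, separators):
    """log p(X | G) via the clique-separator factorization."""
    return (sum(log_marglik(c) for c in cliques)
            - sum(log_marglik(s) for s in separators))

# G_A: edges C-M and C-G (cliques {C,M}, {C,G}; separator {C}).
# G_0: null graph (three singleton cliques, no separators).
log_bf = (log_evidence([{'C', 'M'}, {'C', 'G'}], [{'C'}])
          - log_evidence([{'C'}, {'M'}, {'G'}], []))
print('log BF(G_A : G_0) =', log_bf)
```

Because the simulated M and G both depend on C, the log Bayes factor comes out large and positive, favoring G_A over the null graph.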
For the eight possible graphical models shown in Figure 2, let G_0 denote the null model and G_1 to G_7 denote the seven network models, respectively. The algorithm for the iNET approach can be described as follows:

Step 1: We obtain the Bayes factor values for comparing the seven models with the null model by applying Equation (2) to each triplet.

Step 2: We sort BF(G_1 : G_0) to BF(G_7 : G_0) in decreasing order; denote the sorted Bayes factors as BF_(1) to BF_(7) and their corresponding graphs as G_(1) to G_(7).

Step 3: If BF_(1)/BF_(2) is greater than 3 (this cutoff is determined according to the scale for interpretation of Bayes factors [32]), we conclude that this triplet supports the model G_(1). Otherwise, we conclude that there is not enough evidence to suggest a specific model.

Step 4: Given the selected model, we calculate the maximum likelihood estimates of the strength of the edges (the conditional correlations). For the three-way model, causal model, inverse model, and zero-effect model, if the strength of the edge between G and M is positive for a triplet, we filter out this triplet, since it contradicts the biological fact that microRNAs are typically negatively associated with mRNAs.
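Steps 2–4 reduce to a sort, a ratio test, and a sign filter. The sketch below is a minimal illustration with made-up log Bayes factor values and a made-up G–M conditional correlation; all names and numbers are hypothetical, not outputs of the actual method.

```python
import math

# Made-up log Bayes factors log BF(G_k : G_0) for one triplet, k = 1..7.
log_bf = {'G1': 12.4, 'G2': 3.1, 'G3': 10.9, 'G4': -0.5,
          'G5': 1.2, 'G6': 0.3, 'G7': 7.8}

# Step 2: sort the models in decreasing order of Bayes factor.
ranked = sorted(log_bf.items(), key=lambda kv: kv[1], reverse=True)
(top_model, top), (_, runner_up) = ranked[0], ranked[1]

# Step 3: accept the top model only if BF_(1)/BF_(2) > 3, i.e.
# log BF_(1) - log BF_(2) > log 3 (the cutoff from [32]).
if top - runner_up > math.log(3):
    decision = top_model
else:
    decision = 'inconclusive'

# Step 4 (sign filter): with a made-up estimated G-M conditional
# correlation, keep the triplet only if that correlation is negative
# (for the models where the G-M edge is present).
rho_gm = -0.42
keep = decision != 'inconclusive' and rho_gm < 0

print(decision, keep)
```

Here the gap between the top two log Bayes factors (12.4 vs 10.9) exceeds log 3, so the triplet is assigned to the top-ranked model and survives the sign filter.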