Additional File 1 - Detailed explanation of the expression level CPD

As mentioned in the main text, the main CPD for the biclustering model consists of two individual factors:

$$P(\text{e.level} \mid \text{e.gene.B}, \text{e.array.B}, \text{e.array.ID}) = P_1(\text{e.level} \mid \text{e.gene.B}, \text{e.array.B}, \text{e.array.ID}) \cdot P_2(\text{e.level} \mid \text{e.gene.B}, \text{e.array.B}, \text{e.array.ID}) \qquad (1.1)$$

CPD factor 1: $P_1$(e.level | e.gene.B, e.array.B, e.array.ID)
This factor describes the main conditional probability that an expression level belongs to a distribution, determined by the gene to bicluster assignment e.gene.B, the array to bicluster assignment e.array.B and a specific array ID e.array.ID. Below we first define the CPD $P_1(\ldots)$ for three specific situations: one in which the expression value is assigned to the background, one in which it is assigned to a single bicluster, and one in which it is assigned to different overlapping biclusters. Based on these specific definitions (indicated by a *), we introduce the generalized definition of $P_1(\ldots)$ that covers all three situations.

Situation 1: background distributions
If the expression level is not part of any bicluster (e.gene.B ∩ e.array.B = ∅), it is assigned to a virtual bicluster with index -1 that describes the background. This bicluster is described with separate Normal distributions ($\mu_a^{bgr}$, $\sigma_a^{bgr}$), one for each array a. The parameters of these distributions are fixed and derived a priori from the dataset using a robust estimation.
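The text does not say which robust estimator is used for the background parameters. As a minimal sketch, assuming the median for location and the scaled median absolute deviation (MAD) for spread, the per-array background parameters could be derived as follows. The function name `robust_background`, the array IDs and the data are hypothetical.

```python
import statistics

def robust_background(levels):
    """Return (mu, sigma) for one array from its expression levels.

    Assumption: median for location, scaled MAD for spread. The source
    only states that "a robust estimation" is used.
    """
    mu = statistics.median(levels)
    mad = statistics.median([abs(x - mu) for x in levels])
    sigma = 1.4826 * mad  # scaling makes the MAD consistent with sigma for Normal data
    return mu, sigma

# One (mu, sigma) pair per array, keyed by a hypothetical array ID.
arrays = {
    "a1": [0.1, -0.2, 0.0, 0.3, -0.1, 5.0],  # 5.0 is an outlier
    "a2": [1.0, 1.2, 0.8, 1.1, 0.9],
}
background = {array_id: robust_background(v) for array_id, v in arrays.items()}
```

Because the median and MAD are insensitive to a few extreme values, expression levels that belong to biclusters have little influence on the background estimate for their array.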

Situation 2: biclusters without overlap
If no overlap occurs between different biclusters, each expression level can only be assigned to exactly one bicluster, each of which is modeled with Normal distributions with parameters (µ,σ).
The values of these parameters depend on the gene to bicluster and array to bicluster assignments (g.B, a.B) and on the unique array identifier a.ID.
The probability $P_1^*(\ldots)$ of observing an expression level that belongs to a single bicluster only is defined as:
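A form consistent with the description above can be written out as follows. This is a sketch under assumed notation: $b$ denotes the single bicluster in g.B ∩ a.B, $\mathcal{N}(x; \mu, \sigma)$ denotes the Normal density, and the subscripts on $\mu$ and $\sigma$ reflect one parameter set per array-bicluster combination. None of these symbols is taken from the original equation.

```latex
P_1^{*}(\text{e.level} \mid g.B,\, a.B,\, a.ID)
  = \mathcal{N}\!\left(\text{e.level};\ \mu_{a,b},\ \sigma_{a,b}\right)
  = \frac{1}{\sqrt{2\pi}\,\sigma_{a,b}}
    \exp\!\left(-\frac{(\text{e.level}-\mu_{a,b})^{2}}{2\,\sigma_{a,b}^{2}}\right),
  \qquad \{b\} = g.B \cap a.B
```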

Situation 3: overlapping biclusters
When different biclusters overlap, an expression level can belong to multiple biclusters. To avoid overfitting, it seems appropriate to model the overlap region using the parameter sets that were already defined for the individual biclusters (situation 2, i.e., one parameter set per array-bicluster combination). Depending on the definition of the overlap, $P_1(\ldots)$ would for example be assigned a high probability if the expression levels approximate the sum, average, weighted sum, minimum, or maximum, etc. of the probability distributions in the contributing biclusters.
In our model we chose an overlap model in which the probability of an expression level in the overlap region is defined as the geometric mean of the probabilities assigned to the expression level by the distributions of the individual biclusters. For computational reasons, we assumed that the standard deviations of the distributions of the overlapping biclusters are almost identical and that an expression level can belong to at most two biclusters. Formally, $P_1(\ldots)$ can then be defined as:
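The geometric-mean overlap model can be illustrated numerically. The sketch below is not the authors' implementation, and the function names and example values are hypothetical. It computes the geometric mean of the Normal densities of the contributing biclusters. For two biclusters with equal standard deviation, it checks the result against the closed form $\exp(-(\mu_1-\mu_2)^2 / (8\sigma^2)) \cdot \mathcal{N}(x; (\mu_1+\mu_2)/2, \sigma)$, which follows from completing the square.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log-density of a Normal(mu, sigma) distribution at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def overlap_probability(x, params):
    """Geometric mean of the per-bicluster Normal densities at x.

    params: list of (mu, sigma) pairs, one per contributing bicluster.
    With a single pair this reduces to the plain Normal density.
    """
    mean_log = sum(normal_logpdf(x, mu, sigma) for mu, sigma in params) / len(params)
    return math.exp(mean_log)

def equal_sigma_closed_form(x, mu1, mu2, sigma):
    """Closed form of the two-bicluster case when both sigmas are equal."""
    centre = 0.5 * (mu1 + mu2)
    damping = math.exp(-((mu1 - mu2) ** 2) / (8.0 * sigma ** 2))
    return damping * math.exp(normal_logpdf(x, centre, sigma))

# Hypothetical expression level and bicluster parameters for one array.
x, mu1, mu2, sigma = 1.3, 1.0, 2.0, 0.5
p_geo = overlap_probability(x, [(mu1, sigma), (mu2, sigma)])
p_closed = equal_sigma_closed_form(x, mu1, mu2, sigma)
p_single = overlap_probability(x, [(mu1, sigma)])
```

With equal standard deviations, the overlap region therefore behaves like a Normal distribution centred on the average of the two bicluster means, scaled by a constant that does not depend on the expression level. This may help explain why the equal-variance assumption is computationally convenient.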

Generalized formula
The following notation covers all situations mentioned above:

Situation 2 is implicitly covered in the notation of situation 3, as it can be formulated as a special case of 'overlap' with only one bicluster. Situation 1 is covered by the use of the virtual bicluster with index -1. This background bicluster can by definition not overlap with any other bicluster. The definition of the set iset($B_i^e$) is also slightly different from how it was defined above:
1. $B_i^e$ empty: background distribution; the product is over the set $b \in [-1]$, so iset($B_i^e$) = [-1].
2. $B_i^e$ not empty: bicluster distribution; the product is over the set of biclusters in the intersection and never includes b = -1 by definition.

CPD factor 2: $P_2$(e.level | e.gene.B, e.array.B, e.array.ID)
Without penalizing for model complexity, the MAP solution would include a very large number of biclusters, since each additional bicluster introduces additional degrees of freedom to model the expression values. Models with many biclusters can better explain the data and thus result in higher MAP scores. Reducing model complexity in a traditional way, by including additional terms in the log-likelihood or log-posterior distributions (such as the Bayesian information criterion (BIC) [1] or the Akaike information criterion (AIC) [2]), would lead to computational intractability if an Expectation-Maximization algorithm is used to find the MAP solution. The optimization algorithm assumes independent optimizations per gene or per array in the substeps of the EM procedure. This independence no longer exists if one of the criteria mentioned above is included in the model.
Therefore, an alternative strategy is used to reduce model complexity: the introduction of a 'penalty' factor $P_2(\ldots)$. This factor is defined such that a set of expression levels is only included in a bicluster if they are on average N times more likely under their respective bicluster distributions than under their background distributions. The factor $P_2(\ldots)$ decomposes similarly to $P_1(\ldots)$ into terms $P_2(\text{e.level} \mid b)$: $P_2(\text{e.level} \mid b) = \pi_{bgr}$ describes the probability that the expression level belongs to the background bicluster (b = -1), and $P_2(\text{e.level} \mid b) = \pi_{bicl}$ describes the probability that the expression level belongs to a bicluster other than the background (b ≠ -1). This implies that a subset of expression levels $E_s$ for a particular gene or array will only be assigned to a bicluster if the condition of Equation (1.6) holds. The user-defined ratio $\pi_{bgr}/\pi_{bicl}$ indicates how many times more likely it must be, on average, that an expression value is part of the bicluster distribution rather than the background distribution before such a set of expression values $E_s$ is actually added to the bicluster.
To guide the user in determining this ratio, we assume that there exist one or more sets of genes in the dataset that are known to be coexpressed. In most practical biological situations, such known sets of genes exist (e.g., a set of operon genes). If such a set is not available, standard clustering techniques can also be used to identify one or more of these clusters. Given that a set of genes is known to be coexpressed, we calculate for every array a the score of its expression values being generated by the bicluster distribution to which it is assigned versus the score of being generated by the background distribution. The difference between these two probability scores is defined as δ. If the conditions under which these genes are coexpressed are also known in advance (see Figure 1.1 (top panel)), we use the known labels that indicate whether or not the arrays belong to the bicluster to train a classifier. This implies determining the optimal threshold of δ so that the global error rate of misclassifying an array with a known label is minimized (= the product of the false positive rate and the false negative rate). If the conditions are unknown in advance, a plot of the sorted δ values is made. The suggested δ is the one that makes the best distinction between arrays with a low δ and arrays with a high δ value (cut-off point), as shown in Figure 1.1 (bottom panel).
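The inclusion rule implied by the penalty factor can be sketched in log space. The sketch assumes that "on average" refers to the geometric mean of the per-level likelihood ratios. That reading follows from comparing the product of $P_1 \cdot P_2$ under the bicluster assignment with the same product under the background assignment, but it is an interpretation and not a transcription of Equation (1.6). The function names and example values are hypothetical.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log-density of a Normal(mu, sigma) distribution at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def passes_penalty(levels, bicl_params, bgr_params, ratio):
    """Check whether a set of expression levels may join a bicluster.

    levels:      expression levels in the candidate set E_s
    bicl_params: (mu, sigma) of the bicluster distribution for each level
    bgr_params:  (mu, sigma) of the background distribution for each level
    ratio:       user-defined pi_bgr / pi_bicl

    Assumption: the set passes if the mean log-likelihood ratio
    (bicluster versus background) exceeds log(ratio).
    """
    log_ratios = [
        normal_logpdf(x, *bicl) - normal_logpdf(x, *bgr)
        for x, bicl, bgr in zip(levels, bicl_params, bgr_params)
    ]
    return sum(log_ratios) / len(log_ratios) > math.log(ratio)

# Three tightly coexpressed levels against a broad background.
levels = [2.0, 2.1, 1.9]
bicl = [(2.0, 0.2)] * 3
bgr = [(0.0, 1.0)] * 3
accepted_at_10 = passes_penalty(levels, bicl, bgr, ratio=10.0)
accepted_at_100 = passes_penalty(levels, bicl, bgr, ratio=100.0)
```

Because the test only involves the expression levels of one gene or one array at a time, it keeps the per-gene and per-array optimizations of the EM substeps independent, unlike a global complexity term.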

Figure 1.1 illustrates how to choose the ratio $\pi_{bgr}/\pi_{bicl}$.
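When the conditions are unknown, the cut-off is read from the plot of sorted δ values. One simple way to automate that reading, offered here only as an assumption about what "best distinction" could mean, is to place the cut-off in the middle of the largest gap between consecutive sorted δ values. The function name `suggest_cutoff` and the δ values are hypothetical.

```python
def suggest_cutoff(deltas):
    """Return a threshold at the midpoint of the largest gap in sorted deltas.

    Assumption: the largest-gap heuristic stands in for the visual
    inspection of the sorted-delta plot described in the text.
    """
    ordered = sorted(deltas)
    if len(ordered) < 2:
        raise ValueError("need at least two delta values")
    # Pair each gap with the index of its lower neighbour.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    _, i = max(gaps)
    return 0.5 * (ordered[i] + ordered[i + 1])

# Hypothetical delta values, one per array.
deltas = [0.2, 3.9, -0.1, 4.4, 0.5, 0.1, 4.1]
cutoff = suggest_cutoff(deltas)
in_bicluster = [d > cutoff for d in deltas]
```

For data without a clear gap this heuristic is unreliable, and inspecting the plot by eye, as described in the text, remains the safer option.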