Bayesian probabilistic network modeling from multiple independent replicates

Protein (or gene) time-course data are often collected for multiple replicates. Each replicate generally has sparse data, with the number of time points being less than the number of proteins, and each replicate is usually modeled separately. Here, however, all of the information in each of the replicates is used to make a composite inference about signal networks. The composite inference comes from combining well-structured Bayesian probabilistic modeling with a multi-faceted Markov chain Monte Carlo algorithm. Based on simulations that investigate many different types of network interactions and experimental variabilities, the composite examination uncovers many important relationships within the networks. In particular, when an edge's partial correlation between two proteins is at least moderate, the composite's posterior probability for that edge is large.


Introduction
The laboratory collection of protein phosphorylation time-course data often results not in a single set of time-course data, but in multiple sets. Typically the data are sparse: the number t of time points is significantly less than the number k of proteins. Even though there are differences between these data sets, the underlying biochemical interactions (the signal) are reflected in each of them. These individual sets of time-course protein data are often modeled separately. The discussion in this paper focuses on protein measurements; however, it applies equally to sparse time-course measurements obtained through gene microarrays.
The methods in this paper incorporate data from multiple replications of a systems biology investigation to determine composite posterior probabilities of network relationships. These methods are motivated by the desire to predict interactions between proteins by probabilistically incorporating all of the data from several independent investigations. Utilizing a Gaussian-based regression likelihood and low-informative empirical priors, closed-form (up to a proportionality constant) posterior probabilities are computed for networks, each of which is a directed acyclic graph (DAG). Extensive searching through the space of DAGs is performed with a multi-structured Metropolis-Hastings Markov chain Monte Carlo algorithm. The DAG posterior probabilities are then combined through Bayesian model averaging [1] to produce posterior probabilities for relationships between the proteins.
Since the combined likelihood from our m independent replications carries approximately m times more information (whether measured by the Akaike, Bayesian, or deviance information criteria, or by Fisher information) than that of a single replication, the combined technique tends to yield more precise estimates of posterior probabilities [2,3]. The simulations in this paper also demonstrate that this combined analysis captures more of a network's signal.
In a previous paper [4], an approximate Bayesian posterior analysis for a single sparse replication was developed. As with the current paper, it used multiple regression to model cotemporal associations between the proteins' measurements, where each sampled time provided insight into the proteins' relationships. Diagnostics to test the suitability of this method [4] for a particular data set were presented. These diagnostics can easily be employed on each separate replication of a multiple replication study in order to test the current paper's suitability for that study. Furthermore, many of the theoretical justifications from [4] carry over to the multiple replication setting. In particular, the previous and current methodology strongly relates to low-order (small number of predictors) dependency networks [5-7]. The use of DAGs, with the proteins being the nodes and directed edges signifying the relations, allows the splitting of a replicate's likelihood into conditionally independent parts [6,8-10]. While in [4] approximate posterior probabilities were obtained for a single replication through the use of Bayesian-information-based scores, in the current paper an exact Bayesian posterior probability is obtained for the combined m replications.
Recent papers have examined combining data from multiple studies. In [11], a score function based on the expected number of associations was developed, and its results were weighted in a Bayesian fashion with supplementary information from gene ontology and protein structures. In a frequentist, non-Bayesian manner, the authors in [12] weighted different studies so as to maximize the statistical power (chance of claiming a true positive). They also obtained integrated p-value estimates. In [13], linear programming was used to find the subnetworks which are most consistent from one replication to the next.

Methods
Edge probabilities are computed for an undirected graph where the nodes represent individual proteins and where an edge between two nodes represents a relationship between the two corresponding proteins. These edge probabilities are based on an algorithmic search through the space of all models (DAGs), guided by the posterior probability of the DAGs. This DAG posterior probability takes into account all of the data sets. Verification of the effectiveness of this technique requires simulations; a simulation consists of generating multiple sets of data from the same underlying signal. The following discussions focus on each of these important ideas: DAG posterior probability, algorithmic search, testing data, and analysis techniques.

Posterior probability
Our mathematical space of network models consists of directed acyclic graphs. The vertices represent proteins, and the directed edges signify parent-child linear associations between the proteins. In particular, the set of parents (predictors) of a particular child (response) is the set of vertices which have directed edges going from the parent to the child. In order to give equal consideration to each child and each potential parent, each protein's time-course values within a data set are standardized using its average and standard deviation. The number of parents for any particular child is restricted to be at most t - 2, guaranteeing valid regression settings [14]. Acyclic refers to not allowing cycles in the graph, i.e. not allowing a protein to be a direct descendent of itself. In this paper, we present a theoretically strong probabilistic method which comprehensively incorporates multiple data sets. For convenience, the data sets are referred to as reps, even though the reps may have different signal parameters and may differ in their variances about the underlying signal network. Separately for each DAG, rep and child combination, we utilize independent unit-information empirical g-priors for the slope parameters of the parent-child linear regression relations that are specified by the DAG [15]. As well, unit-information inverse gamma priors are independently placed on the residual variances. Thus, due to the prior structure, the reps' data sets are independent from one replicate to the next. In addition, due to the DAG structure, each child's conditional likelihood is independent of that of another child [8]. Therefore a particular DAG's Bayes factor is

Pr(Y \mid DAG) = \prod_{j=1}^{m} \prod_{i=1}^{k} Pr(Y_i^{(j)} \mid DAG),     (1)

where Y represents the (standardized) data and Y_i^{(j)} is the data for child i in rep j. Due to the conjugate nature of the priors, the child i, rep j Bayes factor Pr(Y_i^{(j)} \mid DAG) has a closed-form expression [15, Chapter 9].
Specifically, this Bayes factor is given by

Pr(Y_i^{(j)} \mid DAG) = \pi^{-t/2} \, \frac{\Gamma((t+1)/2)}{\Gamma(1/2)} \, (1+t)^{-P(i)/2} \, \frac{(s_{ij}^2)^{1/2}}{\left(s_{ij}^2 + SSR_g^{(ij)}\right)^{(t+1)/2}},     (2)

where P(i) is the number of parents of child i for the DAG, X_i^{(j)} is child i's parents' data matrix for rep j for the DAG, s_{ij}^2 is the corresponding residual regression (error) variance, and

SSR_g^{(ij)} = Y_i^{(j)\top}\left(I - \frac{t}{t+1}\, X_i^{(j)} \left(X_i^{(j)\top} X_i^{(j)}\right)^{-1} X_i^{(j)\top}\right) Y_i^{(j)}.

The simple value of Pr(Y_i^{(j)} \mid DAG) when i has no parents is a consequence of the standardized data.
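As a concrete sketch, the closed-form child-and-rep Bayes factor can be computed as follows. This is a minimal implementation assuming the unit-information g-prior form of [15, Chapter 9] with g = t, prior degrees of freedom 1 and empirical prior variance s_{ij}^2; the function name and these exact constants are our assumptions, not the paper's verbatim code.

```python
import numpy as np
from math import lgamma, log, pi

def log_bayes_factor_child(y, X=None):
    """Log of Pr(Y_i^(j) | DAG) for one child in one rep: a sketch of the
    unit-information g-prior marginal likelihood (g = t) with an empirical
    inverse-gamma prior on the residual variance, following [15, Ch. 9].
    y: standardized child data (length t); X: t-by-P(i) parents matrix."""
    t = len(y)
    if X is None or X.shape[1] == 0:
        p = 0
        s2 = y @ y / t            # standardized data: s2 is near 1
        ssr_g = y @ y             # no parents: residual sum of squares is y'y
    else:
        p = X.shape[1]
        hat_y = X @ np.linalg.solve(X.T @ X, X.T @ y)   # OLS fitted values
        s2 = (y - hat_y) @ (y - hat_y) / (t - p)        # residual variance
        ssr_g = y @ y - (t / (t + 1.0)) * (y @ hat_y)   # g-prior shrunken SSR
    return (-0.5 * t * log(pi)
            + lgamma((t + 1) / 2.0) - lgamma(0.5)
            - 0.5 * p * log(1.0 + t)
            + 0.5 * log(s2)
            - 0.5 * (t + 1) * log(s2 + ssr_g))
```

A DAG's overall log Bayes factor is then the sum of these terms over all children and reps, as in Equation (1).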
From Equations (1) and (2), a closed form is easily found for the DAG's overall Bayes factor. As is common, we assume that the prior probability of one model (DAG) is the same as that of another. This yields that the posterior probability of a particular DAG, given all of the data, is proportional to the DAG's (overall) Bayes factor. Since for even a moderate number of proteins the DAG space is too large for a census, an intelligent search algorithm must be used.

Algorithmic search
In a Markov-chain manner, the Metropolis-Hastings algorithm moves through the DAG space. From Equations (1) and (2), the Bayes factor (proportional posterior probability) of any element, a DAG, can be computed. Given a current element in the search space, the algorithm generates a candidate element from the current one. If the posterior probability of the candidate is greater than that of the current element, then the candidate replaces the current element. If the probability of the candidate is not an improvement, then the candidate replaces the current element with probability equal to the ratio of the candidate's posterior probability to the current element's posterior probability [16].
The Metropolis-Hastings algorithm used in this research is a variation of the one presented in [4,17,18]. This paper's single rep algorithm is a pure Bayes posterior modification of the previous Bayesian information criterion (BIC) based approximation algorithm [4]. The multiple rep algorithm has similarities to the single rep algorithm, but it moves through the DAG space based on Bayes posterior probabilities after incorporating multiple reps, and it allows multiple edge insertions and deletions.
The use of a single move, a single insertion or deletion of an edge, in a Metropolis-Hastings search is common and is motivated by the Metropolis requirement that all moves be reversible with equal probability of a move and its inverse [16]. In the multiple rep algorithm, single, double, or triple reversible moves are allowed. Each vertex in the directed acyclic graphs has bounded indegree (a maximum number of parents of a given child), typically 3. This condition must be enforced as well. The implementation of multiple moves is straightforward. First, the number of changes (1, 2 or 3) is chosen uniformly. Second, using this chosen number, either edge insertions or deletions are selected and applied yielding a candidate directed acyclic graph. Third, if the candidate directed acyclic graph is found to be infeasible then the process of choosing a candidate starts over.
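A minimal sketch of the multi-move search just described, with a generic log_score standing in for the log Bayes factor of Equations (1) and (2); the function names and the toy scoring used below are ours, not the paper's implementation:

```python
import math
import random

def is_acyclic(edges, k):
    """Depth-first check that the directed graph on nodes 0..k-1 is acyclic."""
    children = {v: [] for v in range(k)}
    for (u, v) in edges:
        children[u].append(v)
    state = {v: 0 for v in range(k)}   # 0 = unseen, 1 = on stack, 2 = done
    def has_cycle(v):
        state[v] = 1
        for w in children[v]:
            if state[w] == 1 or (state[w] == 0 and has_cycle(w)):
                return True
        state[v] = 2
        return False
    return not any(state[v] == 0 and has_cycle(v) for v in range(k))

def propose(edges, k, max_indegree=3):
    """Multi-move proposal: uniformly choose 1, 2 or 3 changes, then apply
    that many edge insertions or deletions; restart whenever the candidate
    is cyclic or violates the indegree bound."""
    while True:
        cand = set(edges)
        inserting = random.random() < 0.5
        for _ in range(random.randint(1, 3)):
            if inserting:
                u, v = random.sample(range(k), 2)
                cand.add((u, v))
            elif cand:
                cand.discard(random.choice(sorted(cand)))
        indegree_ok = all(sum(1 for (_, v) in cand if v == w) <= max_indegree
                          for w in range(k))
        if indegree_ok and is_acyclic(cand, k):
            return cand

def search_dags(log_score, k, n_iter=3000):
    """Metropolis-Hastings walk through DAG space, starting from the empty
    DAG; returns the best-scoring DAG visited."""
    current, best = set(), set()
    for _ in range(n_iter):
        cand = propose(current, k)
        if math.log(random.random()) < log_score(cand) - log_score(current):
            current = cand
        if log_score(current) > log_score(best):
            best = set(current)
    return best
```

For illustration, a toy score that rewards closeness to a hypothetical target DAG can be passed as log_score; in the actual method the score is the DAG's log Bayes factor over all reps.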

Testing data
In order to assess the quality of the models found by the multiple rep algorithm, it is necessary to engineer replicate data where the underlying signal is known. For this paper, five studies of multiple replicate data are generated from known underlying signals.
The simulated sets of data are sampled directly from multivariate normal distributions, hence no preprocessing transformations are needed. To generate the data for a particular rep of t time points and k proteins, we draw t independent samples from a k-dimensional multivariate normal distribution which has a mean vector of zeros and a selected generating covariance matrix, which provides the selected network signal. We use covariance matrices that are block diagonal with first-order autocorrelations [14, page 414] within the blocks (and zero correlations between blocks). Specifically, if proteins p_1, p_2 and p_3 constitute a block of 3 correlated proteins, then the covariance (equivalent to signal Pearson correlation) block corresponding to them is of the form

\begin{pmatrix} 1 & r & r^2 \\ r & 1 & r \\ r^2 & r & 1 \end{pmatrix},     (3)

where r is the Pearson correlation between adjacent proteins within the block. We say that the (triple) block of three vertices is correlated with intensity r.
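The generating covariance and the replicate sampling can be sketched directly; the function names are ours:

```python
import numpy as np

def block_autocorr_cov(block_sizes, r):
    """Block-diagonal generating covariance: within a block, the first-order
    autocorrelation structure gives cov[a, b] = r**|a - b|, as in the triple
    block (3); correlations between different blocks are zero."""
    k = sum(block_sizes)
    cov = np.eye(k)
    start = 0
    for size in block_sizes:
        for a in range(size):
            for b in range(size):
                cov[start + a, start + b] = r ** abs(a - b)
        start += size
    return cov

def simulate_rep(block_sizes, r, t, rng):
    """One rep: t independent time points drawn from N(0, cov);
    rows are time points, columns are proteins."""
    cov = block_autocorr_cov(block_sizes, r)
    return rng.multivariate_normal(np.zeros(cov.shape[0]), cov, size=t)
```

For example, a rep of the first study's design (four triple blocks, r = 0.94, t = 10) corresponds to simulate_rep([3, 3, 3, 3], 0.94, 10, rng).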
One benefit of the block structures is that we obtain closed-form solutions for the generating partial correlations [14, page 160] between the proteins. For the triple block with associated covariance matrix (3), the partial correlation matrix is

\begin{pmatrix} 1 & \frac{r}{\sqrt{r^2+1}} & 0 \\ \frac{r}{\sqrt{r^2+1}} & 1 & \frac{r}{\sqrt{r^2+1}} \\ 0 & \frac{r}{\sqrt{r^2+1}} & 1 \end{pmatrix}.     (4)

For a block of 4 proteins, the partial correlations between p_1 and p_2 and between p_3 and p_4 equal r/\sqrt{r^2+1}, while the partial correlation between p_2 and p_3 is r/(r^2+1). All other pairs of the four proteins have a partial correlation of zero.
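These closed forms can be checked numerically by scaling the precision matrix (the inverse covariance); a minimal sketch, with our own function name:

```python
import numpy as np

def partial_correlations(cov):
    """Partial correlation matrix from a covariance matrix: invert to get
    the precision matrix Omega, then set
    pcor[a, b] = -Omega[a, b] / sqrt(Omega[a, a] * Omega[b, b])."""
    omega = np.linalg.inv(cov)
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Triple block with intensity r: adjacent pairs have partial correlation
# r / sqrt(r**2 + 1); the non-adjacent pair has partial correlation 0.
r = 0.94
cov3 = np.array([[1, r, r**2], [r, 1, r], [r**2, r, 1]])
pcor3 = partial_correlations(cov3)
```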

Analysis of data and models
Our overall strategy is to conduct five illustrative simulation studies, where each study consists of a set of three reps, each generated from a specific signal. Each of the five studies is designed to examine potentially different characteristics of biological networks. The three reps in a study mimic biological replicates. For each study, the multiple rep Bayesian Metropolis-Hastings algorithm is applied to all three replicates, giving the composite results. For comparison purposes, the single rep algorithm is applied to each of the three individual reps. Separately, for the composite and for each of the individual executions of the modeling algorithms, the matrix of protein-protein edge posterior probabilities and the vector of within-block connectivity probabilities are obtained.
A block of nodes is connected if, for any pair of nodes in the block, there exists a sequence of edges from the first node of the pair to the second, where each edge is incident only with nodes in the block. The probability that a block of nodes, representing our proteins, is connected is estimated by the sum of the probabilities of the top undirected graphs in which those nodes are connected. Mathematically, this is

Pr(v_1, \dots, v_n \text{ connected}) = \sum_{G} \chi(v_1, \dots, v_n, G) \, Pr(G \mid Y),

where the sum runs over the top undirected graphs G, and the characteristic function \chi(v_1, \dots, v_n, G) has the value 1 if and only if the vertices v_1, \dots, v_n are connected in the undirected graph G. The computation of the characteristic function \chi for 3 and 4 nodes is straightforward.
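The characteristic function and the block connectivity probability can be sketched as a union-find check restricted to the block; the names below are ours:

```python
def chi(block, edges):
    """1 iff every pair of vertices in the block is joined by a path whose
    edges are incident only with vertices inside the block."""
    block = set(block)
    parent = {v: v for v in block}
    def find(v):                     # union-find with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for (u, v) in edges:
        if u in block and v in block:
            parent[find(u)] = find(v)
    return int(len({find(v) for v in block}) == 1)

def block_connectivity_probability(block, top_graphs):
    """Estimate Pr(block connected): sum the posterior probabilities of the
    top undirected graphs G in which the block is connected."""
    return sum(prob * chi(block, edges) for (prob, edges) in top_graphs)
```

Here top_graphs is a list of (posterior probability, undirected edge set) pairs for the highest-probability graphs found by the search.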

Results
Specific results of the five simulation studies are presented. The discussion of the first study is more detailed than that of the remaining four since some of the details of all five studies are quite similar. Following the discussion of the individual studies, a further analysis of the posterior probabilities is presented.

Individual studies
The first of five simulation studies is a set of three reps, R^(1)_1, R^(1)_2 and R^(1)_3. Each of these reps reflects t = 10 simulated measurements of twelve proteins, p_1, ..., p_12. Furthermore, the underlying generating covariance matrix assigns a high correlation intensity, r = 0.94, to each of four triple blocks. Table 1 shows that for the four blocks of signal-correlated proteins, our model exhibits extremely high posterior connectivity probabilities. With only a few exceptions, the triple connectivity probability of the highly correlated proteins is 1.0 in all the reps. In addition, the average of the triple connectivity probabilities over all false triples does not exceed 0.0641. Figure 1 displays the moderate to high edge posterior probabilities of Study 1.

Consider a particular one of the generating 3 × 3 blocks in this first simulation, say based on the ordered proteins A, B and C. It has moderately high Pearson correlations between all of its protein pairs. There is also substantial partial correlation between adjacent proteins, namely between A and B, and also between B and C. However, there is zero partial correlation between the non-adjacent proteins A and C: for a fixed value of B, there is no correlation between A and C. It is informative to compare this setting to the biological setting where a parent protein A* has a causal influence on a child protein B*, which has a causal influence on a grandchild protein C*. However, for a fixed result for B*, A* has no influence on C*; hence, there is no partial causal influence between A* and C*.
The estimation of partial correlation for sparse biological data is accomplished through the Lasso, adaptive Lasso and Ridge techniques [6,19-26]. Table 2 shows sample partial correlation estimates for R^(1)_1 using the Lasso and adaptive Lasso (entries above and below the main diagonal, respectively). These two sets of partial correlation estimates for R^(1)_1 strongly reflect the true generating partial correlations. A sample partial correlation estimate for R^(1)_1 based on Ridge is shown in Table 3. This estimate also captures the true underlying partial correlation, though not as cleanly: the Ridge technique produces far fewer exact zero partial correlations than the Lasso and adaptive Lasso techniques do. Computationally, the Ridge technique places a quadratic (L2) penalty on the slope magnitudes, while the Lasso and adaptive Lasso techniques adopt an absolute-value (L1) penalty.
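As one illustration of the quadratic-penalty idea (a generic ridge-type sketch, not the specific estimators of [19-26]): stabilizing the sample correlation matrix with a term λI before inversion shrinks partial correlations toward zero without forcing exact zeros, which is why Ridge leaves fewer zero entries than the Lasso's absolute-value penalty. The function name and the tuning parameter lam are our assumptions.

```python
import numpy as np

def ridge_partial_correlations(data, lam=0.1):
    """Ridge-type partial correlation estimate: invert the sample
    correlation matrix after adding a quadratic penalty term lam * I.
    A sketch of the L2-penalty idea only; lam is a hypothetical tuning
    parameter. The L2 penalty shrinks entries toward zero but, unlike
    the Lasso's L1 penalty, rarely makes them exactly zero."""
    S = np.corrcoef(data, rowvar=False)       # proteins in columns
    omega = np.linalg.inv(S + lam * np.eye(S.shape[0]))
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor
```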

In Table 4, the protein-to-protein sample Pearson correlations of R^(1)_1 are shown. Note that, as expected from the generator, p_1 and p_3 have high sample Pearson correlation, despite their near-zero sample partial correlation estimates.
The second simulation study consists of t = 5 measurements of 9 proteins, p_1, ..., p_9. The replicate data are generated from a signal giving correlation intensity r = 0.9 to each of two triple blocks. The triple and double connectivity probabilities are also examined; Table 5 depicts the analysis of these posterior probabilities. An interesting result is a low triple connectivity probability of 0.0416 for R^(2)_2. This rep's deviations from the signal are associated with corresponding deviations of its sample partial correlations from the signal, which is most likely caused by the small sample size of t = 5.
The third study contains 10 measurements of 10 proteins, p_1, ..., p_10. An underlying signal is generated with correlation intensity r = 0.94 for the two quadruple blocks. High signal correlations between proteins yield high sample Pearson correlations (ranging from 0.73 to 0.99) within the reps. Additionally, the zero signal correlation pairs yield low sample Pearson correlations, ranging from -0.13 to 0.40.
The quadruple and double connectivity probabilities are shown in Table 6. Within one of the signal blocks, some edge probabilities are high, averaging 0.95 among the three reps and the composite, while the remaining edge probabilities within this block average 0.105 (see Figure 3). Nevertheless, the probability that all four proteins connect is 1.0 in all reps and the composite (see Table 6).
The fourth study complements the first study but with lower and decreasing correlation intensities among the 4 blocks of three proteins. The assigned r values for the four blocks are 0.7, 0.6, 0.6 and 0.5, respectively. As in the previous three studies, average sample correlations between proteins in different blocks remain low throughout the reps, ranging from -0.0329 to 0.1159. The sample correlations between proteins that are signal-correlated within blocks are representative of the signal correlations. The exception occurs in R^(4)_2, which has relatively low within-block sample correlations, ranging from -0.1141 to 0.7817. This influences the inconsistencies of its edge probabilities relative to the signal (refer to Figure 4). In general, triples associated with lower generating r values receive lower and more inconsistent sample correlations, which also accounts for some of the edge probability inconsistencies in Figure 4.
The triple connectivity probabilities can be seen in Table 7. With lower signal correlations among the triples, the triple connectivity posterior probabilities are not quite as strong as in the previous studies. However, the composite performs at least as well as, if not better than, each of the individual replicates.

The fifth study uses the signal topology of the first and fourth studies. However, the first replicate has a correlation intensity of r = 0.9, the second replicate has r = 0.82 and the third replicate has r = 0.7. In this study the generating signal is not as strong as the signals in the first three studies. The block correlations for each replicate are derived from the r value assigned to that rep. As expected, R^(5)_1 has the highest sample correlations among triple signal-correlated proteins, followed by R^(5)_2 and lastly R^(5)_3. All have small sample correlation averages among signal zero-correlated proteins. The edge probability diagrams (Figure 5) reflect these results. Table 8 shows the triple connectivity posterior probabilities. The analyses for C^(5) and R^(5)_1 recognize the signal well (see Figure 5), while the analysis for R^(5)_3, which was generated with the lowest correlation intensity, does not recognize the signal as well as those of R^(5)_1 and C^(5).

ROC analysis of posterior probabilities
The receiver operating characteristic (ROC) curves [27] for the composite and the individual replicates from each of the five studies are shown in Figure 6. The ROC (x, y) coordinates are generated by the decreasing sequence of edge posterior probability cutoffs (i.e. lower limits for classifying positive edges). The y-coordinate, the true positive rate (TPR), is the fraction of signal edges that are classified as positive edges. The x-coordinate, the false positive rate (FPR), is the fraction of signal non-edges that are classified as positive edges. In each of Figures 6(a)-6(d), comparisons are made between ROC curves whose signal edges are determined by non-zero Pearson correlation (ranging from 0.250 to 0.940) and those whose signal edges are determined by non-zero partial correlation (ranging from 0.447 to 0.685). The signal partial correlation ROC curves tend to lie above and to the left of the signal Pearson correlation ROC curves. This reflects the algorithm's ability to identify, with higher posterior probability, signal partial correlation edges over signal Pearson correlation edges.
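The ROC construction described above amounts to the following sketch, where edge_probs maps each possible edge to its posterior probability; the function name is ours:

```python
def roc_points(edge_probs, signal_edges):
    """(FPR, TPR) points obtained by sweeping the edge posterior
    probability cutoff downward, admitting one edge at a time."""
    ranked = sorted(edge_probs, key=edge_probs.get, reverse=True)
    n_pos = sum(1 for e in ranked if e in signal_edges)
    n_neg = len(ranked) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for e in ranked:
        if e in signal_edges:
            tp += 1      # a signal edge passes the cutoff
        else:
            fp += 1      # a signal non-edge passes the cutoff
        points.append((fp / n_neg, tp / n_pos))
    return points
```

A ranking that places every signal edge ahead of every non-edge passes through the ideal corner (FPR 0, TPR 1).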
As can be seen in covariance block (3) and partial correlation matrix (4), there are more signal Pearson correlation edges than signal partial correlation edges. For the ROC curves based on partial correlation edges, there are only two edges within a triple block, and each has a relatively high posterior probability. This leads to their ROC curves increasing at a faster rate than those based on Pearson correlation edges. Overall, the signal nonzero Pearson correlation edges have lower posterior probabilities than do the signal non-zero partial correlation edges. In addition, the composite ROC curves tend to be to the upper left of their corresponding individual replication ROC curves. This corresponds to higher posterior probabilities for the true signal edges under composite analysis than under most individual rep analyses.

Conclusions
Structured Bayesian posterior probabilities are developed for network features based on multiple sparse time-course data sets. This methodology allows for the incorporation of data sets with varying degrees of experimental variability. In our simulations, the multiple rep composite method performs well in uncovering strong network signals, and it does better than a single rep method in uncovering moderate network signals. The composite method assigns high posterior probability to edges with at least moderate network partial correlation, while it assigns moderate to small posterior probabilities to edges with zero network partial correlation.
Composite ROC curves based on system non-zero partial correlation (solid lines in Figure 6(a)) have a small area above them, signifying that our composite method provides excellent detection of edges having partial correlation.
The five simulation studies span a range of network situations. The first three studies examine networks consisting of block subnetworks with high signal correlations within blocks; these blocks are of varying sizes. The composite method is more successful in identifying blocks of three or four proteins than smaller blocks. Study four examines blocks with moderate, rather than high, signal correlation within the blocks. The composite method does not perform as well for these blocks, but it does outperform the single rep method for all of study four's subnetworks. In the fifth study, the different reps have varying degrees of experimental variability. Still, the composite method recognizes the network's signals with high posterior probabilities.
The multiple rep method utilizes independent empirical priors acting on independent reps. Thus, as suggested by the fifth study, this method can be used even if there are major fixed, non-random differences between the reps. Each rep still contributes information about the network structure. This likelihood based methodology automatically weights the reps in the sense that reps having more experimental variability will receive less weight in determining the subnetworks which have highest posterior probability.
The computation of posterior probabilities lends itself towards the identification of various network features. These features can correspond to connected subgraphs in the interaction network. With experimental data, where the goal is to discover the generating signal, searching for high probability features is quite valuable.
If there are strictly random differences between the reps, it may be useful to employ a hierarchical structure (e.g. assuming that parent-child regression slopes for one rep come from the same distributions as do those from another rep). This approach would involve substantially more complex Bayes factors, and thus would be more computationally intensive. We are currently developing methodology for this setting.