 Research article
 Open access
 Published:
Bayesian Orthogonal Least Squares (BOLS) algorithm for reverse engineering of gene regulatory networks
BMC Bioinformatics volume 8, Article number: 251 (2007)
Abstract
Background
A reverse engineering of gene regulatory network with large number of genes and limited number of experimental data points is a computationally challenging task. In particular, reverse engineering using linear systems is an underdetermined and ill conditioned problem, i.e. the amount of microarray data is limited and the solution is very sensitive to noise in the data. Therefore, the reverse engineering of gene regulatory networks with large number of genes and limited number of data points requires rigorous optimization algorithm.
Results
This study presents a novel algorithm for reverse engineering with linear systems. The proposed algorithm is a combination of the orthogonal least squares, second order derivative for network pruning, and Bayesian model comparison. In this study, the entire network is decomposed into a set of small networks that are defined as unit networks. The algorithm provides each unit network with P(DH_{i}), which is used as confidence level. The unit network with higher P(DH_{i}) has a higher confidence such that the unit network is correctly elucidated. Thus, the proposed algorithm is able to locate true positive interactions using P(DH_{i}), which is a unique property of the proposed algorithm.
The algorithm is evaluated with synthetic and Saccharomyces cerevisiae expression data using the dynamic Bayesian network. With synthetic data, it is shown that the performance of the algorithm depends on the number of genes, noise level, and the number of data points. With Yeast expression data, it is shown that there is remarkable number of known physical or genetic events among all interactions elucidated by the proposed algorithm.
The performance of the algorithm is compared with Sparse Bayesian Learning algorithm using both synthetic and Saccharomyces cerevisiae expression data sets. The comparison experiments show that the algorithm produces sparser solutions with less false positives than Sparse Bayesian Learning algorithm.
Conclusion
From our evaluation experiments, we draw the conclusion as follows: 1) Simulation results show that the algorithm can be used to elucidate gene regulatory networks using limited number of experimental data points. 2) Simulation results also show that the algorithm is able to handle the problem with noisy data. 3) The experiment with Yeast expression data shows that the proposed algorithm reliably elucidates known physical or genetic events. 4) The comparison experiments show that the algorithm more efficiently performs than Sparse Bayesian Learning algorithm with noisy and limited number of data.
Background
Highthroughput technologies such as DNA microarrays provide the opportunity to elucidate the underlying complex cellular networks. There are now many genomewide expression data sets available. As an initial step, several computational clustering analyses have been applied to expression data sets to find sets of coexpressed and potentially coregulated genes [1–5]. As a next step, there have been efforts to elucidate gene regulatory networks (GRN) embedded in complex biological systems. A growing number of methods for reverse engineering of GRN have been reported as follows: Boolean networks [6, 7], Bayesian networks [8–10], Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe) [11], linear models [12], neural networks [13], methods using ordinary differential equations [14, 15], a sparse graphical Gaussian model [16], a method using a genetic algorithm [17], a method using Sparse Bayesian Learning (SBL) algorithm [18, 19], and etc.
In the reverse engineering of GRN, essential tasks are developing and comparing alternative GRN models to account for the data that are collected (Figure 1). There are two levels of inference involved in the task of data modeling process [20]. The first level of inference is fitting one of models to the data with an assumption that our chosen model is true. The second level of inference is the task of model comparison. It is desired to compare the alternative models with the help given by the data, and give some level of preference to the alternative models. Thus, the reverse engineering method should be used as a framework for fitting several different GRN models to the data to compare the models. For instance, there are several GRN modeling studies in which the reverse engineering algorithm could be applied: 1) system of ordinary differential equation (ODE) [15], 2) Dynamic Bayesian networks based methods (DBN) [10, 18], 3) a linear stochastic differential equation for a transcriptional regulatory network [14], and etc. It is noted that these three models can be represented by linear systems. By a "linear system", we mean a system represented as a linear equation in matrix form such as Eq. 2 in methods section.
Microarrays have been used to measure genomewide expression patterns during the cell cycle of different eukaryotic and prokaryotic cells. The review paper of Cooper and Shedden [21] presents various published microarray data sets, which have been interpreted as showing that a large number of genes are expressed in a cellcycledependent manner. From this point of view, it is assumed that the underlying GRN is dependent upon cellcyclic time. Therefore, this study uses the DBN because it can represent how the expression levels evolve over time.
In this paper, we address two main challenges in reverse engineering with linear systems and present a novel algorithm to overcome these difficulties. Firstly, reverse engineering of GRN will be computationally less challenging task if significantly large amount of experimental data is available. However, this is limited due to the expensive cost of microarray experiments. This problem makes the reverse engineering of GRN to be underdetermined, which means that there is substantially greater number of genes than the number of measurements. Secondly, reverse engineering of GRN with linear systems is ill conditioned because small relative changes in design matrix E in Eq. 2 due to the noise make substantially large changes in the solution. Therefore, the reverse engineering algorithm named as Bayesian orthogonal least squares (BOLS) is developed to overcome these difficulties. The BOLS method is created by combining three techniques: 1) Orthogonal Least Squares method (OLS) [22], 2) second order derivative for network pruning [23], and 3) Bayesian model comparison [20].
We evaluate the BOLS method by inferring GRN from both synthetic and Yeast expression data. We provide the performance comparison between BOLS and one of stateoftheart reverse engineering methods, SBL algorithm [24]. The SBL algorithm has been recently used in GRN studies with linear systems [18, 19]. For evaluation with Yeast expression data, we validate the inferred GRN using the information from the database that contains large data sets of known biological interactions.
Results and discussion
Case study 1: In silico experiment
Our in silico experiment follows the methodology for the generation of synthetic expression dataset for systems of DBN as used in Rogers and Girolami [19]. We generate synthetic networks using powerlaw distribution. To create a network structure, we decompose the entire network into a set of small networks that are defined as unit network and proceed a unit network by a unit network. Figure 2 presents a unit network consisting of a target gene and a list of genes as regulators. It should be noted that there is no requirement for the network to be acyclic. All created unit networks will be combined to create a whole GRN. The combination of all (or selected) unit networks is straightforward process based on the definition of a graph (see Methods section). This unit network approach is similar to the approach adopted in Bayesian network based methods [9, 25] and SBL based method [19]. For each target gene, we sample the number of genes (m_{i}) regulating this target gene from the approximate power distribution
where the normalization constant is given by
Following Rogers and Girolami [19], Wagner [26], and Rice et al. [27], the constant η is set to 2.5. m_{max} (or m_{min}) is the maximum (or minimum) number of allowed regulator genes in a unit network respectively. The condition m_{max} << K (the number of genes) ensures the sparseness of the synthetic network. Note that m_{max} is set to 4 for all experiments. The more details of creating synthetic network can be found from the supplementary information of Rogers and Girolami's study [19]. In this study, the Matlab code from Rogers and Girolami's study for creating synthetic network is used, which is available from [28]. We first randomly generate the synthetic networks unit network by unit network to combine them for a whole GRN, and then we generate the synthetic expression data with randomly generated synthetic GRN. Using these synthetic expression data, we infer the network with BOLS and the simulated data set. It should be noted that BOLS and the generation method of synthetic networks are not cooperative to work because the generation and inference of networks are completely separate processes.
In this experiment, the synthetic expression data is generated based on DBN using Eq. 1, in which the expression data are evolved over the time. However, as the simulation process is continued over the time, the expression data diverge by constantly either increasing or decreasing. Thus, we collect a single time point only after the expression data are simulated for a certain period of time (from t = 0 to t = T) to avoid expression levels being too high. We proceed in a single time point by a single time point manner. For a single time point, the generation of expression data is started with initial synthetic data. For each gene, the initial condition is assigned with random number between 0 and 1. With given initial condition, we simulate the expression data for each gene i from t = 0 to t = T using Eq. 1. We take the measure with t = T1 for design matrix E and with t = T for Y_{i} in Eq. 2. We repeat these process N times to collect N data points and apply the reverse engineering algorithms to reconstruct the synthetic network.
We make the performance comparison between BOLS and SBL methods to show the efficiency of BOLS method. The SBL method is one of the stateoftheart algorithms, which has been recently applied to GRN studies with linear systems [18, 19]. We use the Matlab code of SBL algorithm that is available from [29]. At first, we investigate the effect of the number of data points (N = 20, 40, 60, 80, 100) and noise (ε = 0.01, 0.05, 0.1) on the performance using synthetic GRN and expression data with fixed number of genes (K = 100). Sensitivity and complementary specificity are used as measures for the performance of the algorithm [9, 19, 27]. We compute the sensitivity = TP/(TP + FN) and the complementary specificity = FP/(TN + FP), where TP is the number of true positive, FN is the number of false negative, FP is the number of false positive, and TN is the number of true negative. In other words, the sensitivity is the proportion of recovered true positive interactions and the complementary specificity is the proportion of false positive interactions. To investigate the variability of test results, we have run both algorithms 20 times with same control parameters, i.e. the number of data points, noise, and etc. For each run, new random synthetic network has been created. We find that the sensitivity and the complementary specificity are constant over 20 experiments with small variability (see Table 1). The systematic effect of noise and the number of data points on the performance can be analyzed from Table 1. As the number of data points increases with the fixed number of genes and noise level ε, the performance of both algorithms increase. As noise level ε increases, the performance of both algorithms decreases. For the number of data points ≥ 80, the sensitivity of SBL is slightly greater than BOLS and the complementary specificity of SBL is significantly greater than the ones of BOLS. It means that BOLS algorithm produces significantly smaller proportions of false positive interactions than SBL algorithm. It should be noted that results from SBL algorithm with N = 20, 40, and 60 are not available because the Matlab code of SBL algorithm dose not run when the number of data points N is relatively low. SBL algorithm includes the Cholesky factorization of their Hessian matrix that is required to be positive definite. Note that the Hessian matrix is consisted of design matrix and hyperparameters [24]. When the number of data points is relatively smaller than the number of genes in the data set, this Hessian matrix becomes non positive definite. Our experiments show that SBL algorithm is not suitable for the reverse engineering with limited number of data points and it doesn't even run with significantly limited number of data points. On the other hand, BOLS algorithm produces relatively small proportion of FP interactions with limited number of data points. It should be noted that Rogers and Girolami [19] generate 2R expression levels for each gene in each knockout experimentR in the normal (wild type) system and R in the knockout (mutant) system. In a network of K genes, in which each is knocked out individually, they have 2RK data points. Since the evaluation of BOLS with Rogers and Girolami's knockout approach [19] is beyond the scope of the objectives of our study, i.e. the underdetermined problem using DBN model, we provide the comparisons between BOLS and SBL based on their approach as Additional file 1.
We use the receiver operating characteristic (ROC) analysis [19] to characterize the tradeoff between the proportions of true and false positive interactions with limited number of data points (N < 20 and K = 100) in Figure 3. This ROC analysis shows that BOLS algorithm produces the solution with extremely small proportion of false positive interactions when the number of data points is extremely small. We also analyze the effect of the number of gene K with limited number of data points N = 10. In Figure 4, it is shown that the performance of BOLS algorithm decreases as K increases from 100 to 300. Therefore, it can be concluded that performance of the proposed algorithm is dependent on K, N, and ε.
From the experiments, it is shown that BOLS produce solutions with significantly low complementary specificity regardless of K, N, and ε, because Bayesian model selection scheme is efficiently enough to discover the optimal solution. Since we do not have any information on noise in the data, we completely overfit the data to DBN model using OLS as a first step. Then, we remove the unnecessary inferred parameters (the inferred parameters that are related with "noise") to obtain the optimal solution by a tradeoff between minimizing the natural complexity of inferred GRN and minimizing the data misfit. As the complexity of inferred GRN decreases with network pruning process, we use the Bayesian model selection to select the most optimal solution. It should be noted that the Bayesian model selection includes Occam's factor, which automatically suppresses the tendency to discover spurious structure in data. Thus, we can say that BOLS is efficient to infer GRN with significant small portion of FP interactions with the noisy and limited number of data set for DBN. In Figure 5, we present an example showing that the performance of BOLS increases as the network pruning step proceeds. We first generate the synthetic networks of 50 genes (N) and then simulate 20 data points (K) with this networks and noise level ε = 0.1. We concentrate on an output unitnetwork with highest evidence value logP(DH_{i}) for evaluation. It should be reminded that as the network pruning continues the number of inferred interactions in the unit network decreases. Figure 5b shows that the number of errors (FP+FN) decreases, as the network pruning proceeds. The complementary specificity also decreases along the network pruning (Figure 5c). On the other hand, the sensitivity remains constant, which is equal to 1 (Figure 5d). It means that the overfitted solutions after OLS step contain only TP and FP interactions (no FN interactions). It is observed that the number of errors and the complementary specificity converge at 0 as the network pruning process proceeds, in which the unit network has the highest evidence logP(DH_{i}) (Figure 5a). Therefore, it can be concluded that the OLS method can cope with underdetermined problems using noisy data, provided that the method is combined with network pruning process and Bayesian model selection techniques.
The BOLS algorithm should be run K times producing K unit networks, which are combined to build the whole network. Each unit network is assigned with P(DH_{i}). For each unit network, we compute the number of errors = FN + FP. The relationship between P(DH_{i}) and the number of errors for each unitnetworks is shown in Figure 6. The number of errors decreases as the number of data points increases on the synthetic data. Unit networks with higher P(DH_{i})s are more accurate than those with lower P(DH_{i})s. This signifies that unit networks with higher P(DH_{i})s have a higher confidence that unit networks are correctly reconstructed. Thus, when low numbers of data points and extremely high numbers of genes are given, the algorithm should be able to recover a partially correct network with unit networks only having relatively high P(DH_{i})s. It should be noted that BOLS algorithm does not provide the confidence levels among interactions inside unit network. However, it can be noticed from Figure 6 that many unit networks with relatively high evidence values have zeroed number of incorrectly inferred interactions. Therefore, the evidence values for unit networks are efficient enough to cope with problems for locating unit networks without FP or FN interactions.
Case study 2: Application of BOLS to Saccharomyces cerevisiae data
To evaluate our algorithm for reverse engineering of GRN, we use the microarray data from Spellman et al. [30], in which they created three different data sets using three different synchronization techniques. We focus our analysis using the data set produced by α factor arrest method. Here we concentrate on our study with the expression data set of 20 genes known to be involved in cell cycle regulation [31]. Thus, we have a data set with 20 genes and 17 data points in this experiment. Based on the previous simulation experiment, it is expected to have sensitivity as 0.967 (ε = 0.01), 0.937 (ε = 0.05), 0.919 (ε = 0.1) and complementary specificity as 0.022 (ε = 0.01), 0.019 (ε = 0.05), 0.029 (ε = 0.1), respectively. It means that the output is expected to have approximately less than 3% of complementary specificity if we assume that ε ≤ 0.1, the sparseness m_{max} of GRN = 4, and etc.
The output unit networks using this data set are presented in Table 2 including the P(DH_{i}) of unit networks, in which there are 107 inferred interactions from BOLS. Among those interactions, 45 of them are identified as physical or genetic interactions from the BioGRID database [32]. Later in this section, we provide the logical basis such that some of unidentified interactions might be possible physical or genetic events. BioGRID is a freely accessible database including physical or genetic interactions from the Saccharomyces cerevisiae available at [33]. The output in Table 2 shows that the interactions with higher P(DH_{i})s have higher likelihood of having known physical or genetic interactions identified from the BioGRID than the interactions with lower P(DH_{i})s: 1) Among the elucidated interactions with P(DH_{i}) > 66^{th} percentile of all P(DH_{i})s, 23 of them are identified as physical or genetic interactions from BioGRID database. 2) Among the elucidated interactions with 66^{th} percentile > P(DH_{i})s > 33^{rd} percentile, 14 of them are identified as physical or genetic interactions, 3) Among the elucidated interactions with P(DH_{i})s < 33^{rd} percentile, 8 of them are identified as physical or genetic interactions. This could be explained from the previous simulation experiment showing that BOLS has less false positive interactions with relatively high P(DH_{i}) than with low P(DH_{i}). It is noted that the overall values of P(DH_{i}) in Table 2 are relative small compared to the ones in Figure 6. It should be noted that number of genes is set to 20 in Table 2 and the number of genes is set to 100 in Figure 6. From Figure 6, it is also noted that the overall values of P(DH_{i}) decreases as the number of data points decreases. It is also found that the overall values of P(DH_{i}) decreases as the noise level increases. Therefore, the overall values of P(DH_{i}) can be relatively different depending on the number of genes, noise level, and the number of data points.
We pool both physical and genetic interactions from BioGRID to validate the output interactions in this experiment. The rationale for this pooling can be described as follows. Several proteins join together to form multiprotein complex having certain functions or regulating other proteins. For example, SCF complex consists of Skp, Cullin, and Fbox proteins, which promotes G1S transition by targeting G1 cyclins and ClnCdk inhibitor Sic1 for degradation. From Breeden's [34] review study, it is known that cell cycle regulated complexes with at least one subunit are regulated at the transcript level. This means that certain protein complexes might regulate other complexes with time delay. With only expression data sets, we only have information that which proteins are present for certain time. In terms of efficiency and logical order, it is assumed that the cell only makes the proteins when it is needed. If the proteins are made all the time, the cell could be inefficient in an environment without the substrates of the protein [21]. From these points of view, there can be two possible cases to be considered when certain two proteins are present: 1) two proteins form a multiprotein complex by interacting each other, 2) One protein might form a complex with some other proteins. This complex might regulates the other protein that could also form a protein complex. Therefore, those two types of interactions can be pooled together to validate the output interactions.
Unitnetworks having relatively many identified physical or genetic interactions are the ones with Cln3, Cln2, Clb5, Cln1, Cdc28, Swi6, Cdc53 and Cdc34 as target genes. For example, we present a unit network with Cdc28 as a target gene. Cdc28 is identified as having physical or genetic interactions with Cln1 [35–40], Cln2 [36–49], Cln3 [47, 50–55], Clb1 [35, 56, 57], Clb2 [35, 38, 41, 58–64], Clb4 [56, 57, 61], Hct1 [38, 53, 63, 65–67], Sic1 [37, 53, 61, 68–71], Cdc20 [53, 67], Swi5 [53], and Swi6 [53, 72]. Cdc28 is a catalytic subunit of the main cell cycle cyclindependent kinase (CDK), which alternatively associates with G1 cyclins (Cln1, Cln2, and Cln3) and G2/M cyclins (Clb1, Clb2, and Clb4) that direct the CDK to specific substrates. Hct1 (Cdh1) and Cdc20 are cell cycle regulated activators of the anaphasepromoting complex/cyclosome (APC/C), which direct ubiquitination of mitotic cyclins. One of Cdc28's Gene Ontology definitions gives another evidence that Cdc28 might have associations with Hct1 and Cdc20, because Cdc28 is involved in the progression from G1 phase to S phase of the mitotic cell cycle. Sic1 is an inhibitor of Cdc28Clb kinase complexes that controls G1/S phase transition, which prevents premature S phase and ensuring genomic integrity.
Among the list of genes inside Cdc28 unit network, there are several genes not identified as having physical or genetic interactions with Cdc28 from the BioGRID: Clb6, Mbp1, Mcm1, Cdc34, Cdc53 and Skp1. However, the definitions of these genes from the BioGRID give enough evidences such that some of them indirectly interact with Cdc28. For example, Clb6 is a Btype cyclin involved in DNA replication during S phase, which activates Cdc28 to promote initiation of DNA synthesis. Clb6 also has a role for the formation of mitotic spindles along with Clb3 and Clb4. Thus, Clb6 indirectly regulates Cdc28 along with Clb4. Cdc53, Cdc34 and Skp1 also indirectly regulate Cdc28 through Sic1. They form a structural protein of SCF complexes, called cullin, with an Fbox protein. The SCF promotes the G1S transition by targeting G1 cyclins and the ClnCdk inhibitor Sic1 for degradation. Mbp1 is a transcription factor involved in regulation of cell cycle progression from G1 to S phase, which forms a complex with Swi6 that binds to Mull cell cycle box regulatory element in promoters of DNA synthesis genes. Thus, Mbp1 is associated with Cdc28 by forming a complex with Swi6.
For another example, we present a unit network having Clb5 as target gene. Clb5 is identified as having physical or genetic interactions with Clb6 [73–79], Sic1 [41, 61, 71, 80–82], Clb2 [83], Cdc20 [84], Cln3 [75], Hct1 [63, 85], Swi4 [75, 86], and Swi6 [86].
Among the list of genes inside Clb5 unit network, there are several genes not identified as having physical or genetic interactions with Clb5: Cln2, Clb1, Mcm1, Mbp1, Swi5 and Clb4. However, there are evidences that some of genes in the list might have indirect interactions with Clb5. For example, Clb4 has an association with Clb5, because Clb5 is a Btype cyclin involved in DNA replication during S phase and has a role for the formation of mitotic spindles along with Clb3 and Clb4. For another example, Mbp1 and Clb5 have an indirect interaction. Mbp1 is a transcription factor involved in regulation of cell cycle progression from G1 to S phase, which forms a complex with Swi6 that binds to MluI cell cycle box regulatory element in promoters of DNA synthesis genes. It is already identified that Swi6 regulates Clb5. Thus, Mbp1 indirectly regulate Clb5 through Swi6.
We also evaluate both BOLS and SBL using the same data set in Table 2 to compare the efficiency of BOLS with SBL. For direct comparison purpose, we make the graphs of inferred GRN by BOLS and SBL (Figure 7). There are 107 inferred interactions (45 of them are identified as known interactions from the database) by BOLS (Figure 7a) and 116 interactions (38 of them are known interactions) by SBL (Figure 7b). Figure 7c shows that 62 interactions are inferred by both BOLS and SBL, in which 32 of them are known interactions. Figure 7d shows 45 interactions only by BOLS in which 13 of them are identified as known interactions from the database. On the other hand, Figure 7e shows 54 interactions only by SBL in which only 6 of them are known interactions from the database. It is reasonable to assume that the complete information of biological interactions of Yeast is not available from the database yet. It is believed that the more depositions of the information concerning Yeast interactions are still on the way to reach the more complete understanding of the underlying complex cellular networks of Yeast. Based on the currently available information from the database, we can say that SBL algorithm infers GRN with relatively more complexity and less identified known interactions than BOLS. Therefore, based on our evaluation experiments with synthetic and Yeast expression data, it is sufficient to conclude that SBL produces more overfitting solutions (i.e. more FP solutions) than BOLS.
Conclusion
In the evaluation of BOLS using synthetic data, it is shown the proposed BOLS algorithm is able to reconstruct networks using a very limited number of experimental samples. In this study, we assume that there is significantly limited number of data points and the noise level in the data is not known. This is a common situation in expression data analysis. To handle these difficulties, we adopt a decomposition strategy to break down the entire network into a set of unit networks. This decomposition makes the inferring of a whole GRN into several separate regressions. Thus, if we have extremely small number of data points, our method can not provide 100% correct solutions, but provides each unit network with P(DH_{i}), which can be used as confidence level. The unit network with a higher P(DH_{i}) has a higher confidence such that the unit network is correctly inferred (Figure 5). Previously, Basso et al. [11] validated their ARACNe algorithm using 19 nodes synthetic network. With 350 sample size, the sensitivity and complementary specificity are approximately 80% and 20%, respectively. The inferred interactions from their method contain approximately 20% false positive and false negative interactions, respectively. With our method, it is possible to locate false positive or false negative interactions with P(DH_{i})s, which is the unique property of BOLS algorithm. Our in silico experiment shows that the performance depends on the number of data points, genes, and noise level. Further study will be required to investigate the relationship between these parameters and the performance so that the BOLS algorithm can be generally applied to the microarray data with any number of genes, data points, and noise level.
Another evaluation is conducted with the Yeast Saccharomyces cerevisiae data of 20 genes, which are involved in cellcycle regulation. In the output network, there is noticeable number of interactions that are identified as physical or genetic interactions from the literature. There are also several interactions that are not identified from the literature. However, it is shown that the definition of these genes from the BioGRID database gives enough evidences that some of them have indirect interactions. Thus, this experiment shows that BOLS algorithm is able to elucidate remarkable number of known physical or genetic events.
For both evaluation experiments with synthetic and Yeast expression data, we compare the performance between BOLS and SBL algorithms. The SBL algorithm [18, 19, 24] is a general Bayesian framework to obtain sparse solutions utilizing linear models. This method is known as typeII maximum likelihood method [24], in which the solutions are obtained by maximizing the marginal likelihood. On the other hand, BOLS utilizes the Bayesian model selection that is an extension of maximum likelihood model selection, in which the posterior is obtained by multiplying the best fit likelihood by the Occam's factor. From our both evaluation experiments, it is concluded that BOLS produces sparser solutions with less FP than SBL does.
Methods
1. Gene regulatory network model and Unit networks
To study GRN, we choose a system of DBN as our GRN model. This model is described by
Here N is the number of data points, K is the number of gene in the data, and ξ_{i}(t) is a noise at any time. The e_{i}(t)s are the level of mRNAs at any given time t, which influences the expression levels of the gene. The value w_{ij} describes the interaction strength between the j^{th} gene and the i^{th} gene. This model has the first order Markov property, which means that the future expression e_{i}(t+1) is independent of the past expression level e_{i}(t1) given the present expression level e_{i}(t). As briefly mentioned in the previous section, it is required that the data set is reorganized into a linear system to reverse engineer GRN using BOLS algorithm. This GRN model can be easily rewritten into a linear system as
Y_{ i }= Ew_{ i }+ n_{ i } (2)
where i = 1, 2,..., K. w_{ i }is a column matrix of regulation strength values, which is defined as w_{ i }= [w_{i1}, w_{i2},......, w_{ik}]^{T}. Y_{ i }is a column matrix of expression levels for target genes, which is defined as Y_{ i }= [e_{i}(2), e_{i}(3),..., e_{i}(N)]^{T}. E is a N1 × K design matrix, which is defined as E = [e_{1}, e_{2},......, e_{K}] and e_{i} = [e_{i}(1), e_{i}(2),......, e_{i}(N1)]^{T}. n_{ i }is an Gaussian noises, which is defined as n_{ i }= [n_{i}(1), n_{i}(2),..., n_{i}(N1)]^{T}.
It should be noted that the expression levels e_{i}(t)s in both Eq. 1 and 2 are same and noisy ones. Eq. 1 describes that the current expression levels e_{i}(t)s are determined depending on the previous ones e_{i}(t1) and noises ξ_{i}(t1) and the expression levels are evolved over the time based on the GRN. Hence, ξ_{i}(t) is a noise added into e_{i}(t)s during the generation of synthetic or real expression levels based on DBN model. Once the "noisy" expression levels are available, we consider Eq. 2 for reverse engineering of GRN. Because the given expression levels e_{i}(t)s in Eq. 2 are noisy, we should have a condition such that Y_{ i } Ew_{ i } = n_{ i } > 0. If n_{ i }= 0, we will have overfitting solutions, in which model fitting oscillates widely so as to fit the noise. Thus, we can say that n_{ i }is a noise related with "data misfit" or "confidence interval" on the best fit parameters. On the other hand, ξ_{i}(t) is a noise related with the "generation" of expression levels. If n_{ i }is modeled as zeromean Gaussian noise with standard deviation σ_{n}, the probability of the data given the parameter w_{ i }is
where β = 1/σ_{n}^{2}, E_{D} = (Y_{i}Ew_{i})^{T}(Y_{i}Ew_{i}), and Z_{D} = (2π/β)^{N/2}. P(Dw_{i}, β) is called the maximum likelihood. It is well known that maximum likelihood is underdetermined and ill conditioned problems. Thus, we are motivated to develop a novel strategy to overcome these problems.
We decompose the entire network into a set of small networks that are defined as unit networks. Each unit network consists of one particular target gene and its regulator genes. The unit network is used as input and output of BOLS algorithm. Figure 2a presents an input unit network that includes all genes in the data set as regulator candidates. Figure 2b presents an output unit network that contains most probable regulator genes to the target gene. Thus, we can decompose the GRN into several separate regressions and apply the BOLS algorithm to each unit network. Therefore, for GRN with K number of genes, we will run the algorithm K timesproducing a unit network for each time. It should be noted that we can use all unit networks to create whole GRN that consists of two finite sets, a set of nodes (genes) and a set of edges (interactions) such that each edge connects two nodes. With all (or selected) unit networks, the generation of GRN can easily be generalized by constructing a K × K graph matrix G = [g_{i, j}], with binary element
Furthermore, this matrix induces a GRN, in which nodes corresponds to genes and an edge joins nodes i and node j if and only if g_{i, j}= 1. For each edge, we store the information of unit network where it belongs. The information of these sets can be easily obtainable from all unit networks. Thus, the combination of all unit networks for creating whole GRN is easy and straightforward procedure.
2. Bayesian orthogonal least square algorithm
As briefly described in background section, reverse engineering with a linear system with limited data has to overcome two difficulties. In this section, we describe our efforts to overcome these challenges by developing BOLS algorithm.
The system is referred as underdetermined when the number of parameters is larger than the number of available data points, so that standard least squares techniques break down. This issue can be solved with the OLS [22] method involves decomposition of the design matrix into two using GramSchmidt Orthogonalization theory as,
E = XU
where E_{N1 × K} = [e_{1}, e_{2},......, e_{k}], X_{N1 × K} = [x_{1}, x_{2},......, x_{k}] and U_{K × K} is a triangular matrix with 1's on the diagonal and 0's below the diagonal, that is,
Let's say that w is the regression parameter inferred by E and g is the regression parameter inferred by X. It is noted that g and w satisfy the triangular system
g = Uw.
The computational procedure of GramSchmidt method is described as
Because x_{i} and x_{j} (i ≠ j) are orthogonal to each other, the sum of square of Y is defined as
The variance of Y_{i} is defined as
It is noticed that (x_{i}^{T}x_{i})g_{i}^{2}/N1 is the variance of Y_{i} which is contributed by the regressors and n_{i}^{T}n_{i}/N1 is the noise (or unexplained) variance of Y_{i}. Hence, (x_{i}^{T}x_{i})g_{i}^{2}/N1 is the increment to the variance of Y_{i} contributed by w_{I}, and the error reduction ratio only due to x_{i} can be defined as
[NError]_{i} = (x_{i}^{T}x_{i})g_{i}^{2}/(Y^{T}Y), 1 ≤ i ≤ K.
This error term provides a simple and efficient measure for seeking a subset of significant regression parameters in a forwardregression way. The repressor selection procedure from Chen et al.[22] is summarized as follows:

1)
At the first step, for 1 ≤ i < K, compute x_{1}^{(i)} = e_{i}g_{1}^{(i)} = (x_{1}^{(i)})^{T}Y/((x_{1}^{(i)})^{Tx}_{1}^{(i)})[NError]_{1}^{(i)} = (g_{1}^{(i)})^{2}(x_{1}^{(i)T}x_{1}^{(i)})/(Y^{T}Y)
Find [NError]_{1}^{(i1)} = max{[NError]_{1}^{(i)}, 1 ≤ i ≤ K}
and select x_{1} = x_{1}^{(i1)} = e_{i1}.

(2)
At the j^{th} step where j ≥ 2, for 1 ≤ i ≤ K, i ≠ i_{1},..., i ≠ = i_{j1}, compute u_{mj}^{(i)} = x_{m}^{T}e_{i}/(x_{m}^{T}x_{l}), 1 ≤ m < j
{\text{x}}_{\text{j}}^{(\text{i})}={\text{e}}_{\text{i}}{\displaystyle \sum _{m=1}^{j1}{u}_{mj}^{(i)}{x}_{m}}g_{j}^{(i)} = (x_{j}^{(i)})^{T}Y/((x_{j}^{(i)})^{T}x_{j}^{(i)}) [NError]_{j}^{(i)} = (g_{j}^{(i)})^{2}(x_{j}^{(i)})^{T}x_{j}^{(i)}/(Y^{T}Y)
Find [NError]_{j}^{(ij)} = max{[NError]_{j}^{(i)}, 1 ≤ i ≤ K, i ≠ i_{1},..., i ≠ = i_{j1} }
and select
where u_{mj} = u_{mj}^{(ij)}, 1 ≤ m < j.

(3)
The OLS is terminated at the K_{s}^{th} step when
1{\displaystyle \sum _{m=1}^{{K}_{S}}{\left[NError\right]}_{m}<\rho},
where 0 <ρ < 1 is a chosen tolerance.
We assume that we do not have any information about noise level in the data, so that we completely overfit the data to the model using OLS method with ρ << 1 (ρ = 1.0e3 in this study). Then we reduce unnecessary parameters to deal with the illconditioned problem by using second order derivative for network pruning techniques and Bayesian model comparison framework. We can obtain the optimal solution by trading off between the complexity of the model and the data misfit [20]. We start this procedure using an extremely small value for data misfit by completely overfitting the data to the model. As the complexity of the model is reduced; i.e. the number of effective parameters is reduced, the value for data misfit is increased. The optimal complexity of the model for "true solution" is decided using a Bayesian model comparison frame that assigns a preference to the model H_{i} with certain complexity, a Bayesian evaluation so called as the evidence P(DH_{i}). The evidence is obtained by multiplying the bestfit likelihood by the Occam's factor,
where P(Dg_{ MP }, H_{i}) corresponds to the best fit likelihood, P(g_{ MP }H_{i})(2π)^{K/2}det^{1/2}A corresponds to the Occam's factor, A = ∂^{2}logP(gD, Hi)/∂g^{2}, and g_{ MP }represents the most probable parameters of g. The Occam's factor is equal to the ratio of the posterior accessible volume of H_{i}'s parameter space to the prior accessible volume, or the factor by which H_{i}'s hypothesis space collapses when the data is collected [20]. The model H_{i}s can be viewed as consisting of a certain number of exclusive submodels, of which only one is chosen when the data is collected. The Occam's factor is a measure of complexity of the model that depends not only on the number of parameters in the model but also on the prior probability of the model. Therefore, the overfitting solution can be avoided by using Bayesian model comparison frame because the Bayesian Occam's factor assures getting the optimal complexity of the model. See Mackay [20] for more details of Occam's factor.
With a second order derivative for network pruning [23], we can select the parameters to be eliminated first. Our goal here is to find a set of parameters whose deletion causes the least increase of cost function CC = β/2(Y  Xg)^{T} (Y  Xg) + α/2(g^{T} g), (3)
where α and β are hyperparameters that measure the complexity of the model and data misfit, respectively. It will be shown later in this section the iterative formulae to estimate α and β with a given data set and model structure in Eq. 6 and 7. Using the second order derivative for network pruning method, we can derive the saliency equation as follows,
where A_{jj}^{1} is a j^{th} diagonal component of the inverse matrix of A, A = ∂^{2}C/∂g^{2}, and v_{j} is the unit vector in parameter space, the j^{th} dimension at which it is equal to one and the rest of the dimensions are equal to zero. It should be noted that A and A^{1} are diagonal matrices because of the decomposition of design matrix E by GramSchmidt Orthogonalization theory. With Eq. 3 we can select parameters, whose elimination produces the least increase of cost function C.
With a Bayesian frame [20] we can compare alternative models when our model structures keep changing with the network pruning method. Let's say we have a data set D = [Y, X], where Y = [y(1), y(2),..., y(N)]^{T} is the target data set, X = [x_{1}, x_{2},..., x_{K}] the N × K design matrix, and x_{i} = [x_{i}(1), x_{i}(2),..., x_{i}(N)]^{T} each column matrix in X. The regression parameters we want to infer are g = [g_{1}, g_{2},..., g_{K}]^{T}. The log posterior probability of data D, given α and β, can be derived [20] as,
where the subscript MP denotes Most Probable. The evidence P(DH_{i}) can be obtained if we marginalize the probability defined in Eq. 5 over the hyperparameters α and β. Before the estimation of the evidence P(DH_{i}), we have to find the most probable value of the hyperparameters α and β. The differentiation of Eq. 5 over α and β and the rearrangement gives formulae for the iterative reestimation of α and β [20],
where γ = N  α Trace(A^{1}), g = A^{1}X^{T}Y, N is number of data points, and K is number of variables (genes). To rank alternative structures (or complexities) of the model in the light of data set D, we evaluate the evidence by marginalizing the posterior probability P(Dα, β, H_{i}) over α and β,
P(DH_{i}) = ∫∫P(Dα, β, H_{i})P(α, β)dαdβ.
We have very little prior information about α and β. When the available prior information is minimal, the learning process is often started with an objective prior probability. This uninformative prior probability is referred to as "vague prior" for a parameter with a range from 0 to ∞, which is a flat prior [87]. This prior probability can be left out when we compare alternative models. With the prior available, we can marginalize the posterior P(Dα, β, H_{i}). The marginalization of P(Dα, β, H_{i}) over α and β can be estimated using a flat prior and Gaussian integration [20],
where σ_{logαD}and σ_{logβD}are the error bars on logα and logβ, found by differentiating Eq. 5 twice:
In this study, we create a novel reverse engineering algorithm for linear systems with K number of genes using three techniques described above. The algorithm is run K times so that all genes in the data set are considered as a target gene at least once. The algorithm of BOLS for a unitnetwork construction is summarized as

1.
Set certain gene as the target gene and set remaining genes as regulator candidates as input.

2.
Overfit the data to Eq. 2 using OLS.

3.
While the number of parameters is greater than 1.
3.1 Estimate α and β with iterative reestimation Eq. 6 and 7.
3.2 Compute the P(DH_{i}) for the current state network H_{i} with Eq. 8.
3.3. Find the parameter g_{j} that gives the smallest L_{j} by Eq. 4 and delete g_{j}

4.
Select the network with the maximum P(DH_{i}) as an output unit network.
References
Altman RB, Raychaudhuri S: Whole genome expression analysis: challenges beyond clustering. Curr Opin Struct Biol. 2001, 11: 340347. 10.1016/S0959440X(00)002128.
Brown MP, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M, Haussler D: Knowledgebased analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA. 2000, 97: 262267. 10.1073/pnas.97.1.262.
Huges JD, Estep PW, Tavazoi S, Church GM: Computational identification of cisregulatory elements associated with groups of functionally related genes in Sacchaaromyces cerevisiae. J Mol Biol. 2000, 296: 12051214. 10.1006/jmbi.2000.3519.
Jansen R, Greenbaum D, Gerstein M: Relating wholegenome expression data with proteinprotein interactions. Genome Res. 2000, 12: 3746. 10.1101/gr.205602.
Niehrs C, Pollet N: Synexpression groups in eukaryotes. Nature. 1999, 402: 483487. 10.1038/990025.
Kauffman SA: Metabolic stability and epigenesist in randomly constructed genetic nets. J Theor Biol. 1969, 22: 437467. 10.1016/00225193(69)900150.
Shmulevich I, Dougherty E, Kim S, Zhang W: Probabilistic Boolean networks: a rulebased uncertainty model for gene regulatory networks. Bioinformatics. 2002, 18: 261274. 10.1093/bioinformatics/18.2.261.
Hartmink AJ, Gifford DK, Jaakola TS, Young RA: Using graphical models and genomic expression data to statistically validate models of genetic regulatory networks. Pac Symp Biocomputing. 2001, 6: 422433.
Husmeier D: Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics. 2003, 19: 22712282. 10.1093/bioinformatics/btg313.
Murphy K, Mian S: Modeling gene expression data using dynamic Bayesian networks. Technical report. 1999, Computer Science Division, University of California, Berkeley
Basso K, Margolin A, Stolovitzky G, Klein U, DallaFavera R, Califano A: Reverse engineering of regulatory networks in human B cells. Nat Genet. 2005, 37: 382390. 10.1038/ng1532.
D'Haeseleer P, Wen X, Fuhrman S, Somogyi R: Linear modeling of mRNA expression levels during CNS development and injury. Pac Symp Biocomputing. 1999, 4: 4152.
Weaver D, Workman C, Stormo G: Modeling regulatory networks with weight matrices. Pacific Symposium on Biocomputing. 1999, 4: 112123.
Chen K, Wang T, Tseng H, Huang C, Kao C: A stochastic differential equation model for quantifying transcriptional regulatory network in Saccharomyces cerevisiae. Bioinformatics. 2005, 21: 28832890. 10.1093/bioinformatics/bti415.
Yeung SM, Tegner J, Collins J: Reverse engineering gene networks using singular value decomposition and robust regression. Proc Natl Acad Sci USA. 2002, 99: 61636168. 10.1073/pnas.092576199.
Wille AZP, Vranova E, Furholz A, Laule O, Bleuler S, Hennig L, Prelic A, von Rohr P, Thiele L, Zitzler E, Gruissem W, Buhlmann P: Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biol. 2004, 5: R9210.1186/gb2004511r92.
Deng X, Geng H, Ali H: EXAMINE: A computational approach to reconstructing gene regulatory networks. BioSystems. 2005, 81: 125136. 10.1016/j.biosystems.2005.02.007.
Chan ZS, Collins L, Kasabov N: Bayesian learning of sparse gene regulatory networks. Biosystems. 2007, 87 (2–3): 299306. 10.1016/j.biosystems.2006.09.026.
Rogers S, Girolami M: A Bayesian regression approach to the inference of regulatory networks from gene expression data. Bioinformatics. 2005, 21: 31313137. 10.1093/bioinformatics/bti487.
Mackay DJC: Bayesian interpolation. Neural Computation. 1992, 4: 415447.
Cooper S, Kerby S: Microarray analysis of gene expression during the cell cycle. Cell & Chromosome. 2003, 2: 110.1186/1475926821.
Chen S, Cowan C, Grant P: Orthogonal least squares learning algorithm for radial basis function networks. IEEE Trans Neural Net. 1991, 2: 302309. 10.1109/72.80341.
Hassibi B, Stork DG: Second order derivatives for networks pruning: optimal brain surgeon. Adv Neural Inform Proc Syst. 1993, 5: 164171.
Tipping ME: Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research. 2001, 1: 211244. 10.1162/15324430152748236.
Friedman N, Linjal M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. J Comput Biol. 2000, 7: 601620. 10.1089/106652700750050961.
Wagner A: Estimating coarse gene network structure from largescale gene perturbation data. Genome Research. 2002, 12: 309315. 10.1101/gr.193902.
Rice J, Tu Y, Stolovitzky G: Reconstructing biological networks using conditional correlation analysis. Bioinformatics. 2005, 21 (6): 765773. 10.1093/bioinformatics/bti064.
Matlab code for creating synthetic network. [http://www.dcs.gla.ac.uk/~srogers/reg_nets.htm]
A Matlab implementation of "SparseBayes". [http://research.microsoft.com/mlp/RVM/default.htm]
Spellman PT, Sherlock G, Zhang MQ, Iyer V, Anders K, Eisen M, Brown P, Botstein D, Futcher B: Comprehensive identification of cell cycleregulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell. 1998, 9: 32733297.
Chen K, CsikaszNagy A, Gyorffy B, Val J, Novak B, Tyson J: Kinetic analysis of a molecular model of the budding yeast cell cycle. Mol Bio Cell. 2000, 11: 369391.
Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006, 1 (34): D535539. 10.1093/nar/gkj109.
BioGRID database: general depository for interaction datasets. [http://www.thebiogrid.org/]
Breeden LL: Periodic transcription: a cycle within a cycle. Current Biology. 2003, 13: R31R38. 10.1016/S09609822(02)013866.
Ahn SH, Tobe BT, FitzGerald JN, Anderson SL, Acurio A, Kron SJ: Enhanced cell polarity in mutants of the budding yeast cyclindependent kinase Cdc28p. Mol Biol Cell. 2001, 12 (11): 36893560.
Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM, Remor M, Hofert C, Schelder M, Brajenovic M, Ruffner H, Merino A, Klein K, Hudak M, Dickson D, Rudi T, Gnau V, Bauch A, Bastuck S, Huhse B, Leutwein C, Heurtier MA, Copley RR, Edelmann A, Querfurth E, Rybin V, Drewes G, Raida M, Bouwmeester T, Bork P, Seraphin B, Kuster B, Neubauer G, SupertiFurga G: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002, 415 (6868): 141147. 10.1038/415141a.
Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B, Edelmann A, Heurtier MA, Hoffman V, Hoefert C, Klein K, Hudak M, Michon AM, Schelder M, Schirle M, Remor M, Rudi T, Hooper S, Bauer A, Bouwmeester T, Casari G, Drewes G, Neubauer G, Rick JM, Kuster B, Bork P, Russell RB, SupertiFurga G: Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006, 440 (7084): 631636. 10.1038/nature04532.
Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P, Bennett K, Boutilier K, Yang L, Wolting C, Donaldson I, Schandorff S, Shewnarane J, Vo M, Taggart J, Goudreault M, Muskat B, Alfarano C, Dewar D, Lin Z, Michalickova K, Willems AR, Sassi H, Nielsen PA, Rasmussen KJ, Andersen JR, Johansen LE, Hansen LH, Jespersen H, Podtelejnikov A, Nielsen E, Crawford J, Poulsen V, Sorensen BD, Matthiesen J, Hendrickson RC, Gleeson F, Pawson T, Moran MF, Durocher D, Mann M, Hogue CW, Figeys D, Tyers M: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002, 415 (6868): 180183. 10.1038/415180a.
Reed SI, Hadwiger JA, Richardson HE, Wittenberg C: Analysis of the Cdc28 protein kinase complex by dosage suppression. J Cell Sci Suppl. 1989, 12: 2937.
Tyers M, Futcher B: Far1 and Fus3 link the mating pheromone signal transduction pathway to three G1phase Cdc28 kinase complexes. Mol Cell Biol. 1993, 13 (9): 56595669.
Archambault V, Chang EJ, Drapkin BJ, Cross FR, Chait BT, Rout MP: Targeted proteomic study of the cyclinCdk module. Mol Cell. 2004, 14 (6): 699711. 10.1016/j.molcel.2004.05.025.
Ceccarelli E, Mann C: A Cdc28 mutant uncouples G1 cyclin phosphorylation and ubiquitination from G1 cyclin proteolysis. J Biol Chem. 2001, 276 (45): 4172541732. 10.1074/jbc.M107087200.
Deshaies RJ, Kirschner M: G1 cyclindependent activation of p34CDC28Cdc28p – in vitro. Proc Natl Acad Sci USA. 1995, 92 (4): 11821186. 10.1073/pnas.92.4.1182.
Lanker S, Valdivieso MH, Wittenberg C: Rapid degradation of the G1 cyclin Cln2 induced by CDKdependent phosphorylation. Science. 1996, 271 (5255): 15971601. 10.1126/science.271.5255.1597.
Levine K, Oehlen LJ, Cross FR: Isolation and characterization of new alleles of the cyclindependent kinase gene CDC28 with cyclinspecific functional and biochemical defects. Mol Cell Biol. 1998, 18 (1): 290302.
Lim HH, Loy CJ, Zaman S, Surana U: Dephosphorylation of threonine 169 of Cdc28 is not required for exit from mitosis but may be necessary for start in Saccharomyces cerevisiae. Mol Cell Biol. 1996, 16 (8): 45734583.
Miller ME, Cross FR, Groeger AL, Jameson KL: Identification of novel and conserved functional and structural elements of the G1 cyclin Cln3 important for interactions with the CDK Cdc28 in Saccharomyces cerevisiae. Yeast. 2005, 22 (13): 10211036. 10.1002/yea.1292.
Peter M, Herskowitz I: Direct inhibition of the yeast cyclindependent kinase Cdc28Cln by Far1. Science. 1994, 265 (5176): 11281231. 10.1126/science.8066461.
Wang H, Gari E, Verges E, Gallego C, Aldea M: Recruitment of Cdc28 by Whi3 restricts nuclear accumulation of the G1 cyclinCdk complex to late G1. EMBO J. 2004, 23 (1): 180190. 10.1038/sj.emboj.7600022.
Cross FR: Further characterization of a size control gene in Saccharomyces cerevisiae. J Cell Sci Suppl. 1989, 12: 117127.
Cross FR, Blake CM: The yeast Cln3 protein is an unstable activator of Cdc28. Mol Cell Biol. 1993, 13 (6): 32663271.
Tyers M, Tokiwa G, Nash R, Futcher B: The Cln3Cdc28 kinase complex of Scerevisiae is regulated by proteolysis and phosphorylation. EMBO J. 1992, 11 (5): 17731784.
Ubersax JA, Woodbury EL, Quang PN, Paraz M, Blethrow JD, Shah K, Shokat KM, Morgan DO: Targets of the cyclindependent kinase Cdk1. Nature. 2003, 425 (6960): 859864. 10.1038/nature02062.
Wijnen H, Landman A, Futcher B: The G(1) cyclin Cln3 promotes cell cycle entry via the transcription factor Swi6. Mol Cell Biol. 2002, 22 (12): 44024418. 10.1128/MCB.22.12.44024418.2002.
Yaglom J, Linskens MH, Sadis S, Rubin DM, Futcher B, Finley D: p34Cdc28mediated control of Cln3 cyclin degradation. Mol Cell Biol. 1995, 15 (2): 731741.
Grandin N, Reed SI: Differential function and expression of Saccharomyces cerevisiae Btype cyclins in mitosis and meiosis. Mol Cell Biol. 1993, 13 (4): 21132125.
Surana U, Robitsch H, Price C, Schuster T, Fitch I, Futcher AB, Nasmyth K: The role of CDC28 and cyclins during mitosis in the budding yeast S. cerevisiae. Cell. 1991, 65 (1): 145161. 10.1016/00928674(91)90416V.
Bailly E, Cabantous S, Sondaz D, Bernadac A, Simon MN: Differential cellular localization among mitotic cyclins from Saccharomyces cerevisiae – a new role for the axial budding protein Bud3 in targeting Clb2 to the motherbud neck. J Cell Sci. 2003, 116: 41194130. 10.1242/jcs.00706.
Honey S, Schneider BL, Schieltz DM, Yates JR, Futcher B: A novel multiple affinity purification tag and its use in identification of proteins associated with a cyclinCDK complex. Nucleic Acids Res. 2001, 29 (4): E2410.1093/nar/29.4.e24.
Kaiser P, Moncollin V, Clarke DJ, Watson MH, Bertolaet BL, Reed SI, Bailly E: Cyclindependent kinase and CksSuc1 interact with the proteasome in yeast to control proteolysis of Mphase targets. Genes Dev. 1999, 13 (9): 11901202.
Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP, Punna T, PeregrinAlvarez JM, Shales M, Zhang X, Davey M, Robinson MD, Paccanaro A, Bray JE, Sheung A, Beattie B, Richards DP, Canadien V, Lalev A, Mena F, Wong P, Starostine A, Canete MM, Vlasblom J, Wu S, Orsi C, Collins SR, Chandran S, Haw R, Rilstone JJ, Gandi K, Thompson NJ, Musso G, St Onge P, Ghanny S, Lam MH, Butland G, AltafUl AM, Kanaya S, Shilatifard A, O'shea E, Weissman JS, Ingles CJ, Hughes TR, Parkinson J, Gerstein M, Wodak SJ, Emili A, Greenblatt JF: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006, 440: 637643. 10.1038/nature04670.
Ross KE, Kaldis P, Solomon MJ: Activating phosphorylation of the Saccharomyces cerevisiae cyclindependent kinase, cdc28p, precedes cyclin binding. Mol Biol Cell. 2000, 11 (5): 15971609.
Schwab M, Neutzner M, Mocker D, Seufert W: Yeast Hct1 recognizes the mitotic cyclin Clb2 and other substrates of the ubiquitin ligase APC. EMBO J. 2001, 20 (18): 51655175. 10.1093/emboj/20.18.5165.
Shellman YG, Svee E, Sclafani RA, Langan TA: Identification and characterization of individual cyclindependent kinase complexes from Saccharomyces cerevisiae. Yeast. 1999, 15 (4): 295309. 10.1002/(SICI)10970061(19990315)15:4<295::AIDYEA377>3.0.CO;2Z.
Jaspersen SL, Charles JF, Morgan DO: Inhibitory phosphorylation of the APC regulator Hct1 is controlled by the kinase Cdc28 and the phosphatase Cdc14. Curr Biol. 1999, 9 (5): 227236. 10.1016/S09609822(99)801110.
Loog M, Morgan DO: Cyclin specificity in the phosphorylation of cyclindependent kinase substrates. Nature. 2005, 434 (7029): 104108. 10.1038/nature03329.
Rudner AD, Hardwick KG, Murray AW: Cdc28 activates exit from mitosis in budding yeast. J Cell Biol. 2000, 149 (7): 13611376. 10.1083/jcb.149.7.1361.
Petroski MD, Deshaies RJ: Context of multiubiquitin chain attachment influences the rate of Sic1 degradation. Mol Cell. 2003, 11 (6): 14351444. 10.1016/S10972765(03)002211.
Rice LM, Plakas C, Nickels JT: Loss of meiotic replication block in Saccharomyces cerevisiae cells defective in Cdc28p regulation. Eukaryotic Cell. 2005, 4 (1): 5562. 10.1128/EC.4.1.5562.2005.
Skowyra D, Craig KL, Tyers M, Elledge SJ, Harper JW: Fbox proteins are receptors that recruit phosphorylated substrates to the SCF ubiquitinligase complex. Cell. 1997, 91 (2): 209219. 10.1016/S00928674(00)804031.
Verma R, McDonald H, Yates JR, Deshaies RJ: Selective degradation of ubiquitinated Sic1 by purified 26S proteasome yields active S phase cyclinCdk. Mol Cell. 2001, 8 (2): 439448. 10.1016/S10972765(01)003082.
Geymonat M, Spanos A, Wells GP, Smerdon SJ, Sedgwick SG: Clb6Cdc28 and Cdc14 regulate phosphorylation status and cellular localization of Swi6. Mol Cell Biol. 2004, 24 (6): 22772285. 10.1128/MCB.24.6.22772285.2004.
Donaldson AD, Raghuraman MK, Friedman KL, Cross FR, Brewer BJ, Fangman WL: CLB5dependent activation of late replication origins in Scerevisiae. Mol Cell. 1998, 2 (2): 173182. 10.1016/S10972765(00)801276.
Gibson DG, Aparicio JG, Hu F, Aparicio OM: Diminished Sphase cyclindependent kinase function elicits vital Rad53dependent checkpoint responses in Saccharomyces cerevisiae. Mol Cell Biol. 2004, 24 (23): 1020810222. 10.1128/MCB.24.23.1020810222.2004.
Li X, Cai M: Recovery of the yeast cell cycle from heat shockinduced G(1) arrest involves a positive regulation of G(1) cyclin expression by the S phase cyclin Clb5. J Biol Chem. 1999, 274 (34): 2422024231. 10.1074/jbc.274.34.24220.
Meyn MA, Holloway SL: Sphase cyclins are required for a stable arrest at metaphase. Curr Biol. 2000, 10 (24): 15991602. 10.1016/S09609822(00)008630.
Schwob E, Nasmyth K: CLB5 and CLB6, a new pair of B cyclins involved in DNA replication in Saccharomyces cerevisiae. Genes Dev. 1993, 7 (7): 11601175. 10.1101/gad.7.7a.1160.
Smith KN, Penkner A, Ohta K, Klein F, Nicolas A: Btype cyclins CLB5 and CLB6 control the initiation of recombination and synaptonemal complex formation in yeast meiosis. Curr Biol. 2001, 11 (2): 8897. 10.1016/S09609822(01)000264.
Stuart D, Wittenberg C: CLB5 and CLB6 are required for premeiotic DNA replication and activation of the meiotic SM checkpoint. Genes Dev. 1998, 12 (17): 26982710.
Cross FR, Jacobson MD: Conservation and function of a potential substratebinding domain in the yeast Clb5 Btype cyclin. Mol Cell Biol. 2000, 20 (13): 47824790. 10.1128/MCB.20.13.47824790.2000.
Verma R, Feldman RM, Deshaies RJ: SIC1 is ubiquitinated in vitro by a pathway that requires CDC4, CDC34, and cyclinCDK activities. Mol Biol Cell. 1997, 8 (8): 14271437.
Wasch R, Cross FR: APCdependent proteolysis of the mitotic cyclin Clb2 is essential for mitotic exit. Nature. 2002, 418 (6897): 556562. 10.1038/nature00856.
Jacobson MD, Gray S, YusteRojas M, Cross FR: Testing cyclin specificity in the exit from mitosis. Mol Cell Biol. 2000, 20 (13): 44834493. 10.1128/MCB.20.13.44834493.2000.
Shirayama M, Toth A, Galova M, Nasmyth K: APCCdc20promotes exit from mitosis by destroying the anaphase inhibitor Pds1 and cyclin Clb5. Nature. 1999, 402 (6758): 203207. 10.1038/46080.
Zachariae W, Schwab M, Nasmyth K, Seufert W: Control of cyclin ubiquitination by CDKregulated binding of Hct1 to the anaphase promoting complex. Science. 1998, 282 (5394): 17211724. 10.1126/science.282.5394.1721.
Queralt E, Igual JC: Functional Connection Between the Clb5 Cyclin, the Protein Kinase C Pathway and the Swi4 Transcription Factor in Saccharomyces cerevisiae. Genetics. 2005, 171 (4): 14851498. 10.1534/genetics.105.045005.
Press SJ: Subjective and Objective Bayesian Statistics: Principles, Models, and Applications. 2002, John Wiley & Sons
Acknowledgements
The author was supported by the postdoctoral fellowship of Turku Centre for Computer Science during the development of BOLS algorithm. The author thanks Tapio Salakoski and Mauno Vihinen for helpful discussion.
Author information
Authors and Affiliations
Corresponding author
Additional information
Authors' contributions
CSK developed BOLS algorithm, performed experiments for evaluation of BOLS algorithm, and drafted and finalized manuscript.
Electronic supplementary material
12859_2006_1623_MOESM1_ESM.doc
Additional file 1: Comparing the performance between BOLS and SBL using the data set generated based on Rogers and Girolami's study – Supplementary Information. This description provides the comparison of performance between BOLS and SBL using the synthetic data generated by Rogers and Girolami [19]. (DOC 43 KB)
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Kim, C.S. Bayesian Orthogonal Least Squares (BOLS) algorithm for reverse engineering of gene regulatory networks. BMC Bioinformatics 8, 251 (2007). https://doi.org/10.1186/147121058251
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/147121058251