A new Bayesian piecewise linear regression model for dynamic network reconstruction

Background Linear regression models are important tools for learning regulatory networks from gene expression time series. A conventional assumption for non-homogeneous regulatory processes on a short time scale is that the network structure stays constant across time, while the network parameters are time-dependent. The objective is then to learn the network structure along with changepoints that divide the time series into time segments. An uncoupled model learns the parameters separately for each segment, while a coupled model enforces the parameters of any segment to stay similar to those of the previous segment. In this paper, we propose a new consensus model that infers for each individual time segment whether it is coupled to (or uncoupled from) the previous segment. Results The results show that the new consensus model is superior to the uncoupled and the coupled model, as well as superior to a recently proposed generalized coupled model. Conclusions The newly proposed model has the uncoupled and the coupled model as limiting cases, and it is able to infer the best trade-off between them from the data. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-03998-9.

some edges) might be subject to changes, while other network parameters (i.e. edges) might stay constant. The model has been designed for analysing data that have been measured under different experimental conditions, so that it does not allow the segmentation of a time series to be inferred. The non-homogeneous model from Shafiee Kamalabad and Grzegorczyk [8] distinguishes between two groups of edges: (i) edges that are fully coupled among all segments and (ii) edges that are uncoupled among all segments. The new model that we propose here is conceptually related, but complementary in that it replaces the concept of partially coupled edges by the concept of partially coupled time segments.
We note that network reconstruction is a topical research field in the computational biology literature and that many different network reconstruction approaches have been proposed over the years. However, most of the proposed models do not focus on non-homogeneous regulatory processes but rely on the assumption that the regulatory processes are homogeneous. For some applications this assumption of homogeneity can be too restrictive; compare, e.g., our data applications. In response to one of the reviewers of our paper, we here briefly discuss a few recently proposed network reconstruction methods. Vignes et al. [9] investigated and compared a wide variety of methods, ranging from Bayesian networks to penalised linear regression based models, and proposed a meta-analysis based on Fisher's inverse chi-square meta-test for combining different approaches. Huang et al. [10] proposed to apply Bayesian model averaging to linear regression methods. Their method uses a closed-form solution to compute the edge posterior probabilities within a hybrid framework of Bayesian model averaging and linear regression. Xing et al. [11] proposed a Candidate Auto Selection algorithm based on pairwise mutual information and breakpoint detection; a greedy search algorithm is then used to find the best network topology. Unlike the above-mentioned models, Fan et al. [12] propose to impose a prior on the topology information in their inference process. Incorporating this prior information can partially compensate for the lack of reliable data. They developed a Bayesian group lasso approach with a spike-and-slab prior, based on non-parametric models. Xu et al. [13] propose to employ a series of linear regression problems to model the relationships between the network nodes. They use an efficient variational Bayes method for optimization and inference of the unknown network parameters.

Learning dynamic networks with time-varying parameters
Consider N random variables Z 1 , . . . , Z N that are the nodes of a network. Let D denote an N-by-(T + 1) data matrix, whose N rows correspond to the variables and whose T + 1 columns correspond to time points t = 1, . . . , T + 1 . The element in the ith row and tth column, D i,t , is the value of Z i at time point t. For temporal data it is typically assumed that the regulatory interactions are subject to a lag of one time point. For example, an edge Z i → Z j indicates that D j,t+1 ( Z j at t + 1 ) depends on D i,t ( Z i at t). The variable Z i is then called a parent (node) of Z j .
Because of the lag, there is no need for any acyclicity constraint, and for each node Z_j (j = 1, ..., N) the parent nodes can be learned separately. This has computational advantages, since the 'network learning task' can be separated into N independent 'parent learning tasks'. Hence, when a computer cluster is available, the N parent sets can be learned in parallel, so that the inference algorithms scale up well.
Because of the lag, T + 1 time points yield T observations for the linear regression model. Each observation D_t (t ∈ {1, ..., T}) consists of a response value Y = D_{j,t+1} and the shifted covariate values D_{i,t} of the potential parent nodes Z_i. Having inferred a covariate set π_j for each Z_j, a network is built by merging the covariate sets: the network contains the edge Z_i → Z_j if and only if Z_i ∈ π_j. As the same linear regression approaches are used for each Z_j, we describe the models using a general terminology: let Y be the response and let X_1, ..., X_n be the covariates of the linear regression model.
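To make this construction concrete, the following Python sketch builds the response vector and design matrix for one target node from the data matrix D. The function name, the explicit intercept column, and the example dimensions are illustrative assumptions, not part of the model specification.

```python
import numpy as np

def lagged_regression_data(D, j, parent_idx):
    """Return response y (length T) and design matrix X for target node Z_j.
    D is the N-by-(T+1) data matrix; parent_idx indexes the covariate nodes."""
    T = D.shape[1] - 1                    # T+1 time points yield T observations
    y = D[j, 1:]                          # response: Z_j at t = 2, ..., T+1
    X = D[parent_idx, :-1].T              # covariates: parents at t = 1, ..., T
    return y, np.hstack([np.ones((T, 1)), X])   # prepend an intercept column

rng = np.random.default_rng(0)
D = rng.normal(size=(5, 11))              # N = 5 nodes, T + 1 = 11 time points
y, X = lagged_regression_data(D, j=2, parent_idx=[0, 4])
print(y.shape, X.shape)                   # (10,) (10, 3)
```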
To allow for time-dependent regression coefficients, a piece-wise linear regression model can be used, in which a set of changepoints τ divides the time series into H disjoint segments. We assume all covariate sets π ⊂ {X_1, ..., X_n} with up to F = 3 covariates to be equally likely a priori, p(π) = c, while covariate sets with more than F covariates get a zero prior probability ('fan-in restriction'). Further we assume that the distance between changepoints is geometrically distributed with hyperparameter p ∈ (0, 1). With y = y_τ := {y_1, ..., y_H} being the set of segment-specific response vectors, implied by the changepoint set τ, the posterior distribution takes the form:

p(π, τ, θ | y) ∝ p(y | π, τ, θ) · p(θ) · p(π) · p(τ),   (1)

where θ = θ(π, τ) denotes the set of all model parameters, including segment-specific parameters as well as parameters that are shared among segments.
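The geometric prior on the distances between changepoints can be evaluated as in the following sketch. The boundary conventions and the censored treatment of the last segment are assumptions on our part, since several variants of this prior appear in the literature.

```python
import numpy as np

def log_changepoint_prior(tau, T, p=0.1):
    """Log prior of a changepoint set tau under geometric inter-changepoint
    distances (hedged sketch; boundary/censoring conventions are assumptions)."""
    bounds = [0] + sorted(tau) + [T]
    dists = np.diff(bounds)
    # every realised distance d contributes a GEOM(p) term p * (1 - p)^(d - 1)
    logp = sum(np.log(p) + (d - 1) * np.log(1 - p) for d in dists[:-1])
    # the last segment is only known to reach at least to T (censored term)
    logp += (dists[-1] - 1) * np.log(1 - p)
    return float(logp)
```

Under this sketch a small p favours segmentations with few changepoints, in line with the role of p described in the 'Results' sections below.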

A generic Bayesian piece-wise linear regression model
Consider a Bayesian linear regression model, where Y is the response and X_1, ..., X_k are the covariates. We assume that T observations D_1, ..., D_T have been made at equidistant time points and that the data can be subdivided into disjoint segments h ∈ {1, ..., H}, where segment h contains T_h data points and has a segment-specific regression coefficient vector w_h. Let y_h be the response vector and X_h be the design matrix for segment h, so that

y_h | w_h, σ² ∼ N(X_h w_h, σ² I),   (2)

where I is the identity matrix, and σ² is a noise variance parameter that is shared among all segments. We impose an inverse Gamma prior on σ², σ^{-2} ∼ GAM(α_σ, β_σ), and we assume that the vectors w_h have Gaussian priors:

w_h | σ² ∼ N(µ_h, σ² Σ_h),   (3)

where µ_h is a (k+1)-dimensional vector, and Σ_h is a positive definite (k+1)-by-(k+1) matrix. Re-using the parameter σ² in Eq. (3) yields a fully-conjugate prior in both w_h and σ² (see, e.g., Sections 3.3 and 3.4 in Gelman [14]). Figure 1 shows a graphical model representation of this generic model. For notational convenience we define Σ*_h := (Σ_h^{-1} + X_h^T X_h)^{-1}. The full conditional distribution of w_h is (cp. Section 3.3 in [15]):

w_h | y_h, σ², θ ∼ N(Σ*_h (Σ_h^{-1} µ_h + X_h^T y_h), σ² Σ*_h),   (4)

and the segment-specific marginal likelihoods with w_h integrated out are:

y_h | σ², θ ∼ N(X_h µ_h, σ² C_h(θ)),   (5)

where C_h(θ) := I + X_h Σ_h X_h^T (cp. Section 3.3 in [15]). From Eq. (5) we get p(y | σ², θ) = Π_h p(y_h | σ², θ). The shape of p(σ² | y, θ) implies:

σ^{-2} | y, θ ∼ GAM(α_σ + T/2, β_σ + Δ²(θ)/2),  where Δ²(θ) := Σ_h (y_h − X_h µ_h)^T C_h(θ)^{-1} (y_h − X_h µ_h).   (6)

For the marginal likelihood, with w_h (h = 1, ..., H) and σ² integrated out, we apply the rule from Section 2.3.7 of Bishop [15]:

p(y | θ) = Γ(α_σ + T/2) / Γ(α_σ) · (2π)^{-T/2} · β_σ^{α_σ} / (β_σ + Δ²(θ)/2)^{α_σ + T/2} · Π_h |C_h(θ)|^{-1/2}.   (7)

When all parameters in θ are fixed, the marginal likelihood of the piece-wise linear regression model can thus be computed in closed form. In typical models the (hyper-)hyperparameters in θ depend on hyperparameters with their own hyperprior distributions. From now on we will only include the free hyperparameters in θ. In the following subsections we describe four possible model instantiations, namely: the uncoupled model (M1), the coupled model (M2), the newly proposed partially coupled model (M3), and the generalized coupled model (M4).
In the forthcoming subsections we will introduce further mathematical symbols. For convenience, Table 1 lists the mathematical symbols that we will use in this paper.
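As a concrete illustration of the closed-form marginal likelihood in Eq. (7), the following Python sketch evaluates its logarithm for a given segmentation. The function signature and the list-of-tuples interface are our own choices; the formula itself follows the collapsed Gaussian-inverse-Gamma computation described above.

```python
import numpy as np
from math import lgamma, log, pi

def log_marginal_likelihood(segments, alpha=0.005, beta=0.005):
    """Log of the marginal likelihood with all w_h and sigma^2 integrated out
    (hedged sketch). segments: list of tuples (y_h, X_h, mu_h, Sigma_h)."""
    T = sum(len(y) for y, _, _, _ in segments)
    log_det, delta2 = 0.0, 0.0
    for y, X, mu, Sigma in segments:
        C = np.eye(len(y)) + X @ Sigma @ X.T      # C_h = I + X_h Sigma_h X_h^T
        r = y - X @ mu                            # residual w.r.t. prior mean
        log_det += np.linalg.slogdet(C)[1]
        delta2 += r @ np.linalg.solve(C, r)       # contribution to Delta^2
    return (lgamma(alpha + T / 2) - lgamma(alpha)
            - (T / 2) * log(2 * pi) - 0.5 * log_det
            + alpha * log(beta)
            - (alpha + T / 2) * log(beta + delta2 / 2))
```

In a sampler this quantity would be evaluated once per proposed covariate or changepoint set, which is what makes the collapsed representation computationally attractive.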

Model M1: the uncoupled model
A standard approach, akin to the models of Lèbre et al. [1] and Dondelinger et al. [3], is to set µ_h = 0 and to assume that the matrices Σ_h are diagonal matrices, Σ_h = λ_u I, where the parameter λ_u ∈ R+ is shared among segments and assumed to be inverse Gamma distributed, λ_u^{-1} ∼ GAM(α_u, β_u). In the supplementary material we provide a graphical model representation of the uncoupled model (M1). Using the notation of the generic model, we have:

µ_h = 0  and  Σ_h = λ_u I  (h = 1, ..., H).   (8)

For the posterior distribution of the uncoupled model we have:

p(w, σ², λ_u | y) ∝ p(λ_u) · p(σ²) · Π_h p(w_h | σ², λ_u) · p(y_h | w_h, σ²),   (9)

where w := {w_1, ..., w_H}. From Eq. (9) it follows for the full conditional distribution of λ_u:

λ_u^{-1} | w, σ² ∼ GAM(α_u + H(k+1)/2, β_u + (2σ²)^{-1} Σ_h w_h^T w_h).   (10)

Since the full conditional distribution of λ_u depends on σ² and w, those parameters have to be sampled first. From Eq. (6) a value of σ² can be sampled via a collapsed Gibbs-sampling step, with the w_h's being integrated out. Subsequently, given σ², Eq. (4) can be used to sample the vectors w_h. Finally, for each λ_u sampled from Eq. (10), the marginal likelihood p(y | λ_u) can be computed by plugging the expressions from Eq. (8) into Eq. (7).
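The sampling order just described (first σ² collapsed over the w_h's, then the w_h's, then λ_u) can be sketched as one Gibbs sweep in Python. The function interface is our own; the default hyperparameter values follow the settings reported in the 'Hyperparameter settings' section, and the exact form of the λ_u update is our reconstruction of the full conditional stated above.

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_step_M1(segments, lam_u, a_sig=0.005, b_sig=0.005, a_u=2.0, b_u=0.2):
    """One collapsed Gibbs sweep for the uncoupled model M1 (hedged sketch).
    segments: list of (y_h, X_h); returns (sigma2, [w_h], lam_u)."""
    # 1) sample sigma^2 with the w_h's integrated out (collapsed step)
    T = sum(len(y) for y, _ in segments)
    delta2 = 0.0
    for y, X in segments:
        C = np.eye(len(y)) + lam_u * X @ X.T        # C_h = I + lam_u X_h X_h^T
        delta2 += y @ np.linalg.solve(C, y)         # prior mean mu_h = 0
    sigma2 = 1.0 / rng.gamma(a_sig + T / 2, 1.0 / (b_sig + delta2 / 2))
    # 2) sample each w_h given sigma^2
    ws, ssq = [], 0.0
    for y, X in segments:
        P = np.linalg.inv(np.eye(X.shape[1]) / lam_u + X.T @ X)
        w = rng.multivariate_normal(P @ X.T @ y, sigma2 * P)
        ws.append(w)
        ssq += w @ w
    # 3) sample lam_u given sigma^2 and the w_h's
    k1 = segments[0][1].shape[1]                    # number of coefficients per segment
    H = len(segments)
    lam_u = 1.0 / rng.gamma(a_u + H * k1 / 2, 1.0 / (b_u + ssq / (2 * sigma2)))
    return sigma2, ws, lam_u
```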

Model M2: the (fully) coupled model
The (fully) coupled model, proposed by Grzegorczyk and Husmeier [5], uses the posterior expectation of w_{h−1} as prior expectation for w_h. Only the first segment h = 1 has an uninformative prior:

w_1 | σ² ∼ N(0, σ² λ_u I),  and  w_h | σ² ∼ N(w̃_{h−1}, σ² λ_c I)  for h > 1,   (11)

where w̃_{h−1} is the posterior expectation of w_{h−1} (cp. Eq. (4)): w̃_{h−1} = Σ*_{h−1} (Σ_{h−1}^{-1} µ_{h−1} + X_{h−1}^T y_{h−1}). The parameter λ_c has been called the 'coupling parameter', and it has been assumed that it has an inverse Gamma prior distribution, λ_c^{-1} ∼ GAM(α_c, β_c). Using the notation from the generic model (see Fig. 1), we note that Eq. (11) corresponds to:

µ_1 = 0, Σ_1 = λ_u I,  and  µ_h = w̃_{h−1}, Σ_h = λ_c I  for h > 1.

As w̃_{h−1} is treated like a fixed hyperparameter when used as input for segment h, we exclude the parameters w̃_1, ..., w̃_{H−1} from θ.
In the supplementary material we provide a graphical model representation of the coupled M2 model. For the posterior we have:

p(w, σ², λ_u, λ_c | y) ∝ p(λ_u) · p(λ_c) · p(σ²) · Π_h p(w_h | σ², λ_u, λ_c) · p(y_h | w_h, σ²).

In analogy to the derivations in the previous subsection one can derive the full conditional distributions of λ_u and λ_c (cp. [5]). The marginal likelihood, p(y | λ_u, λ_c), can be computed by plugging the expressions C_h(θ) and Δ²(θ) into Eq. (7).

Model M3: the new partially segment-wise coupled model
We propose a new 'consensus' model between the M1 and the M2 model. The new model (M3) allows each segment h > 1 either to couple to or to uncouple from the preceding segment h − 1. We use an uninformative prior for the first segment h = 1, and for all segments h > 1 we introduce a binary variable δ_h which indicates whether segment h is coupled to (δ_h = 1) or uncoupled from (δ_h = 0) the preceding segment h − 1:

w_1 | σ² ∼ N(0, σ² λ_u I),  and  w_h | σ², δ_h ∼ N(δ_h · w̃_{h−1}, σ² (δ_h λ_c + (1 − δ_h) λ_u) I)  for h > 1,   (15)

where w̃_{h−1} is the posterior expectation of w_{h−1}. The new priors from Eq. (15) yield for h ≥ 2 posterior expectations of the form given in Eq. (4). With w̃_0 := 0 and δ_1 := 0, we have in the generic model notation:

µ_h = δ_h · w̃_{h−1}  and  Σ_h = (δ_h λ_c + (1 − δ_h) λ_u) I.

We assume the binary variables δ_2, ..., δ_H to have Bernoulli prior distributions, δ_h ∼ BER(p), with a joint hyperparameter p ∈ [0, 1] having a Beta hyperprior distribution, p ∼ BETA(a, b). We note that the new partially segment-wise coupled model infers the variables δ_h (h ≥ 2) from the data. It searches for the best trade-off between the models M1 and M2.
A graphical model representation of the partially coupled model is shown in Fig. 2. For δ_h ∼ BER(p) with p ∼ BETA(a, b), the joint marginal density of {δ_h}_{h≥2}, with p integrated out, is:

p(δ_2, ..., δ_H) = B(a + Σ_{h≥2} δ_h, b + (H − 1) − Σ_{h≥2} δ_h) / B(a, b),

where B(·, ·) denotes the Beta function. Given {δ_h}_{h≥2}, the marginal likelihood can again be computed with Eq. (7), where C_h(θ) was defined above. We have for each binary variable δ_k (k = 2, ..., H):

p(δ_k | y, {δ_h}_{h≠k}, θ) ∝ p(y | δ_2, ..., δ_H, θ) · p(δ_2, ..., δ_H),

so that its full conditional distribution is a Bernoulli distribution, obtained by evaluating the right-hand side for δ_k = 0 and δ_k = 1 and normalising. Each δ_k (k > 1) can therefore be sampled with a collapsed Gibbs sampling step, where {w_h}, σ² and p have been integrated out.
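The collapsed Gibbs step for a single δ_k can be sketched as follows. Here log_ml is a placeholder for any routine returning the log marginal likelihood log p(y | δ, ...), e.g. evaluated via Eq. (7); the function names are illustrative assumptions.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(2)

def betaln(a, b):
    # log Beta function via log-Gamma
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def sample_delta_k(k, delta, log_ml, a=1.0, b=1.0):
    """Collapsed Gibbs step for one coupling indicator (hedged sketch).
    delta: array holding the indicators delta_2, ..., delta_H (0-based here);
    log_ml(delta): log marginal likelihood log p(y | delta, ...) placeholder."""
    H1 = len(delta)                    # number of indicators, i.e. H - 1
    logp = np.empty(2)
    for val in (0, 1):
        d = delta.copy()
        d[k] = val
        s = d.sum()
        # Beta-Bernoulli prior with p integrated out, plus the likelihood term
        logp[val] = log_ml(d) + betaln(a + s, b + H1 - s) - betaln(a, b)
    prob1 = 1.0 / (1.0 + np.exp(logp[0] - logp[1]))
    delta[k] = int(rng.random() < prob1)
    return delta
```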

Model M4: the generalised (fully) coupled model
In [6] we proposed to generalise the (fully) coupled model (i.e. the M2 model) by introducing a segment-specific coupling parameter λ_h for each segment h ≥ 2. This yields:

w_1 | σ² ∼ N(0, σ² λ_u I),  and  w_h | σ² ∼ N(w̃_{h−1}, σ² λ_h I)  for h > 1,   (17)

where w̃_{h−1} is the posterior expectation of w_{h−1}. For the parameters λ_h we have assumed that they are inverse Gamma distributed, λ_h^{-1} ∼ GAM(α_c, β_c), with hyperparameters α_c and β_c. In the supplementary material we provide a graphical model representation of the M4 model. Recalling the generic notation and setting w̃_0 := 0 and λ_1 := λ_u, Eq. (17) gives:

µ_h = w̃_{h−1}  and  Σ_h = λ_h I  (h = 1, ..., H).

For the posterior we have an expression analogous to Eq. (9), and the marginal likelihood, p(y | λ_1, ..., λ_H), can be computed with Eq. (7), using the expressions C_h(θ) and Δ²(θ) defined above.
Unlike the proposed partially coupled M3 model, the generalized coupled M4 model does not feature any mechanism to uncouple neighbouring segments. Like the fully coupled M2 model, the M4 model has been designed such that it has to couple all neighbouring segments. The only advantage over the M2 model is that the M4 model introduces segment-specific coupling parameters, so that the coupling strength(s) can vary over time.

Reversible jump Markov chain Monte Carlo inference
We use reversible jump Markov chain Monte Carlo simulations to generate posterior samples {π^{(w)}, τ^{(w)}, θ^{(w)}}_{w=1,...,W}. In each iteration we re-sample the parameters in θ from their full conditional distributions (Gibbs sampling), and we perform two Metropolis-Hastings moves: one on the covariate set π and one on the changepoint set τ. For the four models (M1-M4), Eq. (1) takes a model-specific form in which all likelihood terms, p(y | ...), are marginalized over σ² and {w_h}; for the new M3 model also the Bernoulli parameter p has been integrated out.
For the models M1-M2 the dimension of θ does not depend on τ , while for the models M3-M4 the dimension of θ does depend on τ . The M3 model has a discrete parameter δ h ∈ {0, 1} and the M4 model has a continuous parameter h ∈ R + for each h > 1.
The model-specific full conditional distributions for the Gibbs sampling steps have been provided above. For sampling π we implement 3 moves: covariate 'removal (R)', 'addition (A)', and 'exchange (E)'. Each move proposes to replace π by a new covariate set π* having one covariate more (A) or less (R) or exchanged (E). When randomly selecting the move type and the involved covariate(s), we get for all models the acceptance probability:

A(π → π*) = min{1, [p(y | π*, ...) / p(y | π, ...)] · [p(π*) / p(π)] · HR_π},

with the Hastings ratios:

HR_{π,R} = |π| / (n − |π*|),  HR_{π,A} = (n − |π|) / |π*|,  HR_{π,E} = 1.

For sampling τ we also implement 3 move types: changepoint 'birth (B)', 'death (D)', and 're-allocation (R)' moves. Each move proposes to replace τ by a new changepoint set τ* having one changepoint added (B) or deleted (D) or re-allocated (R). When randomly selecting the move type, the involved changepoint and the new changepoint location, we get for M1 and M2:

A(τ → τ*) = min{1, [p(y | τ*, ...) / p(y | τ, ...)] · [p(τ*) / p(τ)] · c_τ},

where c_{τ,B} = 2 for birth, c_{τ,D} = 1/2 for death, and c_{τ,R} = 1 for re-allocation moves. For the models M3 (proposed here) and the model M4 from [6] the changepoint moves also affect the numbers of parameters in {δ_h}_{h≥2} and {λ_h}_{h≥2}, respectively. For all segments that stay identical we keep the parameters unchanged. For all new segments we re-sample the corresponding parameters. For the new model M3 we flip coins to get candidates for the involved δ_h's; this yields an acceptance probability of the same form. For the model M4 we follow [6] and re-sample the involved λ_h's from their priors p(λ_h). We obtain the same acceptance probability, since the additional factor c_τ := p({λ_h}) / p({λ_h}*) of the Hastings ratio cancels with the prior ratio p({λ_h}*) / p({λ_h}).
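The covariate moves and their Hastings ratios can be sketched in Python as follows. The bookkeeping for which move types are available (e.g. respecting the fan-in restriction) is our own illustrative choice; the three ratios mirror HR_{π,R}, HR_{π,A} and HR_{π,E} above.

```python
import numpy as np

rng = np.random.default_rng(3)

def propose_covariate_move(pi, n, fan_in=3):
    """Propose a Metropolis-Hastings move on the covariate set pi (hedged sketch).
    Returns the proposal pi_star and the Hastings ratio HR_pi."""
    pi = set(pi)
    outside = set(range(n)) - pi
    moves = []
    if pi:                                   # removal possible
        moves.append('R')
    if outside and len(pi) < fan_in:         # addition respects the fan-in limit
        moves.append('A')
    if pi and outside:                       # exchange possible
        moves.append('E')
    move = rng.choice(moves)
    if move == 'R':
        x = rng.choice(sorted(pi))
        star = pi - {x}
        hr = len(pi) / (n - len(star))       # HR_{pi,R} = |pi| / (n - |pi*|)
    elif move == 'A':
        x = rng.choice(sorted(outside))
        star = pi | {x}
        hr = (n - len(pi)) / len(star)       # HR_{pi,A} = (n - |pi|) / |pi*|
    else:
        x = rng.choice(sorted(pi))
        y = rng.choice(sorted(outside))
        star = (pi - {x}) | {y}
        hr = 1.0                             # HR_{pi,E} = 1
    return star, hr
```

The acceptance probability is then min(1, marginal-likelihood ratio × prior ratio × hr), as in the formula above.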

Edge scores and areas under precision recall curves (AUC)
For a network with N variables Z_1, ..., Z_N we infer N separate regression models. For each Z_i we obtain a sample of covariate sets {π_i^{(w)}}_{w=1,...,W} from the ith posterior. From the covariate sets we form a sample of graphs G^{(w)} = {π_1^{(w)}, ..., π_N^{(w)}} (w = 1, ..., W). For each edge Z_i → Z_j the edge posterior probability (edge score) is the fraction of graph samples that contain the edge:

ê_{i,j} = W^{-1} Σ_{w=1}^{W} I(Z_i ∈ π_j^{(w)}),

where I(·) denotes the indicator function. If the true network is known and has M edges, we can quantify the network reconstruction accuracy. For each threshold ξ ∈ [0, 1] we extract the n_ξ edges whose scores ê_{i,j} exceed ξ, and we count the number of true positives T_ξ among them. Plotting the precisions P_ξ := T_ξ/n_ξ against the recalls R_ξ := T_ξ/M gives the precision-recall curve. We refer to the area under this curve as the AUC value.
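Both steps, the edge scores and the area under the precision-recall curve, can be sketched as follows. The trapezoidal integration and the convention of starting the curve at (recall, precision) = (0, 1) are our own choices; the paper does not specify the numerical integration scheme.

```python
import numpy as np

def edge_scores(graph_samples, N):
    """Edge posterior probabilities from sampled parent sets (hedged sketch).
    graph_samples: list of graphs, each a list of N parent-index sets."""
    E = np.zeros((N, N))
    for G in graph_samples:
        for j, parents in enumerate(G):
            for i in parents:
                E[i, j] += 1.0               # edge Z_i -> Z_j present in sample
    return E / len(graph_samples)

def auc_precision_recall(scores, true_edges, M):
    """Area under the precision-recall curve via a threshold sweep.
    scores: dict {(i, j): score}; true_edges: set of true (i, j) pairs."""
    thresholds = sorted({s for s in scores.values()}, reverse=True)
    pts = [(0.0, 1.0)]                        # (recall, precision) start point
    for xi in thresholds:
        picked = [e for e, s in scores.items() if s >= xi]
        tp = len([e for e in picked if e in true_edges])
        pts.append((tp / M, tp / len(picked)))
    # trapezoidal integration over recall
    return sum((r2 - r1) * (p1 + p2) / 2
               for (r1, p1), (r2, p2) in zip(pts, pts[1:]))
```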

Hyperparameter settings and simulation details
The hyperparameters of the priors and hyperpriors of the four NH-DBN models (M1-M4) have to be specified in advance, and we note that the hyperparameter setting can have an effect on the resulting posterior distributions and thus on the network reconstruction results. Selecting appropriate hyperparameters is therefore a crucial task. In the absence of genuine prior knowledge (e.g. from experts or from the literature), we re-use the rather uninformative (and thus generic) hyperparameter settings from earlier publications. Re-using those hyperparameters also has the advantage that our empirical results can be compared with earlier reported results. More specifically, we proceed as follows: For the models M1, M2 and M4 we re-use the hyperparameters from the earlier works by Lèbre et al. [1], Grzegorczyk and Husmeier [5], and Shafiee Kamalabad and Grzegorczyk [6]: σ^{-2} ∼ GAM(α_σ = ν, β_σ = ν) with ν = 0.005, λ_u^{-1} ∼ GAM(α_u = 2, β_u = 0.2), and λ_c^{-1} ∼ GAM(α_c = 3, β_c = 3). For the new partially coupled model M3 we use the same setting with the extension: δ_h ∼ BER(p) with p ∼ BETA(a = 1, b = 1), which seems to be a very natural choice. For the M3 model we also tested several alternative hyperparameter settings, but we did not observe significantly deviating results, indicating that the M3 model is rather robust with respect to the hyperparameter settings. For more thorough studies on how the hyperparameter setting affects the network reconstruction results, we refer to the work by Grzegorczyk and Husmeier [5]. For all models M1-M4 we run each reversible jump Markov chain Monte Carlo simulation for V = 100,000 iterations. Setting the burn-in phase to 0.5V (50%) and thinning out by the factor 10 during the sampling phase yields W = 0.5V/10 = 5000 samples from each posterior. To check for convergence, we compared the samples of independent simulations, using standard trace plot diagnostics as well as scatter plots of the estimated edge scores.
For most of the data sets analysed here, the diagnostics indicated almost perfect convergence already after V = 10,000 iterations; see Fig. 7a for an example.

Synthetic network data
For model comparisons we generated various synthetic network data sets. We report here on two studies with realistic network topologies, shown in Figs. 3 and 4. In both studies we assumed the data segmentation to be known. Hence, we kept the changepoints in τ fixed at their right locations and did not perform reversible jump Markov chain Monte Carlo moves on τ.
Study 1 For the RAF pathway with N = 11 nodes and M = 20 edges, shown in Fig. 4 and taken from Sachs et al. [16], we generated data with H = 4 segments having m = 10 data points each. For each node Z_i and its parent nodes in π_i we sampled the regression coefficients for h = 1 from standard Gaussian distributions and collected them in a vector w^i_1, which we normalised to Euclidean norm 1, w^i_1 ← w^i_1 / |w^i_1|. The coefficient vectors of the subsequent segments h > 1 were then generated according to the 8 coupling scenarios (δ_2, δ_3, δ_4) considered in the 'Results' section.
Study 2 differs from Study 1 in three respects: (i) We used the yeast network with N = 5 genes, shown in Fig. 3 and taken from Cantone et al. [17]. (ii) Again we generated data with H = 4 segments, but we varied the number of time points per segment, m ∈ {2, 3, ..., 12}. (iii) We focused on one scenario: For each node Z_i and its parent nodes in π_i we generated two vectors w^i_⋄ and w^i_⋆ with standard Gaussian distributed entries. We re-normalised the first vector to Euclidean norm 1, w^i_⋄ ← w^i_⋄ / |w^i_⋄|, and the second vector to norm 0.5, w^i_⋆ ← 0.5 · w^i_⋆ / |w^i_⋆|. We set w^i_1 = w^i_2 = w^i_⋄ and w^i_3 = w^i_4 = w^i_⋆, so that segment h = 2 is coupled to segment h = 1 and segment h = 4 is coupled to segment h = 3, while the change between the segments h = 2 and h = 3 is 'moderate'. For each m we generated 25 data matrices with different regression coefficients.
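A data-generation routine in this spirit can be sketched in Python. The function interface, the noise level, and the treatment of parentless nodes (driven by noise alone) are illustrative assumptions, not the exact procedure used in the studies.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_segments(N, segments, m, noise_sd=0.05):
    """Generate piecewise-stationary network data (hedged sketch).
    segments: list over segments h = 1..H of dicts {j: (parent_idx, w_j)},
    where w_j[0] is an intercept and w_j[1:] weights the parents of Z_j."""
    D = [rng.normal(size=N)]                     # values at the first time point
    for seg in segments:
        for _ in range(m):                       # m transitions per segment
            prev = D[-1]
            new = rng.normal(0.0, noise_sd, size=N)   # noise for every node
            for j, (idx, w) in seg.items():
                new[j] += w[0] + prev[list(idx)] @ w[1:]   # lag-1 regression
            D.append(new)
    return np.array(D).T                         # N-by-(H*m + 1) data matrix
```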

Yeast gene expression data
Cantone et al. [17] synthetically designed a network in S. cerevisiae (yeast) with N = 5 genes, and measured gene expression data under galactose- and glucose-metabolism: 16 measurements were taken in galactose and 21 measurements were taken in glucose, with 20-minute intervals between measurements. Although the network is small, it is an ideal benchmark data set: the network structure is known, so that network reconstruction methods can be cross-compared on real wet-lab data. We follow Grzegorczyk and Husmeier and pre-process the data as described in [5]. The true network structure is shown in the left panel of Fig. 3. As an example, a network prediction obtained with the

Arabidopsis gene expression data
The circadian clock in Arabidopsis thaliana optimizes the gene regulatory processes with respect to the daily dark:light cycles (photo periods). In four experiments Arabidopsis plants were entrained in different dark:light cycles, before gene expression data were measured under constant light condition over 24- and 48-h time intervals. We follow Grzegorczyk and Husmeier [5] and merge the four time series to one single data set with T = 47 data points and focus our attention on the N = 9 core genes: LHY, TOC1, CCA1, ELF4, ELF3, GI, PRR9, PRR5, and PRR3.

Fig. 5 Results for synthetic RAF pathway data. We distinguish 8 coupling scenarios (δ_1 = 0, δ_2, δ_3, δ_4). a Each histogram has three bars for the average AUC differences between the partially coupled model (M3) and the competing models M1, M2 and M4.

Results
In this section we present the results of a comparative evaluation study, in which we compare the performance of the new partially coupled model (M3) with the competing models M1, M2 and M4. Throughout this section we use the new M3 model as reference model.

Results for synthetic network data
We start with the RAF pathway, for which we generated network data for 8 different coupling scenarios. Figure 5a compares the network reconstruction accuracies in terms of average AUC value differences. For 6 out of 8 scenarios the three AUC differences are clearly and significantly in favour of M3. Not surprisingly, for the two extreme scenarios, where all segments h ≥ 2 are either coupled ('0111') or uncoupled ('0000'), M3 performs slightly worse than the fully coupled models (M2 and M4) or the uncoupled model (M1), respectively. But unlike the uncoupled model (M1) for coupled data ('0111'), and unlike the coupled models (M2 and M4) for uncoupled data ('0000'), the partially coupled model (M3) never performs significantly worse than the respective 'gold-standard' model. For the partially coupled model, Fig. 5b shows the posterior probabilities that the segments h = 2, 3, 4 are coupled. The trends are in good agreement with the true coupling mechanism. Model M3 correctly infers whether the regression coefficients stay similar (identical) or change (substantially). The generalised coupled model (M4) can only adjust the segment-specific coupling strengths, but has no option to uncouple. Like the coupled model (M2), it fails when the parameters are subject to drastic changes. When comparing the coupled model (M2) with the generalised coupled model (M4), we see that M2 performs better when only one segment is coupled, while M4 is superior to M2 if two segments are coupled; see the scenarios '0011', '0110', and '0101'.
For the yeast network we generated data corresponding to a '0101' coupling scheme, where the change of the parameters (from the 2nd to the 3rd segment) is less drastic than for the RAF pathway data. Figure 6 shows how the AUC differences vary with the number of time points T, where T = 4m and m is the number of data points per segment. For sufficiently many data points the effect of the prior diminishes and all models yield high AUC values (see bottom right panel). There are then no significant differences between the AUC values anymore. However, for the lower sample sizes again the new partially coupled model (M3) clearly performs best. For 12 ≤ T ≤ 28 model M3 is significantly superior to all other models, and for 30 ≤ T ≤ 40 it still significantly outperforms the uncoupled (M1) and the coupled (M2) model. The performance of the generalised model (M4) is comparable to the performance of the uncoupled model. For moderate sample sizes (12 ≤ T ≤ 44) model M4 is significantly better than the fully coupled model (M2).

Results for yeast gene expression data
For the yeast gene expression data we assume the changepoint(s) to be unknown and we infer the segmentation from the data. Figure 7a shows convergence diagnostics for these simulations. The average AUC scores of the models M1-M4 are shown in Fig. 7b. Since the number of inferred changepoints grows with the hyperparameter p of the geometric distribution on the distance between changepoints, we implemented the models with different p's. The uncoupled model is superior to the coupled model for the lowest p (p = 0.02) only, but becomes more and more inferior to the coupled model as p increases. This result is consistent with the finding in Grzegorczyk and Husmeier [5] and can be explained as follows: As the hyperparameter of the changepoint prior p ∈ (0, 1) increases, the number of inferred data segments H grows, so that the individual data segments h = 1, ..., H get shorter. The individual segments h then cover fewer data points and are thus less informative. The coupling scheme allows for information-sharing among segments. The information content of long segments is sufficient for inference, so that coupling does not provide any noteworthy advantage. But for short (uninformative) segments information coupling improves the inference certainty, as coupling allows for the incorporation of information from the preceding segment(s). Therefore the potential improvement that can be gained by coupling grows with the hyperparameter p.
The new partially coupled model (M3) performs consistently better than the uncoupled and the coupled model (M1-M2). The only exception occurs for p = 0.1, where the coupled model (M2) appears to perform slightly (but not significantly) better than M3. For p's up to p = 0.05 the fully coupled (M2) and the generalised fully coupled model (M4) perform approximately equally well. However, for the three highest p's the M4 model performs better than the coupled model (M2) and even outperforms the new partially coupled model (M3). While the performances of the models M1-M3 decrease with the number of changepoints, the performance of the model M4 stays rather robust.
Subsequently, we re-analysed the yeast data with K = 1, ..., 5 fixed changepoints. Figure 8a, b shows the average AUC scores and the AUC score differences in favour of the partially coupled model (M3). Panel (a) reveals that the new partially coupled model (M3) again reaches the highest network reconstruction accuracy. Panel (b) shows that the superiority of M3 is significant, with only one exception: For K = 1 the uncoupled model M1 does not perform worse than the partially coupled model (M3).
Subsequently, we also investigated the segment-specific coupling posterior probabilities p(δ_h = 1 | D) (h = 2, ..., H = K + 1) for the new partially coupled model (M3) and the posterior distributions of the coupling parameters λ_u, λ_2, ..., λ_{K+1} for the generalised model (M4), but we could not find clear trends for any gene. As an example, we provide the results for gene ASH1 in Fig. 9a, b. Panel (a) shows that the coupling posterior

Fig. 8 Results for real yeast data with fixed changepoints. We imposed K ∈ {1, ..., 5} changepoints and kept them fixed. K changepoints yield H = K + 1 segments. For each K we used the first changepoint to separate the two parts of the time series (galactose vs. glucose metabolism). Successively we located the next changepoint in the middle of the longest segment to divide it into 2 segments, until K changepoints were set. a Model-specific average total AUC scores with error bars indicating standard deviations. b AUC score differences in favour of the partially coupled model (M3); here the error bars indicate t-test confidence intervals

Application to Arabidopsis gene expression data
For the Arabidopsis gene expression data we cannot objectively compare the network reconstruction accuracies of the four models, since the true circadian clock network is not known. We therefore only applied the new partially coupled model (M3), which we had found to be the best model in our earlier studies. Figure 10 shows the Arabidopsis network, which was reconstructed using the hyperparameter p = 0.1 for the geometric distribution on the distance between changepoints. To obtain a network prediction, we extracted the 20 edges with the highest edge scores. Although a proper evaluation of the network prediction is beyond the scope of this paper, we note that several features of the network are consistent with the plant biology literature. E.g. the feedback loop between LHY and TOC1 is the most important key feature of the circadian clock network (see, e.g., the work by Locke et al. [18]). Many of the other predicted edges have been reported in more recent works. E.g. the edges LHY → ELF3, LHY → ELF4, GI → TOC1, ELF3 → PRR3 and ELF4 → PRR9 can all be found in the circadian clock network (hypothesis) of Herrero et al. [19].

Fig. 9 Results for real yeast data with fixed changepoints (experimental setup as described in the caption of Fig. 8). a Diagnostic for the partially coupled model

Discussion and conclusions
We have proposed a new Bayesian piece-wise linear regression model for reconstructing regulatory networks from gene expression time series. The new partially coupled model (M3), whose graphical model representation is given in Fig. 2, is a consensus model between the uncoupled model (M1) and the fully coupled model (M2). In the uncoupled model (M1) the segment-specific regression coefficients have to be learned for each segment separately. In the fully coupled model (M2) each segment is compelled to be coupled to the previous one. The new partially coupled model (M3) combines features of the uncoupled and the fully coupled model, and it can infer for each individual time segment whether it is coupled to (or uncoupled from) the preceding segment.
We have cross-compared the new model (M3) with the two established models (M1-M2) as well as with the generalised coupled model (M4) that makes use of segment-specific coupling parameters [6]. In our data applications, the new partially coupled model (M3) reached significantly better network reconstruction accuracies than its competitors (M1, M2, and M4).
In an earlier work [6], we found that the performances of the fully coupled model (M2) and of the generalised fully coupled model (M4) can be improved by imposing additional hyperpriors on the hyperparameters of the coupling strength parameters. In our future work we will therefore investigate whether either the use of hyperpriors or the use of segment-specific continuous (coupling/SNR) parameters along the lines of the M4 model can improve the new partially coupled model (M3). Moreover, in our future work we will also try to combine the concept of partially coupled time segments of the proposed model (M3) with the recently proposed concept of partially coupled edges [8]. The