Inferring signalling networks from longitudinal data using sampling based approaches in the Rpackage 'ddepn'
 Christian Bender^{1}Email author,
 Silvia vd Heyde^{2},
 Frauke Henjes^{1},
 Stefan Wiemann^{1},
 Ulrike Korf^{1} and
 Tim Beißbarth^{2}
DOI: 10.1186/1471210512291
© Bender et al; licensee BioMed Central Ltd. 2011
Received: 6 December 2010
Accepted: 19 July 2011
Published: 19 July 2011
Abstract
Background
Network inference from highthroughput data has become an important means of current analysis of biological systems. For instance, in cancer research, the functional relationships of cancer related proteins, summarised into signalling networks are of central interest for the identification of pathways that influence tumour development. Cancer cell lines can be used as model systems to study the cellular response to drug treatments in a timeresolved way. Based on these kind of data, modelling approaches for the signalling relationships are needed, that allow to generate hypotheses on potential interference points in the networks.
Results
We present the Rpackage 'ddepn' that implements our recent approach on network reconstruction from longitudinal data generated after external perturbation of network components. We extend our approach by two novel methods: a Markov Chain Monte Carlo method for sampling network structures with two edge types (activation and inhibition) and an extension of a prior model that penalises deviances from a given reference network while incorporating these two types of edges. Further, as alternative prior we include a model that learns signalling networks with the scalefree property.
Conclusions
The package 'ddepn' is freely available on RForge and CRAN http://ddepn.rforge.rproject.org, http://cran.rproject.org. It allows to conveniently perform network inference from longitudinal highthroughput data using two different sampling based network structure search algorithms.
Background
Reconstruction of biological networks from data has become important in modern analysis of large data sets in genomics or proteomics. The aim is to infer pairwise regulations or influences of network nodes, such as genes or proteins, describing the system as a graph structure. With this graphical representation, the functional characteristics of a biological system can be visualised in a comprehensive and informative way. For this purpose, many approaches have been suggested in the past, including Boolean or Probabilistic Boolean Networks, Bayesian or Dynamic Bayesian Networks (DBN) or learning with differential equation systems and many more [1–5]. These methods rely on the measurement of network components, either under several experimental conditions or at different time points. The simultaneous measurement of time courses combined with different experimental conditions or even directed perturbation of components becomes an increasingly important way of characterising biological systems [6]. Corresponding modelling approaches try to describe the responses of model systems after external perturbation [7, 8]. Recently, we proposed a framework that searches for optimal network structures by modelling the signal flow in a cell from an external stimulus to the downstream components and we showed how parts of a literature based reference network could be reconstructed [9]. The method is now implemented in the Rpackage 'ddepn' which is available in the 'Comprehensive R Archive Network' (CRAN).
The purpose of this document is to give a comprehensive description of the package and its capabilities. Besides the original approach for network inference, three components are added to the package. First, as alternative to the Genetic Algorithm (GA) presented in our previous work, a Metropolis Hastings Markov Chain Monte Carlo (MCMC) sampling scheme is set up. This approach is based on a publication of Werhli et al. [10], but extended by the possibility to sample two types of edges directly (activation and inhibition). Hence, the package contains an optimisation algorithm (the GA) that is designed for converging to an optimal network, as well as the MCMC algorithm for true sampling of the space of possible network structures. Second, a prior probability model for the network structure is adapted from Fröhlich et al. [11], that can penalise differences of inferred edges to prior confidences of these edges. Again, we extended the previous model to include the possibility to model different edge types. Third, an alternative prior model is provided that models the scalefree characteristics of inferred networks, i.e. it tries to reconstruct networks with node degrees that follow a power law distribution. This approach was introduced before [12, 13]. We describe how prior parameters can be adjusted to guide the reconstruction to be more or less close to the given reference and how reconstruction performance is affected by the prior parameter settings. Finally, advantages and disadvantages of both the GA and MCMC methods are discussed and a real data example is given that highlights how the prior knowledge inclusion is improving the outcome of the inference process.
Implementation
The following sections provide a description of the different methods included in the package. The first section 'Network inference types' describes the network inference approaches. A short overview on the original method based on a GA is given in the first subsection 'Genetic Algorithm'. In section 'inhibMCMC' we present our novel approach for Markov Chain Monte Carlo sampling of network structures. The next section 'Prior knowledge incorporation' includes the two prior models for the inference. Subsection 'Laplace Prior' introduces our extension to the prior model of Fröhlich et al. [11], subsection 'ScaleFree prior' describes the implementation of the alternative prior model of Kamimura et al. [12].
In general, the networks to be inferred are encoded as directed (and possibly cyclic) graphs. Let V = {v_{ i } : i ∈ 1 ... N} be the set of nodes representing the network components (proteins, genes, etc.) and Φ = V × V → {0, 1, 1} an adjacency matrix defining a network. Each edge in Φ is defined as pair of nodes {ϕ_{ ij } : i, j ∈ 1 ... N }, where 0 means no edge, 1 activation and 1 inhibition between two nodes. The networks are inferred using a data matrix D = {d_{ itr } : i ∈ 1... N, t ∈ 1 ... T, r ∈ 1 ... R} holding the timeresolved data of N proteins in T time points, measured in R replicates each.
Network inference types
Genetic Algorithm
In principle, a population of candidate networks is 'evolved' over a large number of iterations, starting with either a population of randomly drawn networks or by providing an initial population of networks. One can thus extract networks based on biological prior knowledge and feed them into the algorithm as a starting point for the network search. In each iteration of the algorithm, first up to a fraction 1  q ∈ [0; 1] of all candidate networks is selected that has a score larger than a given quantile of the scores of all current networks (we use the median). This is done to keep the best scoring networks in the population. Second, crossover is performed between pairs within a fraction q of the networks to allow exchange of parts of the networks. Third, mutation of a fraction of m ∈ [0; 1] edges in each network is performed, changing an edge to one of the remaining states, e.g. if an activation edge is present, it can change to an inhibition or to no edge at the current position. This increases the chances to evade local optima and explore different parts of the search space, even if the local move is reducing the score. The details of the methods are described in our previous publication [9]. For our purposes, we set the parameters to q = 0.3 and m = 0.8 and recommend to use a population size of at least 500 networks to ensure broad sampling of the network search space.
inhibMCMC: Markov Chain Monte Carlo for two edge types
Edge transitions and corresponding move operations.
→  ⊣  ←  ⊢  ∅  

→    st  rev  rst  del 
⊣  st    rst  rev  del 
←  rev  rst    st  del 
⊢  rst  rev  st    del 
∅  addA  addI  addA  addI   
where D is the N × T × R datamatrix and the optimized system state matrix, holding the active and passive states for each protein at each time. , ∀i ∈ 1 ... N is the parameter matrix obtained during the HMM procedure from Bender et. al. [9], containing the parameter estimates for the Gaussian distributions for the active and passive states . Details on parameter estimation as well as the system state matrix computation can be found in our previous publication [9].
We end this section by describing the determination of the neighbourhood of a network. There are three cases to be considered to determine the cardinality of the neighbourhood of a network Φ_{ k }:
I) addactivation/addinhibition (the number of node pairs that is not connected by an edge, where selfactivations/inhibitions are not considered, w.l.o.g.)
II) deletion/switchtype (the number of node pairs that are connected by an edge)
III) revert/revswitch (the number of node pairs that are connected by an edge, and where the reverse edge is not already present)
Φ_{2} is reached by adding the edge A → B, and going back from Φ_{2} to Φ_{1} is done by deleting this edge. The number of neighbours of the first network is calculated as follows. According to I), 2 neighbouring networks can be reached by adding a single activation edge, as well as 2 networks by adding a single inhibition. Following II) and III), there are no edges that could be deleted, reverted or whose type could be switched. In total, there are 4 neighbouring networks to Φ_{1}. Analogous to that, for the second network there is one reachable neighbour by adding an activation and one by adding an inhibition, one edge can be deleted, and one reverted, and for one edge a type switch as well as a reversetypeswitch is possible, totalling in a neighbourhood of 6 networks for network Φ_{2}. Thus, the proposal distribution is not symmetric for all possible networks.
Prior knowledge incorporation
Laplace prior
Based on the structure prior of Fröhlich et al. [11], a prior model is proposed that also incorporates different types of edges and a more finegrained control of the prior influence. Networks are encoded as mentioned above. We need a matrix B = V × V → [1, 1] containing prior confidences for each edge, which can be obtained in various ways. Here, one example is given how to derive B using the KEGG database [14]. The approach is similar to the one described by Werhli et al. [10], but preserves the information on the type of the edges.
assuming that the type of each edge is consistent in all reference networks. This leaves positive confidences for activation edges and negative confidences for inhibiting edges. The larger the absolute value of the confidence score, the stronger is the belief in the presence of this edge.
which penalises large differences from the network structure Φ to the prior belief B.
Now we derive upper and lower bounds for the prior influence, in the general case for two edge types. Let λ, γ ∈ ℝ^{+}. If the edge type information is used, all differences Δ _{ ij } lie in the interval [0; 2 ^{ γ } ], because ϕ_{ ij } ∈ {0,1, 1} and b_{ ij } ∈ [1; 1]. Without edge type information, we ignore the signs in both Φ and B, leading to Δ _{ ij } ∈ [0; 1 ^{ γ } ], because ϕ_{ ij } ∈ {0, 1} and b_{ ij } ∈ [0; 1]. Because the bounds for P (Φ) will not change in either case, only the case for including edge type information is shown in the following.
Now consider the prior and likelihood ratios on log scale, i.e. the differences of the log priors and log likelihoods. To make the prior capable of having substantial influence on the inference, the log prior differences should be on similar scale as the log likelihood differences. For instance, if the log likelihood differences are on the scale of 10^{3}, set λ = 10^{3} and γ = 1, such that will be in the range of the thousands. The first part of the prior (log(2)  log(λ)) cancels out in the difference and does not have an influence. This means, that the prior influence is controlled over the second part, which is zero for no difference to the prior and and can become very large for differences > 0. Hence, edge mismatches between the reference and the inferred net guide the structure search and the strength of the influence can be controlled using different settings of λ.
Scalefree prior
This probability decreases when i gets large, and all P_{ i } sum up to 1, i.e. . μ is in the range 0 < μ < 1 and defined as , γ ∈ [2; ∞[.
For a detailed description of the model we refer to the previous publications [12, 13, 15]. The scalefree prior can be used in cases where no information on edge confidences is available. During inference with the scalefree prior model, sparse network structures will be preferred, because high node degrees are penalised by the prior model. In our implementation the scalefree model can be selected by passing the argument priortype="scalefree" to the function call ddepn. Further the arguments gam, it and K have to be provided. Kamimura et al. give hints on the proper setting of the arguments. In the following sections, we assume that prior edge confidences are present and suggest the use of the Laplace prior. However, the scalefree prior might be a reasonable substitute for modelling more general characteristics of network structures and thus interesting for further analyses.
Results and Discussion
Testing the Laplace prior influence
For both the GA and inhibMCMC sampler several tests were performed. The aim was to show that the inference could be influenced in a way that on the one hand the result was close to a given reference network and on the other hand allowed to confute the prior, when evidence from the data got strong enough. The following rationale was applied for these tests. First, we assumed that our prior information was true. To ensure this, a network with N = 15 nodes was sampled, data was generated depending on this network structure and the original sampled network was used as Laplace prior matrix B. Sampling of the network and data were described previously (see Bender et. al. [9]). Both the prior confidences and inferred edges only take on values ∈ {0, 1, 1}, so the absolute differences were either 0, 1 or 2. All differences larger than 0 should have been strongly penalised, so we set γ = 1, leading to a sharp decrease of the prior density (equation 8) for Δ _{ ij } > 0. Each mismatch in an inferred edge to the prior was thus given a weight close to 0 (see Figure 1). We performed tests of the reconstruction performance for the following cases:
GA, BIC score optimisation 1000 iterations, p = 500, q = 0.3, m = 0.8 no prior
GA, Laplace prior 1000 iterations, p = 500, q = 0.3, m = 0.8, γ = 1, λ ∈ {1, 0.5, 0.1, 0.05, 0.025, 0.01, 0.005, 0.001}
inhibMCMC, Laplace prior 50000 iterations, burnin 5000 iterations, γ = 1, λ ∈ {0.1, 0.05, 0.025, 0.01, 0.005, 0.001},
We performed n = 10 independent reconstructions on the same original network and data for both the GA and inhibMCMC samplers and calculated the Area Under Curve (AUC) of the Receiver Operator Characteristic (ROC) curves for each sampling run to measure the quality of the inference. AUCs were calculated as follows. For inhibMCMC, 50000 iterations were performed in each run with a burnin phase of 5000 iterations. Final networks were generated by including an edge into the network that was present in at least a given proportion th ∈ [0; 1] of the 45000 non burnin networks. By varying th between 0 and 1, for each setting of th the number of true positive, false positive, true negative and false negative edges of the final network compared to the original network could be counted. ROC curves were set up and the AUCs calculated as area under the ROC curves.
Figure 2 shows the AUC scores for the inhibMCMC test. It can be seen that for decreasing λ the reconstruction performance increased. For λ = 0.005 and λ = 0.001 the reference network could be successfully reconstructed (AUCs around 1). We suggested to inspect the observed likelihood ratios during the sampling runs and set λ such that the quotient is on a comparable scale (see section Laplace Prior). The right plot of Figure 2 shows the likelihood and prior ratios for one inhibMCMC run. For λ = 0.001 the prior ratios varied over a much broader range than the likelihood ratios, which lead to inferred networks that were nearly identical to the prior network, as it can be seen in the AUC of around 1. For increasing λ the likelihood ratios showed a larger variance than the prior ratios, which lead to decreasing AUCs and more variable inferred networks in turn. Thus, the setting of the prior parameters determines how robust the reconstruction of the networks is. The settings have to be carefully adjusted to preserve robustness, but leave enough variance to gain additional knowledge, represented in the data, too. The test for the GA reconstruction is shown in Figure 3. AUCs for the GA were calculated similar than for inhibMCMC, where final networks were estimated by including edges if they appeared in at least a fraction th of the networks of the final population. On the left hand side of Figure 3 the AUC distributions are shown. Using the BIC score optimisation for the reconstruction of a network of size N = 15, it is apparent that the performance of the GA drops significantly, with AUCs around 0.5 compared to the case for N = 10 in Bender et. al. [9], where performance was still good with AUCs around 0.73. This strong decrease of performance with increasing number of nodes emphasises the need for the inclusion of prior knowledge to produce reliable results, especially when the network size is increasing. When using prior knowledge, for decreasing λ, also an increase in the reconstruction performance was observed. For λ = 1, the performance of the GA was comparable to inhibMCMC performance with λ = 0.1, the improvement in reconstruction performance can be controlled similar to the case for inhibMCMC using smaller settings for λ. Using λ ≤ 0.01 gave comparable reconstruction results with AUCs above 0.95, but in contrast to inhibMCMC, the reference network could not be inferred entirely for even smaller settings for λ. The GA seems to reach a local optimum, but does not find the true network, even for the test situation where the prior strength is increased subsequently.
On the right hand side of Figure 3, the likelihood and prior differences are shown. As stated in section 'Laplace prior', λ should be set such that the prior differences are slightly above zero. As depicted in Figure 3, setting λ = 0.1 lead to prior differences of around 10 and already had a strong influence on the reconstruction performance and could be used as appropriate setting for λ. Nevertheless, it remains difficult to find the proper λ, and it is ongoing work to find reasonable ways of identifying 'good' parameter settings for both inhibMCMC and the GA.
Note on the choice of the algorithm
The question, of course, arises, which algorithm to choose. One should be aware, that the purpose of both approaches differs. The GA is used for performing optimisation of the network structure, while inhibMCMC explicitly samples the space of networks. However, both methods can be used to generate final estimates of the network structure and provide the user with a confidence of each edge in the final inferred network. The influence of the prior seems to be less controllable in the case of the GA, since rather 'weak' prior settings (compared to the inhibMCMC case) already had a strong impact, but increasing the prior strength always left some errors in the reconstructed networks. However, a clear dependency on the prior strength could be observed in both cases and a rough guideline for finding a suitable setting for the prior hyperparameters could be given. In general, finding the tradeoff between strength of the prior and the influence of the data is of central importance. Too strong prior influence will only reproduce the prior knowledge and not allow for novel insights from the data. If the prior is too weak, the inference might not be able to identify the underlying network structure, due to e.g. too wide time intervals during the measurements, noise in the data, or nodes that were not measured at all.
As a last point, we consider the computational demands of both approaches. Clearly, the GA is much more expensive in terms of computation time. As an example, consider reconstruction of networks with the following settings (as we currently are using them): population size p = 500, iterations = 1000, q = 0.3, m = 0.8. In each iteration q * p = 150 individuals are processed in the crossover step and m * p = 400 individuals are processed during the mutation step. For each of these 150 + 400 = 550 operations, the time limiting step is the Viterbi Training algorithm including Hidden Markov Model (HMM) computations. In our experience, for networks of size around N = 15, Viterbi Training is computed in less than one second, leading to an estimate of total computation time of (1000 * 550) seconds, corresponding to ~ 6 days. For inhibMCMC, we usually use 50000 iterations for one sampling run, which means 50000 times the Viterbi Training in each sampling. This corresponds to approximately half a day for one network reconstruction, meaning that more than 10 independent samplings can be performed in the same time as needed for one reconstruction with a GA. If parallel computing architecture is available, the GA computation time can be reduced to a few days, but also the independent inhibMCMC runs can be distributed on different computing cores or nodes, making multiple parallel network reconstructions possible in about half a day. So due to the computational burden of the GA and the improved control of the prior strength in inhibMCMC, it is suggested to prefer inhibMCMC over the GA.
RPPA data from breast cancer cell line HCC1954
For each run, a summarised network was generated by including a particular edge, if it occurred in at least a fraction th ∈ [0; 1] of the 25000 non burnin networks. This left us with 10 summarised networks, that were merged into a consensus network. For this, we counted, how often each edge was an activation, inhibition or empty edge in all of the 10 summarised networks. A simple majority rule decided on the final type of the edge. In the case of ties, the edge type was chosen that had the larger posterior probability average over all summary networks with the same edge type.
Conclusions
We present our Rpackage 'ddepn' for inference of signalling networks from longitudinal highthroughput data. The method is able to model the effects of external perturbation, as it might be introduced by external stimulations or inhibitions. Two different network structure search algorithms are available in the package, a GA performing network structure optimisation and a Metropolis Hastings MCMC approach that samples the space of possible networks. We extend MCMC structure sampling by the ability to sample two edge types, one for activation and one for inhibition. Further, two models for the inclusion of prior knowledge are included in the package. The first uses a reference network as guidance for the inference (Laplace prior), the second uses a general property of biological networks, namely that node degrees follow a power law and high node degrees are penalised (scalefree prior). We also give a guideline on how to adjust parameters for the Laplace prior model, such that precise control on how close the reconstruction will be to the prior knowledge is possible. We show the dependence of the reconstruction performance on the prior parameter setting and give an assessment of both methods and their usage. Finally, for a data set measuring phosphorylation of proteins related to the ERBB signalling network, it is described, how inclusion of the prior is improving the outcome of the network reconstruction.
Availability and requirements
Project home page http://ddepn.rforge.rproject.org
Operating systems Linux, Windows
Programming language R
Other requirements graphviz
Licence GNU GPL
List of abbreviations
 AUC :

Area under curve
 CRAN :

Comprehensive R Archive Network
 GA :

Genetic Algorithm
 HMM :

Hidden Markov Model
 MCMC :

Markov Chain Monte Carlo
 ROC :

Receiver Operator Characteristic
 RPPA :

Reverse Phase Protein Array.
Declarations
Acknowledgements
We thank Anika Joecker and Alexander Kerner for proofreading the manuscript, as well as Stephan Gade, Maria Fälth and Marc Johannes for helpful discussions. We also thank Dirk Ledwinka for technical support.
Funding: This project was supported through the Helmholtz Alliance on Systems Biology network SBCancer and through the German Federal Ministry of Education and Science (BMBF) project BreastSys in the platform Medical Systems Biology.
Authors’ Affiliations
References
 Akutsu T, Miyano S, Kuhara S: Identification of genetic networks from a small number of gene expression patterns under the Boolean network model. Pac Symp Biocomput 1999, 17–28.Google Scholar
 Shmulevich I, Dougherty ER, Kim S, Zhang W: Probabilistic Boolean Networks: a rulebased uncertainty model for gene regulatory networks. Bioinformatics 2002, 18(2):261–274. 10.1093/bioinformatics/18.2.261View ArticlePubMedGoogle Scholar
 Sachs K, Perez O, Pe'er D, Lauffenburger DA, Nolan GP: Causal proteinsignaling networks derived from multiparameter singlecell data. Science 2005, 308(5721):523–529. 10.1126/science.1105809View ArticlePubMedGoogle Scholar
 Murphy K, Mian S: Modelling gene expression data using dynamic bayesian networks. Tech. rep., University of California, Berkeley; 1999.Google Scholar
 Nelander S, Wang W, Nilsson B, She QB, Pratilas C, Rosen N, Gennemark P, Sander C: Models from experiments: combinatorial drug perturbations of cancer cells. Mol Syst Biol 2008, 4: 216.PubMed CentralView ArticlePubMedGoogle Scholar
 Markowetz F: How to Understand the Cell by Breaking It: Network Analysis of Gene Perturbation Screens. PLoS Comput Biol 2010, 6(2):e1000655. 10.1371/journal.pcbi.1000655PubMed CentralView ArticlePubMedGoogle Scholar
 Markowetz F: Probabilistic Models for Gene Silencing Data. PhD thesis, FU Berlin 2005.Google Scholar
 Fröhlich H, Özgür Sahin, Arlt D, Bender C, Beißbarth T: Deterministic Effects Propagation Networks for Reconstructing Protein Signaling Networks from Multiple Interventions. BMC Bioinformatics 2009., 322(10):Google Scholar
 Bender C, Henjes F, Fröhlich H, Wiemann S, Korf U, Beißbarth T: Dynamic deterministic effects propagation networks: learning signalling pathways from longitudinal protein array data. Bioinformatics 2010, 26(18):i596i602. 10.1093/bioinformatics/btq385PubMed CentralView ArticlePubMedGoogle Scholar
 Werhli AV, Husmeier D: Reconstructing gene regulatory networks with bayesian networks by combining expression data with multiple sources of prior knowledge. Stat Appl Genet Mol Biol 2007, 6: Article15.PubMedGoogle Scholar
 Fröhlich H, Fellmann M, Sültmann H, Poustka A, Beißbarth T: Estimating Large Scale Signaling Networks through Nested Effect Models with Intervention Effects from Microarray Data. Bioinformatics 2008, 24(22):2650–2656. 10.1093/bioinformatics/btm634PubMed CentralView ArticlePubMedGoogle Scholar
 Kamimura T, Shimodaira H: A Scalefree Prior over Graph Structures for Bayesian Inference of Gene Networks. Online
 Sheridan P, Kamimura T, Shimodaira H: A scalefree structure prior for graphical models with applications in functional genomics. PLoS One 2010, 5(11):e13580. 10.1371/journal.pone.0013580PubMed CentralView ArticlePubMedGoogle Scholar
 Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, Yamanishi Y: KEGG for linking genomes to life and the environment. Nucleic Acids Res 2008, 36: D480D484.PubMed CentralView ArticlePubMedGoogle Scholar
 Lee DS, Goh KI, Kahng B, Kim D: Scalefree random graphs and Potts model. Pramana  journal of physics 2005, 64(6):1149–1159. 10.1007/BF02704176View ArticleGoogle Scholar
 Paweletz CP, Charboneau L, Bichsel VE, Simone NL, Chen T, Gillespie JW, EmmertBuck MR, Roth MJ, EFP III, Liotta LA: Reverse phase protein microarrays which capture disease progression show activation of prosurvival pathways at the cancer invasion front. Oncogene 2001, 20(16):1981–1989. 10.1038/sj.onc.1204265View ArticlePubMedGoogle Scholar
 Cowles MK, Carlin BP: Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review. Journal of the American Statistical Association 1996, 91: 883–904. 10.2307/2291683View ArticleGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.