IRIS: a method for reverse engineering of regulatory relations in gene networks
 Sandro Morganella^{1, 2},
 Pietro Zoppoli^{1, 2} and
 Michele Ceccarelli^{1, 2}
https://doi.org/10.1186/1471-2105-10-444
© Morganella et al; licensee BioMed Central Ltd. 2009
Received: 22 December 2008
Accepted: 23 December 2009
Published: 23 December 2009
Abstract
Background
The ultimate aim of systems biology is to understand and describe how molecular components interact to manifest collective behaviour that is the sum of the single parts. Building a network of molecular interactions is the basic step in modelling a complex entity such as the cell. Even if gene-gene interactions only partially describe real networks because of post-transcriptional modifications and protein regulation, using microarray technology it is possible to combine measurements for thousands of genes into a single analysis step that provides a picture of the cell's gene expression. Several databases provide information about known molecular interactions and various methods have been developed to infer gene networks from expression data. However, network topology alone is not enough to perform simulations and predictions of how a molecular system will respond to perturbations. Rules for interactions among the single parts are needed for a complete definition of the network behaviour. Another interesting question is how to integrate information carried by the network topology, which can be derived from the literature, with large-scale experimental data.
Results
Here we propose an algorithm, called inference of regulatory interaction schema (IRIS), that uses an iterative approach to map gene expression profile values (both steady-state and time-course) into discrete states and a simple probabilistic method to infer the regulatory functions of the network. These interaction rules are integrated into a factor graph model. We test IRIS on two synthetic networks to determine its accuracy and compare it to other methods. We also apply IRIS to gene expression microarray data for the Saccharomyces cerevisiae cell cycle and for human B-cells and compare the results to literature findings.
Conclusions
IRIS is a rapid and efficient tool for the inference of regulatory relations in gene networks. A topological description of the network and a matrix of gene expression profiles are required as input to the algorithm. IRIS maps gene expression data onto discrete values and then computes regulatory functions as conditional probability tables. The suitability of the method is demonstrated for synthetic data and microarray data. The resulting network can also be embedded in a factor graph model.
Background
Although all cells of an organism contain the same DNA, each cell only transcribes and translates a fraction of that DNA. Each cell has a particular interaction pattern that involves genes, proteins and molecules; these complex schemas are known as gene regulatory networks. Full understanding of gene interactions can be used to identify methods to control the behaviour of genes directly involved in disease processes. Methods used to infer patterns of interaction among molecular components from observed data are called reverse engineering algorithms. Even if gene-gene interactions only partially describe real networks because of post-transcriptional modifications and protein regulation, using microarray technology it is possible to combine measurements for thousands of genes into a single analysis step that provides a picture of cell gene expression. Therefore, many reverse engineering algorithms rely on gene expression data [1]. These methods differ in the type of expression profiles used, so there are algorithms for time-series data [2], algorithms for steady-state data [3, 4] and algorithms that work on both types of data [5]. A graphical representation of the gene regulatory network is often used, with Bayesian networks probably the most popular graphical models used in this scenario [1, 5–8]. The major limitation of Bayesian networks is that they cannot represent cyclic structures. To overcome this limitation, methods based on dynamic Bayesian networks have been proposed [9–13].
However, the network topology alone is not enough to perform simulations and predictions of how a molecular system will respond to perturbations. The rules for interactions among the single parts are needed for a complete definition of the network behaviour. Another interesting question is how to integrate the information carried by the network topology, which can be derived from the literature, with large-scale experimental data. Much attention has recently been focused on the modelling [14] and inference [15] of activation rules between molecular components in the cell. Although this can be considered a simpler problem than inference of the network topology, it is important to point out the following:

Many interaction patterns between molecular components can be obtained from the literature (using, for example, databases such as Ingenuity Pathway Analysis) and integrated in the inference algorithm.

Owing to the limited amount of experimental data, it can be convenient to solve a simplified problem and exploit as much as possible prior knowledge about the phenomenon being investigated.

Modelling of the interaction pattern between molecular components can be useful when performing large-scale simulations and deriving hypotheses about the behaviour of biological systems under different conditions [14].
In the present study we follow this research direction. However, instead of considering continuous expression levels, as in [15], we use a discrete representation of gene activation. Indeed, methods based on graphical models often work on discrete data obtained from real-valued gene expression profiles. The discretisation step is of fundamental importance for good accuracy of subsequent computational steps. Ślęzak and Wróblewski proposed a discretisation approach based on rough set theory in which quality functions are used for roughly discretised data, with inexact dependence between attribute rankings [16]. Friedman et al. considered two different approaches to discretise real-valued data [1]. In the first approach they discretised expression levels to several discrete states according to a fixed discretisation rule and demonstrated that this approach is sensitive to the discretisation procedure. In the second method they combined a linear regression model with the model dependence and measurements, but this approach was strongly affected by the linear dependence. Pe'er et al. proposed a new discretisation procedure for each gene in which gene-specific variation is used to estimate the normal distribution mixture by standard k-means clustering [8]. A probabilistic approach is used to identify interactions between genes, such as activation and inhibition, so this method provides a description of the gene network with the interaction features, but many of these interactions are undirected. Moreover, the discretisation step is sensitive to the choice of the number of states that a gene may attain. In this paper we propose a new discretisation approach that depends on the expression profile data (both time-course and steady-state values) for each gene, so that different genes with different expression profiles lead to different discretisation rules. Indeed, we also use this approach to reduce the effect of noise.
After discretisation, the problem faced is how to infer the rules, and not just the pattern, by which the various molecular components interact with each other. We call this problem the inference of regulatory relations given a well-specified gene regulatory network. Gat-Viks et al. proposed an approach to learn improved regulatory functions from high-throughput data using a discrepancy score in which discretisation is carried out as a preprocessing step [17], but the discretisation rules must be determined and tuned rather arbitrarily and each variable is discretised using the same rule. Gat-Viks et al. also proposed a more flexible approach to learn the regulatory functions in a gene network [18], which is represented as a factor graph [19] to model cyclic structures. In this approach discretisation is carried out according to an expectation maximisation (EM) algorithm that, combined with the factor graph model, provides a very flexible discretisation scheme. However, in practice this flexibility can lead to overfitting and may decrease learnability, so the authors suggested using the same or a few discretisation schemes for all the variables. Chuang et al. proposed an approach to infer gene relations from time-course expression profiles in which first and second derivatives are used to detect time-lagged correlated gene pairs [20]. The basic assumption is that pairs of correlated genes exhibit either a complementary pattern (that represents a repressor relation) or a similar pattern (that represents an activator relation). In our approach we propose a simple regulatory function inference that is particularly fast and yields rather good accuracy, even if the network is cyclic. In this step we use an observation similar to that of Chuang et al. [20] for expression profile patterns, but here we use discrete data. Finally, we merge the inferred regulatory functions into a factor graph representation as reported by Gat-Viks et al. [18].
Implementation
Biological Model
We first define a simple model for biological networks [17]. A biological network can be modelled through a directed graph G(V, E), where each node v ∈ V represents a gene that can be in a discrete state D = {0,1}, representing an inactive and an active state, respectively. If a gene v ∈ V has at least one parent then it is called a regulated gene and we define as R_{v} the set of parents (regulators) of v. If a gene v ∈ V has no parents, then it is a stimulator and we define as V_{s} the set of all stimulators of the network. In addition, we represent the expression data using an n × m matrix M, where n is the number of genes in the network and m is the number of experiments performed or samples. For each 1 ≤ i ≤ n and 1 ≤ j ≤ m the value of M[i, j] is the expression level of gene i in sample j.
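The model above maps directly onto a few plain data structures. The following is a minimal Python sketch, not the authors' MATLAB implementation; the toy gene names, the edge list, the expression values and the helper `regulator_sets` are all hypothetical and serve only to illustrate R_{v}, V_{s} and M.

```python
def regulator_sets(genes, edges):
    """Return R_v, the set of regulators (parents) of every gene v."""
    r = {v: set() for v in genes}
    for regulator, target in edges:
        r[target].add(regulator)
    return r

# Hypothetical toy network: each edge points from regulator to regulated gene.
genes = ["A", "B", "C", "D"]
edges = [("A", "C"), ("B", "C"), ("C", "D")]

R = regulator_sets(genes, edges)

# Stimulators V_s are the genes that have no parents.
V_s = {v for v in genes if not R[v]}

# M[i][j] is the expression level of gene i in sample j (made-up values).
M = [[0.2, 0.4, 0.9],
     [1.1, 0.7, 0.3],
     [0.5, 0.6, 1.4],
     [0.1, 0.1, 0.8]]
n, m = len(M), len(M[0])
```

In this toy example C and D are regulated genes, while A and B are stimulators.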
IRIS Algorithm
In this section we describe our approach to infer the regulatory relations in gene networks from high-throughput data. IRIS needs two inputs: a network topology N and an expression profile data matrix M. The method consists of two main steps: (i) Discretisation and (ii) Regulation Functions Learning. The details are reported in the following subsections.
Discretisation
This step is aimed at computing a binary matrix from the observed gene activation levels. The discretisation step uses a matrix of local variation of the gene expression, whose entries are the differences M_{s}[i, j] − M_{s}[i, j − 1], j = 2,..., m. Let S be the matrix of discrete states, i.e. S[i, j] contains the discrete state relative to the value M_{s}[i, j]. The discretisation procedure is iterative. It first tries to fix the lower values to zero and the upper values to one on the basis of a lower and an upper threshold, which are computed so that, for each row i, the interval between them contains a given percentage, α, of the second and third quartiles of the data. The other values are then fixed on the basis of their nearby values.
We extensively tested various values of α for various datasets, and the results showed that it can be arbitrarily chosen in the range 5%–35% without significantly affecting the results. In all the experiments reported below we choose α as the minimum of this range.
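The text describes the discretisation only in outline, so the following Python sketch is one possible reading rather than the IRIS procedure itself. It rests on three assumptions that are not in the source: the two thresholds are taken as the quantiles 0.5 − α/4 and 0.5 + α/4 of each row (a band around the median holding a fraction α of the central half of the data), an undecided value copies the state of the adjacent decided sample whose expression is closest, and a row with no decided value at all is set to inactive. The helper names `quantile` and `discretise_row` are hypothetical.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def discretise_row(row, alpha=0.05):
    """Map one expression profile to 0/1 states (illustrative only)."""
    s = sorted(row)
    # Assumed thresholds: a band around the median that holds a fraction
    # alpha of the second and third quartiles of the row.
    lower = quantile(s, 0.5 - alpha / 4)
    upper = quantile(s, 0.5 + alpha / 4)
    states = [0 if x < lower else 1 if x > upper else None for x in row]
    # Iteratively resolve undecided entries from already decided neighbours,
    # choosing the neighbour with the closest expression value.
    while None in states:
        progress = False
        snapshot = list(states)
        for j, st in enumerate(snapshot):
            if st is not None:
                continue
            cands = [k for k in (j - 1, j + 1)
                     if 0 <= k < len(row) and snapshot[k] is not None]
            if cands:
                k = min(cands, key=lambda k: abs(row[k] - row[j]))
                states[j] = snapshot[k]
                progress = True
        if not progress:
            # No value could be fixed at all (e.g. a constant row):
            # assume the gene is inactive everywhere.
            states = [0] * len(row)
    return states

profile = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95, 0.55]
states = discretise_row(profile, alpha=0.05)
```

Because the thresholds are computed per row, genes with different expression ranges get different discretisation rules, which is the property the method relies on.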
Regulation Functions Learning
In this step we use the matrix S to compute the potential tables (PTs) and truth tables (TTs). To infer the PTs we use relative frequencies. Consider a gene v and the set of its regulators R_{v}. The matrix S contains several state assignments for the genes in R_{v} and for v itself. Let Γ_{v} be the set of all possible state assignments of the variables in R_{v}.
For each assignment r_{v} ∈ Γ_{v} and each state s ∈ D, the PT entry is the relative frequency with which v = s is observed together with r_{v}, computed from the occurrence numbers of the state assignment {r_{v} ∪ v = s} in S. The TT entry for r_{v} is the more probable state of v.
When the PT gives no preference for either state, we cannot distinguish between the active and inactive state, so we have an undefined response of regulated gene v to the state assignment {r_{v}}. This situation is indicated as −1 in the TTs.
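The relative-frequency rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function name `learn_tables` and its arguments are hypothetical, a state assignment that never occurs in S is given a uniform PT row, a tie between the two states is treated as the undefined case, and −1 is assumed as the marker for an undefined TT entry.

```python
from collections import Counter
from itertools import product

def learn_tables(S, gene_index, v, regulators):
    """Estimate the PT and TT of gene v from the binary state matrix S.

    S is a list of rows (one per gene), gene_index maps gene names to row
    numbers and regulators is the ordered list R_v.
    """
    m = len(S[0])
    counts = Counter()
    for j in range(m):
        r = tuple(S[gene_index[g]][j] for g in regulators)
        counts[(r, S[gene_index[v]][j])] += 1

    pt, tt = {}, {}
    for r in product((0, 1), repeat=len(regulators)):
        n0, n1 = counts[(r, 0)], counts[(r, 1)]
        total = n0 + n1
        # Assumption: an assignment never observed gets a uniform PT row.
        p1 = n1 / total if total else 0.5
        pt[r] = (1 - p1, p1)          # (P(v = 0 | r), P(v = 1 | r))
        # TT: the more frequent state, or -1 when the two states tie.
        tt[r] = 1 if n1 > n0 else 0 if n0 > n1 else -1
    return pt, tt

# Made-up discrete states for three genes over six samples.
S = [[0, 0, 1, 1, 1, 0],   # A
     [0, 1, 0, 1, 0, 1],   # B
     [0, 0, 1, 0, 1, 1]]   # C, regulated by A and B
pt, tt = learn_tables(S, {"A": 0, "B": 1, "C": 2}, "C", ["A", "B"])
```

Only the columns of S belonging to v and R_{v} are read, which is why the learning step is local and unaffected by cycles elsewhere in the network.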
Integration with Factor Graph
Factor graphs are a class of probabilistic models that were originally applied to coding/decoding problems. Using a factor graph we can model complex domain knowledge in which feedback loops play a fundamental role. One of the important advantages of factor graphs is their combination with the sum-product algorithm [19], which is a message-passing algorithm for efficiently computing marginal distributions, even in the presence of cycles.

Each gene node becomes a variable node of the factor graph;

For each regulatory function a factor node must be inserted to link the genes involved;

Each stimulator must be linked to a specific factor node.
Applying these rules, we obtain the factor graph in Figure 3(b). This model can answer questions such as: "What is the probability that gene C is active given that genes A and B are inactive?" and "What is the likelihood that genes B and A are inactive given that gene C is active?". For this purpose, we set the state of the observed genes and use the sum-product algorithm to compute the posterior distribution of hidden genes. Here we follow the propagation of belief in directed graphs for Forney-style factor graphs [21]. If a stimulator is not fixed, then its factor node will be set with a uniform distribution.
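For a network as small as the three-gene example, queries of this kind can be checked by hand. The Python sketch below deliberately swaps the sum-product algorithm for exhaustive enumeration of all joint states, which gives the exact posterior on a tiny network but scales exponentially. The function `posterior`, the network A → C ← B and every number in the potential table are made up for illustration.

```python
from itertools import product

def posterior(factors, variables, query, evidence):
    """P(query = 1 | evidence), obtained by summing the product of all
    factors over every joint state consistent with the evidence."""
    num = den = 0.0
    for states in product((0, 1), repeat=len(variables)):
        x = dict(zip(variables, states))
        if any(x[g] != s for g, s in evidence.items()):
            continue
        w = 1.0
        for scope, table in factors:
            w *= table[tuple(x[g] for g in scope)]
        den += w
        if x[query] == 1:
            num += w
    return num / den

# Factor of a stimulator that is not fixed: uniform distribution.
uniform = {(0,): 0.5, (1,): 0.5}
# Hypothetical potential table of C; keys are (A, B, C).
pt_c = {(0, 0, 0): 0.9, (0, 0, 1): 0.1,
        (0, 1, 0): 0.8, (0, 1, 1): 0.2,
        (1, 0, 0): 0.3, (1, 0, 1): 0.7,
        (1, 1, 0): 0.1, (1, 1, 1): 0.9}
factors = [(("A",), uniform), (("B",), uniform), (("A", "B", "C"), pt_c)]
variables = ["A", "B", "C"]

# "What is the probability that C is active given A and B inactive?"
p_c = posterior(factors, variables, "C", {"A": 0, "B": 0})
# Reasoning backwards: how likely is A to be active given that C is active?
p_a = posterior(factors, variables, "A", {"C": 1})
```

Each factor is stored as a pair (scope, table), mirroring the rule that one factor node links the genes involved in each regulatory function and that each stimulator has its own factor node.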
Some considerations about the presence of cycles are useful, since this is one of the most important problems in the field of probabilistic graphical models (PGMs). Inference in PGMs consists of computing marginal probabilities of complex probability distributions defined over many variables. Many exact and approximate algorithms have been proposed for this task [22], from Monte Carlo methods to variational methods, mean field methods and belief propagation (BP). The sum-product algorithm adopted in this paper is a special case of belief propagation. It is well known that belief propagation yields exact results if the graphical model does not contain cycles. If the graphical model contains loops, the sum-product algorithm can still yield accurate results with little computational effort [21]. However, if the influence of loops is large, the approximate marginals calculated by BP can have large errors and the quality of the BP results may not be satisfactory. Many recent research efforts in statistical machine learning are devoted to the development of efficient approximate inference algorithms for cyclic graphical models (see for example [23]). For the purposes of this paper, we may have to deal with cycles in two cases: inference, when we use PTs, and computation of steady states, when we use TTs. We adopt the belief propagation algorithm for the former and the algorithm of Gat-Viks et al. [17] for the latter. In any case, the IRIS method proposed here is in no way influenced by the presence of cycles. IRIS takes as input the description of the network and the expression profiles and outputs a map of the regulatory relations between sets of regulators and regulated genes, so all the information used by IRIS is based on "local" relationships between a gene and the set of its regulators. The influence of cycles appears only in the subsequent phases, when these relationships are used in inference tasks.
However, inference in cyclic PGMs is still a very important research question in the field of statistical machine learning, and its solution is outside the scope of this paper.
Results
In this section we report IRIS results for both synthetic networks and microarray expression profiles. IRIS needs a well-defined gene network as input. We say that a gene network is well defined if each of its interactions allows us to distinguish between regulator genes and regulated genes. Given a well-defined network, we have genes with zero regulators (called stimulators representing environmental conditions), genes with one regulator, genes with two regulators, and so on. If a gene has at least one regulator then it has a regulatory function that describes its response to a particular stimulus by its regulator(s). In our approach we suppose, without loss of generality, that a gene can be in one of two states: inactive and active, represented as 0 and 1, respectively. This assumption is commonly used in the literature to distinguish the response of a gene to a given experimental condition.
Given a well-defined gene regulatory network, IRIS computes the regulatory functions, providing two different descriptions: a description in which each interaction is described as a conditional probability table, which we refer to as a potential table (PT), and a description in which each regulatory relation is a truth table, which, by analogy to neural logic networks, we refer to as a truth table (TT). These two different descriptions allow different analyses. Using the PTs we can execute an inference step to compute the a posteriori probability of hidden genes given observed genes, so that, for example, it is possible to understand how to control a gene using particular environmental conditions. Using the TTs we can compute the steady states of a gene regulatory network. In this scenario, we deal with the problem of the cyclic structure of gene networks, so we use an approach based on the factor graph model [19] as an inference engine and the idea of feedback sets [17] to compute the steady states of the networks.
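To make the notion of a steady state concrete, the Python sketch below lists every global state in which each regulated gene agrees with its truth table. It uses plain exhaustive enumeration in place of the feedback-set approach of [17], so it is only practical for small networks. The function `steady_states`, the two-gene mutual-repression network and the choice to treat an undefined TT entry (assumed to be marked −1) as compatible with either state are all assumptions made for illustration.

```python
from itertools import product

def steady_states(genes, tts):
    """Enumerate the states in which every regulated gene matches its TT.

    tts maps a regulated gene to (regulators, table). Table entries equal
    to -1 (undefined) are treated as compatible with either state, which
    is an assumption of this sketch.
    """
    found = []
    for states in product((0, 1), repeat=len(genes)):
        x = dict(zip(genes, states))
        consistent = True
        for v, (regs, table) in tts.items():
            out = table[tuple(x[g] for g in regs)]
            if out != -1 and out != x[v]:
                consistent = False
                break
        if consistent:
            found.append(states)
    return found

# Hypothetical cyclic network: A and B repress each other.
genes = ["A", "B"]
tts = {"A": (["B"], {(0,): 1, (1,): 0}),
       "B": (["A"], {(0,): 1, (1,): 0})}
result = steady_states(genes, tts)
```

The example network is cyclic and still has well-defined steady states, namely the two states in which exactly one of the genes is active.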
Results for Synthetic Networks
Most of the rules governing the activation of genes in these networks have already been investigated in several studies. In particular, for the E. coli network we use conclusions from references [25–28] to obtain the regulatory relations for glcD, focA and lacZ. For S. cerevisiae we use the results of Wilcox et al. [29] to obtain the regulatory relation for FIT2. Since all the regulators of A2 have an inhibitory function, the gene will be in an active state if and only if both regulators are inactive. For more details on true descriptions, see Additional file 1: regulation_true_descriptions.pdf.
Percentage of correct entries in the inferred truth tables for synthetic networks for E. coli and S. cerevisiae (true table vs IRIS-inferred TT).

E. coli

| Regulated Gene | Regulator Genes | TT Size | Correct | Incorrect | Undefined |
| --- | --- | --- | --- | --- | --- |
| glcD | glcC, arcA | 4 | 4 | 0 | 0 |
| focA | arcA, crp | 4 | 4 | 0 | 0 |
| rpoH | crp | 2 | 2 | 0 | 0 |
| tdc | crp | 2 | 2 | 0 | 0 |
| lacZ | crp, lacI | 4 | 4 | 0 | 0 |
| Total | | 16 | 16 | 0 | 0 |
| Percentage | | | 100% | 0% | 0% |

S. cerevisiae

| Regulated Gene | Regulator Genes | TT Size | Correct | Incorrect | Undefined |
| --- | --- | --- | --- | --- | --- |
| CLN2 | SSL1 | 2 | 2 | 0 | 0 |
| CDC28 | SSL1 | 2 | 2 | 0 | 0 |
| NOT3 | SSL1 | 2 | 2 | 0 | 0 |
| FIT2 | SSL1, PDR11 | 4 | 4 | 0 | 0 |
| CDC6 | PDR11 | 2 | 2 | 0 | 0 |
| CEF1 | PDR11 | 2 | 1 | 1 | 0 |
| LEU2 | PDR11 | 2 | 2 | 0 | 0 |
| CLB6 | PDR11 | 2 | 2 | 0 | 0 |
| DAL80_GZF3 | PDR11 | 2 | 2 | 0 | 0 |
| A2 | PDR11, IPT1 | 4 | 4 | 0 | 0 |
| Total | | 24 | 23 | 1 | 0 |
| Percentage | | | 95.83% | 4.17% | 0% |
1. D_{KL}(P_{true} || P_{IRIS}) = 0.1872 and D_{KL}(P_{true} || P_{EMMAP}) = 0.2865 for E. coli.
2. D_{KL}(P_{true} || P_{IRIS}) = 0.1743 and D_{KL}(P_{true} || P_{EMMAP}) = 0.1821 for S. cerevisiae.
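The D_{KL} values above are Kullback-Leibler divergences, where a smaller value means the inferred distribution is closer to the true one. The Python sketch below shows how such a divergence is computed for two discrete distributions. It assumes the natural logarithm, since the text does not state the base, and it does not reproduce how the divergence is aggregated over the tables of a whole network; the function name `kl_divergence` and the example distributions are hypothetical.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions given as sequences.

    Terms with p_i = 0 contribute nothing; q_i must be positive wherever
    p_i is positive. The natural logarithm is an assumption.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Made-up example: a "true" PT row against two candidate estimates.
p_true = [0.9, 0.1]
d_close = kl_divergence(p_true, [0.8, 0.2])
d_far = kl_divergence(p_true, [0.5, 0.5])
```

The divergence is zero only when the two distributions coincide, and it is not symmetric in its arguments, so the order (true distribution first) matters.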
Execution time for IRIS and EMMAP.

| Biological Noise | E. coli: IRIS Time | E. coli: EMMAP Time | E. coli: EMMAP Iter | S. cerevisiae: IRIS Time | S. cerevisiae: EMMAP Time | S. cerevisiae: EMMAP Iter |
| --- | --- | --- | --- | --- | --- | --- |
| 0.10 | 0.929 s | 9.447 s | 6 | 1.833 s | 14.557 s | 5 |
| 0.15 | 0.897 s | 9.435 s | 6 | 1.892 s | 14.399 s | 5 |
| 0.20 | 0.910 s | 8.770 s | 5 | 1.853 s | 14.510 s | 5 |
| 0.25 | 0.905 s | 8.642 s | 5 | 1.874 s | 14.381 s | 5 |
| 0.30 | 0.968 s | 8.787 s | 5 | 1.876 s | 17.446 s | 6 |
| 0.35 | 0.965 s | 8.762 s | 5 | 1.790 s | 17.432 s | 6 |
| 0.40 | 1.041 s | 8.880 s | 5 | 1.825 s | 17.469 s | 6 |
| 0.45 | 0.953 s | 8.674 s | 5 | 1.839 s | 14.600 s | 5 |
| 0.50 | 1.005 s | 8.741 s | 5 | 1.860 s | 14.735 s | 5 |
a) a_{1} < a_{2} and b_{1} < b_{2} (or a_{1} > a_{2} and b_{1} > b_{2}): here we can state that in experiment E_{2} both genes have an expression level greater (lower) than in E_{1}; in other words, the two genes have a similar behaviour.
b) a_{1} > a_{2} and b_{1} < b_{2} (or a_{1} < a_{2} and b_{1} > b_{2}): here we can state that in E_{1} gene A has an expression level greater (lower) than in E_{2}, whereas gene B has an expression level in E_{1} lower (greater) than in E_{2}; in other words, the two genes have an opposite behaviour, which can be observed from the opposite sign of the expression derivative for the two genes in experimental condition E_{2}.
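The two cases reduce to comparing the signs of the changes in expression between the two experiments. The following Python sketch reads a_{1}, a_{2} (b_{1}, b_{2}) as the expression levels of gene A (gene B) in E_{1} and E_{2}. The function name `behaviour` and the third outcome, for a gene that does not change, are additions made for illustration.

```python
def behaviour(a1, a2, b1, b2):
    """Classify how genes A and B change from experiment E1 to E2."""
    da, db = a2 - a1, b2 - b1
    if da * db > 0:
        return "similar"        # case a): both rise or both fall
    if da * db < 0:
        return "opposite"       # case b): one rises while the other falls
    return "undetermined"       # at least one gene did not change

case_a = behaviour(1.0, 2.0, 3.0, 5.0)
case_b = behaviour(2.0, 1.0, 3.0, 5.0)
```

A similar behaviour is what one expects between an activator and its target, an opposite behaviour between a repressor and its target.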
Results for Microarray Expression Profiles
We also applied the IRIS algorithm to two real data sets comprising microarray expression profiles for the yeast mitotic cell cycle and human B-cells.
Yeast Mitotic Cell Cycle
Percentage of correct entries in inferred truth tables for the S. cerevisiae mitotic cell-cycle network (true table vs IRIS-inferred TT).

| Regulated Gene | Regulator Genes | TT Size | Correct | Incorrect | Undefined |
| --- | --- | --- | --- | --- | --- |
| CDC28 | CLN3 | 2 | 2 | 0 | 0 |
| MBP1 | SWI6 | 2 | 2 | 0 | 0 |
| CLN1 | CLN3 | 2 | 1 | 0 | 1 |
| CDC6 | CLB5 | 2 | 2 | 0 | 0 |
| CLN2 | CLN1, SWI6 | 4 | 4 | 0 | 0 |
| CLB5 | SWI6, MBP1, CLB6 | 8 | 7 | 0 | 1 |
| SWI6 | CLN2, CLN3 | 4 | 2 | 2 | 0 |
| CLB6 | SIC1, MBP1, CLB5 | 8 | 6 | 1 | 1 |
| SWI4 | CLN1, CLN2, CLN3 | 8 | 6 | 1 | 1 |
| SIC1 | CLN1, CLN2 | 4 | 1 | 2 | 1 |
| Total | | 44 | 33 | 6 | 5 |
| Percentage | | | 75% | 13.64% | 11.36% |
Inference results. Column "Biological Findings" lists a short description of the features of interest and references. An empty "Biological Findings" cell means that the row belongs to the finding listed above it.

| Biological Findings | Observed Genes | Hidden Genes | Inference Results |
| --- | --- | --- | --- |
| Strong relationship between cyclins CLB5 and CLB6 [40] and between CLN1 and CLN2 [41] | CLB5 | CLB6 | P(CLB6 = 0 \| CLB5 = 0) = 0.9851; P(CLB6 = 1 \| CLB5 = 1) = 1.0000 |
| | CLB6 | CLB5 | P(CLB5 = 0 \| CLB6 = 0) = 1.0000; P(CLB5 = 1 \| CLB6 = 1) = 0.8079 |
| | CLN1 | CLN2 | P(CLN2 = 0 \| CLN1 = 0) = 0.95028; P(CLN2 = 1 \| CLN1 = 1) = 0.8449 |
| | CLN2 | CLN1 | P(CLN1 = 0 \| CLN2 = 0) = 0.6111; P(CLN1 = 1 \| CLN2 = 1) = 0.6111 |
| Inhibitory activity of SIC1 on cyclins CLB5 and CLB6 [42] | SIC1 | CLB5 | P(CLB5 = 0 \| SIC1 = 1) = 0.7367 |
| | SIC1 | CLB6 | P(CLB6 = 0 \| SIC1 = 1) = 0.7367 |
| Inactivation of MBP1 and SWI6 causes CLB5 and CLB6 levels to fall [35] | MBP1, SWI6 | CLB5 | P(CLB5 = 0 \| MBP1 = 0, SWI6 = 0) = 0.7742 |
| | MBP1, SWI6 | CLB6 | P(CLB6 = 0 \| MBP1 = 0, SWI6 = 0) = 0.8483 |
| While CLN1 and CLN2 are active, SIC1 is degraded [43] | CLN1, CLN2 | SIC1 | P(SIC1 = 0 \| CLN1 = 1, CLN2 = 1) = 0.6476 |
| Inactivation activity of MBP1 and SWI6 on CLN1 and CLN2 [44] | MBP1, SWI6 | CLN1 | P(CLN1 = 0 \| MBP1 = 1, SWI6 = 1) = 0.6111 |
| | MBP1, SWI6 | CLN2 | P(CLN2 = 0 \| MBP1 = 1, SWI6 = 1) = 0.6148 |
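Steady states for the yeast mitotic cell-cycle network obtained using IRIS.

| CLN3 | MBP1 | SWI6 | SWI4 | CLN1 | CLN2 | CLB5 | CLB6 | SIC1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 |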
Human B-Cells
Recent studies have demonstrated that the organisation of a gene regulatory network often follows a scale-free nature [37]. A scale-free network is characterised by an inverse relationship between the number of nodes and their connectivity. Another feature of gene networks is the presence of highly connected genes (called hubs). These networks typically contain short feedback loops. To test the suitability of IRIS for scale-free gene regulatory networks, we inferred the regulatory relations from human B-cell data, for which we considered the MYC gene as a major hub. MYC codes for a protein that binds to the DNA sequence of other genes. When MYC is mutated or overexpressed, the protein does not bind correctly and often causes cancer. Both the gene expression profiles and the network topology were extracted from the results of Basso et al. [3]. The network topology represents the MYC gene and 55 genes directly connected to it. To infer the regulatory rules of this network, we used a subset of 100 expression profiles (in [3] 336 samples are used). Here we use the MYC target gene database (MYCDB) [38].
IRIS results for the MYC subnetwork including 55 genes directly connected to MYC.

| Gene | IRIS Inferred Regulation | MYCDB Regulation |
| --- | --- | --- |
| HSPC111 | Upregulation | Upregulation |
| PPAT | Upregulation | Upregulation |
| POLD2 | Upregulation | Upregulation |
| NOL5A | Upregulation | Upregulation |
| ZRP1 | Downregulation | Upregulation |
| NME1 | Undefined | Upregulation |
| EBNA1BP2 | Upregulation | Upregulation |
| APEX1 | Upregulation | Upregulation |
| NDUFB5 | Undefined | Not Specified |
| PSPH | Upregulation | Upregulation |
| EEF1E1 | Upregulation | Not Specified |
| CTPS | Upregulation | Upregulation |
| C1QBP | Upregulation | Upregulation |
| SRM | Undefined | Upregulation |
| CCT3 | Upregulation | Upregulation |
| NOLC1 | Upregulation | Upregulation |
| JTV1 | Upregulation | Upregulation |
| TRAP1 | Upregulation | Not Specified |
| BOP1 | Undefined | Upregulation |
| IARS | Upregulation | Upregulation |
| EIF3S9 | Upregulation | Upregulation |
| PAICS | Upregulation | Not Specified |
| RRS1 | Upregulation | Upregulation |
| RCL | Undefined | Upregulation |
| POLR1C | Upregulation | Not Present |
| DKFZP564M182 | Downregulation | Not Present |
| HPRT1 | Upregulation | Not Present |
| C4orf9 | Upregulation | Not Present |
| MRPL9 | Upregulation | Not Present |
| LOC283537 | Downregulation | Not Present |
| STAT3 | Downregulation | Not Present |
| TUFM | Undefined | Not Present |
| SUPV3L1 | Upregulation | Not Present |
| MRPL3 | Upregulation | Not Present |
| LIMK2 | Upregulation | Not Present |
| ATP6V0D1 | Downregulation | Not Present |
| MX1 | Upregulation | Not Present |
| TOMM40 | Upregulation | Not Present |
| CYC1 | Upregulation | Not Present |
| NOLA2 | Upregulation | Not Present |
| MRPL12 | Upregulation | Not Present |
| TIP1 | Undefined | Not Present |
| BYSL | Upregulation | Not Present |
| PFAS | Upregulation | Not Present |
| ZT86 | Undefined | Not Present |
| TRA@ | Undefined | Not Present |
| PRMT3 | Upregulation | Not Present |
| MGC27165 | Undefined | Not Present |
| ATIC | Upregulation | Not Present |
| HSD17B8 | Upregulation | Not Present |
| SSRP1 | Undefined | Not Present |
| TEGT | Downregulation | Not Present |
| TCP1 | Upregulation | Not Present |
| Cdna_flj30991 | Undefined | Not Present |
| IDH3A | Upregulation | Not Present |
Conclusions
This paper described a method to infer regulatory relations in gene networks from expression data. The basic features of IRIS are a simple discretisation method to translate real-valued measurements into two discrete states (active and inactive) and a regulatory inference rule. To compare the proposed approach with other methods, we reported results for synthetic networks. The main conclusion is that the proposed method yields similar or slightly better results than other well-known approaches, but requires far fewer computational resources.
We also tested IRIS on two real data sets to infer interaction rules for the yeast mitotic cell cycle and the human MYC subnetwork. IRIS exhibited good accuracy for these networks compared to literature-derived rules. IRIS relies on knowledge of the network topology, which can be extracted from online databases (e.g. KEGG) or can be obtained from network reverse engineering algorithms. In other words, regulatory network parameter estimation and model selection are treated and performed as two different tasks. This approach could be useful in studying gene regulatory networks with hundreds of genes as a set of smaller subnetworks, as reported for MYC expression profiles.
IRIS is useful for extracting the main rules within a gene network with a well-defined topology. This information can then be used in subsequent analysis steps, such as probabilistic inference or as a preliminary step for building models of complex biological systems [14].
Availability and Requirements

Project name: Inference of Regulatory Interaction Schema (IRIS)

Project home page: http://bioinformatics.biogem.it:8081/BioPlone/downloadfolder/irisdownloadpage

Operating system(s): Platform independent

Programming language: MATLAB

Other requirements: Graphviz is required if the user wants to create a file containing the gene regulatory network topology

License: GNU GPL

Any restrictions to use by nonacademics: None
Declarations
Acknowledgements
We thank the anonymous referees for their comments. PZ was supported by a scholarship from IGRS Biogem and SM was supported by a scholarship from the University of Sannio.
References
 Friedman N, Linial M, Nachinan I, Pe'er D: Using Bayesian Networks to Analyze Expression Data. Journal of Computational Biology 2000, 7: 601–620. 10.1089/106652700750050961View ArticlePubMedGoogle Scholar
 Bansal M, Delia Gatta G, Di Bernardo D: Inference of Gene Regulatory Networks and Compound Mode of Action from Time Course Gene Expression Profiles. Bioinformatics 2006, 22(7):815–822. 10.1093/bioinformatics/btl003View ArticlePubMedGoogle Scholar
 Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla Favera R, Califano A: Reverse Engineering of Regulatory Networks in Human B Cells. Nature Genetics 2005, 37(4):382–390. 10.1038/ng1532View ArticlePubMedGoogle Scholar
 Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A: ARACNE: an Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context. BMC Bioinformatics 2006, 7(Suppl I):S7. 10.1186/147121057S1S7PubMed CentralView ArticlePubMedGoogle Scholar
 Yu PP, Wang J, Smith VA, Hartemink AJ, Jarvis ED: Advances to Bayesian Network Inference for Generating Causal Networks from Observational Biological Data. Bioinformatics 2004, 20(18):3594–3603. 10.1093/bioinformatics/bth448View ArticlePubMedGoogle Scholar
 Imoto S, Goto T, Miyano S: Estimation of Genetic Networks and Functional Structures Between Genes by Using Bayesian Networks and Nonparametric Regression. Pac Symp Biocomput 2002, 175–186.Google Scholar
 Kim S, Imoto S, Miyano S: Dynamic Bayesian Network and Nonparametric Regression for Nonlinear Modeling of Gene Networks from Time Series Gene Expression Data. BioSystems 2004, 75: 57–65. 10.1016/j.biosystems.2004.03.004View ArticlePubMedGoogle Scholar
 Pe'er D, Regev A, Elidan G, Friedman N: Inferring Subnetworks from Perturbed Expression Profiles. Bioinformatics 2001, 17(Suppl I):S215S224.View ArticlePubMedGoogle Scholar
 Blimes J: Dynamic Bayesian Multinets. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence. Inc MKP; 2000:38–45.Google Scholar
 Friedman N, Murphy K, Russel S: Learning the Structure of Dynamic Probabilistic Networks. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence 1998, 139–147.Google Scholar
 Ong IM, Glasner JD, Page D: Modelling Regulatory Pathways in E. Coli from Time Series Expression Profiles. Bioinformatics 2002, 18(Suppl I):S241S248.View ArticlePubMedGoogle Scholar
 Someren EV, Wessels L, Reinders M: Linear Modeling of Genetic Networks from Experimental Data. Proc Int Conf Intell Syst Mol Biol 2000, 8: 355–366.PubMedGoogle Scholar
 Dojer N, Gambin A, Mizera I, Wilczynski B, Tiuryn J: Applying Dynamic Bayesian Networks to Perturbed Gene Expression Data. BMC Bioinformatics 2006, 7: 249. 10.1186/147121057249PubMed CentralView ArticlePubMedGoogle Scholar
 Ulitsky I, GatViks I, Shamir R: MetaReg: a Platform for Modeling, Analysis and Visualization of Biological Systems Using LargeScale Experimental Data. Genome Biology 2008, 9: R1. 10.1186/gb200891r1PubMed CentralView ArticlePubMedGoogle Scholar
 Xiong H, Choe Y: Structural Systems Identification of Genetic Regulatory Networks. Bioinformatics 2008, 24(4):553–560. 10.1093/bioinformatics/btm623View ArticlePubMedGoogle Scholar
 Ślęzak D, Wróblewski J: Rough Discretization of Gene Expression Data. Hybrid Information Technology, International Conference on 2006, 2: 265–267. full_textGoogle Scholar
 GatViks I, Tanay A, Shamir R: Modeling and Analysis of Heterogeneous Regulation in Biological Networks. Journal of Computational Biology 2004, 11(6):1034–1049. 10.1089/cmb.2004.11.1034View ArticlePubMedGoogle Scholar
 GatViks I, Tanay A, Raijman D, Shamir R: A Probabilistic Methodology for Integrating Knowledge and Experiments on Biological Networks. Journal of Computational Biology 2006, 13(2):165–181. 10.1089/cmb.2006.13.165View ArticlePubMedGoogle Scholar
 Kschischang FR, Frey BJ, Loeliger HA: Factor Graphs and the Sum-Product Algorithm. IEEE Transactions on Information Theory 2001, 47(2):498–519. 10.1109/18.910572
 Chuang CL, Jen CH, Chen CM, Shieh GS: A Pattern Recognition Approach to Infer Time-lagged Genetic Interactions. Bioinformatics 2008, 24(9):1183–1190. 10.1093/bioinformatics/btn098
 Forney GD Jr: Codes on Graphs: Normal Realizations. IEEE Transactions on Information Theory 2001, 47(2):520–548. 10.1109/18.910573
 Bishop C: Pattern recognition and machine learning. Springer New York; 2006.
 Mooij J, Kappen H: Loop corrections for approximate inference on factor graphs. The Journal of Machine Learning Research 2007, 8: 1113–1143. [http://jmlr.csail.mit.edu/papers/v8/mooij07a.html]
 Van den Bulcke T, Van Leemput K, Naudts B, Van Remortel P, Ma H, Verschoren A, De Moor B, Marchal K: SynTReN: a Generator of Synthetic Gene Expression Data for Design and Analysis of Structure Learning Algorithms. BMC Bioinformatics 2006, 7: 43. 10.1186/1471-2105-7-43
 Pellicer MT, Fernandez C, Badía J, Aguilar J, Lin ECC, Baldomà L: Cross-induction of glc and ace Operons of Escherichia coli Attributable to Pathway Intersection. J Biol Chem 1999, 274(3):1745–1752. 10.1074/jbc.274.3.1745
 Sawers G, Kaiser M, Sirko A, Freundlich M: Transcriptional Activation by FNR and CRP: Reciprocity of Binding-site Recognition. Molecular Microbiology 1997, 23(4):835–845. 10.1046/j.1365-2958.1997.2811637.x
 Shuman HA, Silhavy TJ: The Art and Design of Genetic Screens: Escherichia coli. Nature Reviews Genetics 2003, 4(6):419–431. 10.1038/nrg1087
 Herring CD, Glasner JD, Blattner FR: Gene Replacement without Selection: Regulated Suppression of Amber Mutations in Escherichia coli. Gene 2003, 311: 153–163. 10.1016/S0378-1119(03)00585-7
 Wilcox L, Balderes DA, Wharton B, Tinkelenberg AH, Rao G, Sturley SL: Transcriptional Profiling Identifies Two Members of the ATP-binding Cassette Transporter Superfamily Required for Sterol Uptake in Yeast. The Journal of Biological Chemistry 2002, 277(36):32466–32472. 10.1074/jbc.M204707200
 Kullback S, Leibler R: On Information and Sufficiency. Annals of Mathematical Statistics 1951, 22: 79–86. 10.1214/aoms/1177729694
 Lauritzen SL: The EM Algorithm for Graphical Association Models with Missing Data. Computational Statistics & Data Analysis 1995, 19(2):191–201.
 Murphy KP: The Bayes Net Toolbox for Matlab. Computing Science and Statistics 2001, 33.
 Noman N, Iba H: Inferring Gene Regulatory Networks Using Differential Evolution with Local Search Heuristics. IEEE/ACM Trans Comput Biol Bioinform 2007, 4(4):634–647. 10.1109/TCBB.2007.1058
 KEGG: Kyoto Encyclopedia of Genes and Genomes [http://www.genome.jp/kegg/]
 Li F, Long T, Lu Y, Ouyang Q, Tang C: The Yeast Cell Cycle Network is Robustly Designed. Proc Natl Acad Sci U S A 2004, 101(14):4781–4786. 10.1073/pnas.0305937101
 Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization. Molecular Biology of the Cell 1998, 9(12):3273–3297.
 Albert R: Scale-free networks in cell biology. Journal of Cell Science 2005, 118(21):4947–4957. 10.1242/jcs.02714
 MYC Target Gene Database [http://www.myccancergene.org/site/mycTargetDB.asp]
 Shiio Y, Donohoe S, Yi EC, Goodlett DR, Aebersold R, Eisenman RN: Quantitative proteomic analysis of Myc oncoprotein function. The EMBO Journal 2002, 21(19):5088. 10.1093/emboj/cdf525
 Schwob E, Nasmyth K: CLB5 and CLB6, a New Pair of B Cyclins Involved in DNA Replication in Saccharomyces cerevisiae. Genes and Development 1993, 7: 1160–1175. 10.1101/gad.7.7a.1160
 Di Como CJ, Chang H, Arndt KT: Activation of CLN1 and CLN2 G1 cyclin gene expression by BCK2. Molecular and Cellular Biology 1995, 15(4):1835–1846.
 Nugroho TT, Mendenhall MD: An Inhibitor of Yeast Cyclin-dependent Protein Kinase Plays an Important Role in Ensuring the Genomic Integrity of Daughter Cells. Molecular and Cellular Biology 1994, 14(5):3320–3328.
 Verma R, Annan RS, Huddleston MJ, Carr SA, Reynard G, Deshaies RJ: Phosphorylation of Sic1p by G1 Cdk Required for Its Degradation and Entry into S Phase. Science 1997, 278(5337):455–460. 10.1126/science.278.5337.455
 Amon A, Tyers M, Futcher B, Nasmyth K: Mechanisms that Help the Yeast Cell Cycle Clock Tick: G2 Cyclins Transcriptionally Activate G2 Cyclins and Repress G1 Cyclins. Cell 1993, 74(6):993–1007. 10.1016/0092-8674(93)90722-3
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.