A Three Stage Integrative Pathway Search (TIPS©) framework to identify toxicity relevant genes and pathways
© Li et al; licensee BioMed Central Ltd. 2007
Received: 23 January 2007
Accepted: 14 June 2007
Published: 14 June 2007
The ability to obtain profiles of gene expressions, proteins and metabolites with the advent of high throughput technologies has advanced the study of pathway and network reconstruction. Genome-wide network reconstruction requires either interaction measurements or large amount of perturbation data, often not available for mammalian cell systems. To overcome these shortcomings, we developed a Three Stage Integrative Pathway Search (TIPS©) approach to reconstruct context-specific active pathways involved in conferring a specific phenotype, from limited amount of perturbation data. The approach was tested on human liver cells to identify pathways that confer cytotoxicity.
This paper presents a systems approach that integrates gene expression and cytotoxicity profiles to identify a network of pathways involved in free fatty acid (FFA) and tumor necrosis factor-α (TNF-α) induced cytotoxicity in human hepatoblastoma cells (HepG2/C3A). Cytotoxicity relevant genes were first identified and then used to reconstruct a network using Bayesian network (BN) analysis. BN inference was used subsequently to predict the effects of perturbing a gene on the other genes in the network and on the cytotoxicity. These predictions were subsequently confirmed through the published literature and further experiments.
The TIPS© approach is able to reconstruct active pathways that confer a particular phenotype by integrating gene expression and phenotypic profiles. A web-based version of TIPS© that performs the analysis described herein can be accessed at http://www.egr.msu.edu/tips.
The regulation of cellular functions is achieved through the contribution and interactions of genetic, signaling and metabolic pathways. Consequently, cellular processes may be singularly regulated, i.e., at the gene or transcription level, or controlled by a network of interactions of genes, proteins and metabolites. Therefore, understanding the biological function of the myriad of genes and how these genes and gene products interact and regulate each other to yield a functional cell would help to identify more appropriate pathways that should be targeted or studied for a given disease.
Novel analytical frameworks are required to unravel the regulatory and functional relationships from profiles of genes, proteins and metabolites. Approaches have been developed to infer genome wide networks from high throughput data. These approaches typically require genome-wide interaction data [1, 2] or a wide range of perturbation data [3–5], which have been applied to yeast and human B cells, respectively, to reverse engineer genome wide gene regulatory networks. Unfortunately, genome-wide interaction measurements as well as large amounts of perturbation data are not easily or readily obtainable for researchers working with mammalian systems. Thus, a challenge remains to reconstruct networks from a small amount of perturbation data. The small perturbation data size poses a challenge to statistical inference and makes it extremely hard to infer genome wide networks with any degree of confidence [6–8]. To address this, we developed an approach to identify a smaller network of active pathways rather than genome wide networks based upon a subset of important genes selected by integrating gene expression and phenotypic profiles. The approach first integrates genetic algorithm coupled partial least squares analysis (GA/PLS)  and constrained independent component analysis (CICA)  to identify a subset of genes relevant to a phenotype. Next, BN analysis is used to reconstruct an active sub-network from this smaller group of genes to reveal which pathways are induced by the external stimuli or environmental factors. Applying BN to a reduced subset of genes also circumvents the inefficiency of the BN analysis in inferring large size networks, such as is the case with genome wide networks. BN can detect indirect influences and unmeasured events and is not susceptible to the existence of unobserved variables . It has been applied to infer gene regulatory network of yeast cell cycle from gene expression data, metabolic sub-networks from metabolic data and protein signaling network from protein activity data [6, 11–13]. The reconstructed network was then used to predict the effect of perturbing a gene on the other genes in the network with BN inference, and thus provided insight into how the genes interacted within the network to produce a specific phenotype.
As proof-of-concept, the framework was applied to identify the pathways that confer cytotoxicity in HepG2/C3A cells. Liver toxicity is often used to assess the safety of drugs and is a primary reason for drug recalls . Therefore, predicting liver toxicity earlier in the drug development process would be valuable. In the current study, saturated FFA, palmitate, was found to induce liver toxicity, and this effect was exacerbated by the presence of TNF-α. Indeed, elevated levels of FFAs and TNF-α have been shown to be involved in the pathogenesis of liver disorders, such as fatty liver disease and steatohepatitis [15–18]. Therefore, we applied our approach to this model system from the standpoint that if we can identify pathways that may attenuate the toxicity in the presence of FFAs and TNF-α, perhaps this model could be applied eventually to drug candidates to identify pathways that may be modulated to enhance the efficacy and minimize the toxicity of the drug. We analyzed the genetic responses of the hepatocytes to physiologically elevated levels of FFAs and TNF-α to identify pathways involved in conferring cytotoxicity, and which in turn may provide insight into the physiological actions of these factors.
We developed a TIPS© framework that first applied GA/PLS  and independent component analysis (ICA) [19, 20] to identify a subset of genes relevant to cytotoxicity. We assumed, as a first approximation, a log linear relationship between gene expression and cytotoxicity. The genes selected by GA/PLS were initially corroborated with published results to identify known interactions. In order to extract an independent pathway related to a phenotype, such as cytotoxicity, from the gene expression profile, we propose a constrained ICA (CICA) approach. The relevance of the genes to the toxicity identified by GA/PLS along with the cytotoxicity profiles were used as constraints in CICA. CICA extracted a phenotype-relevant-component from the gene expression data. CICA assumes that the expression profile of thousands of genes can be represented by a reduced number of mutually independent processes. Biologically meaningful gene groups have been successfully identified by ICA [19, 20]. A phenotype-relevant-component was identified by minimizing the mutual information between the phenotype-relevant-component and the other independent components while maximizing the correlation between the component and the constraints. The expression profiles of the genes with the highest weights in CICA were used in BN analysis for network reconstruction. The reconstructed network was perturbed to identify i) which genes, when perturbed, had an impact on altering the cytotoxic phenotype in the palmitate cultures, and ii) how perturbing one gene (node) affected the other genes in the network. The reconstructed network provided potential explanation(s) on how palmitate and TNF-α induced cytotoxicity. The model identified (i) the roles played by stearoyl-CoA desaturase (SCD), double-stranded-RNA-dependent protein kinase (PKR) and Bcl-2 in the palmitate-induced cytotoxicity, and (ii) the activation of nuclear factor kappa B (NF-κB) by TNF-α is mediated by protein kinase C delta (PKC-δ). These simulated perturbations of the reconstructed network were evaluated experimentally to assess the accuracy of the predictions.
1. Identifying genes relevant to cytotoxicity using GA/PLS
Functional groups of genes related to LDH release, selected by GA/PLS
(gC) cytochrome P450, polypeptide 1 (CYP1A1)
(g) glutamate-cysteine ligase, catalytic subunit
(gC) NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 1
(gC) glutathione S-transferase A2 (GSTA2)
(gC) P450 (cytochrome) oxidoreductase (POR)
(gC) glutathione S-transferase theta 1 (GSTT1)
(gC) NADH dehydrogenase (ubiquinone) 1
(gC) ATP synthase, H+ transporting, subunit c
(gC) glutaredoxin (thioltransferase) (GLRX)
(gC) glutathione reductase (GSR)
(gC) glutathione S-transferase M3
(gC) glutathione peroxidase 3 (plasma) (GPX3)
(gC) metallothionein 1G (MT1G)
(gC) NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 10
(gC) cytochrome c (HCS)
(gC) cytochrome P450, subfamily XXIV (CYP24)
(gC) NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 7
(gK) cytochrome P450, polypeptide 7 (CYP2A7)
(gM) holocytochrome c synthase (HCCS)
(gM) NAD(P)H dehydrogenase, quinone 2 (NQO2), mRNA
(gC) microsomal glutathione S-transferase 2 (MGST2)
(gC) cytochrome P450, subfamily IVF, polypeptide 12 (CYP4F12)
(gC) isocitrate dehydrogenase 2 (NADP+), mitochondrial (IDH2)
(gC) citrate synthase (CS)
(gC) succinate-CoA ligase, GDP-forming, beta subunit
(g) nuclear factor I/X (CCAAT-binding transcription factor) (NFIX)
(gC) zinc finger protein 36, C3H type-like 1 (ZFP36L1)
(gC) inhibitor of kappa light polypeptide gene enhancer in B-cells
(gC) TRAF family member-associated NFκβ activator (TANK)
(gC) lymphotoxin alpha (TNF- superfamily, member 1) (LTA)
(gC) Fas (TNF-RSF6) associated factor 1 (FAF1)
(gC) tumor necrosis factor (ligand) superfamily, member 13 (TNF-SF13)
(gC) LPS-induced TNF-alpha factor (PIG7)
(gC) Fas (TNF-RSF6)-associated via death domain (FADD)
(g) tumor necrosis factor, alpha-induced protein 3 (TNF-AIP3)
(gK) tumor necrosis factor receptor superfamily, member 5 (TNF-RSF5)
(gC) TNF-AIP3 interacting protein 2 (TNIP2)
(gC) TNF-AIP3 interacting protein 1 (TNIP1)
(gC) TNF- receptor-associated factor 1 (TRAF1)
(gC) tumor necrosis factor, alpha-induced protein 1 (TNF-AIP1)
(gC) tumor necrosis factor (ligand) superfamily, member 9 (TNF-SF9)
(gC) S100 calcium binding protein P (S100P)
(gC) protein phosphatase 4, regulatory subunit 1 (PPP4R1)
(gC) protein tyrosine kinase 9 (PTK9)
(gC) protein phosphatase 1, regulatory (inhibitor) subunit 15B
(gM) solute carrier family 38, member 2 (SLC38A2)
(gM) protein kinase C substrate 80K-H (PRKCSH)
(g) tumor-associated calcium signal transducer 2 (TACSTD2)
(gC) serine/threonine kinase 17a (apoptosis-inducing) (STK17A)
(gC) MAP kinase-interacting serine/threonine kinase 2 (MKNK2)
(gC) mitogen-activated protein kinase 6
(gC) phospholipase A2, group IVB (cytosolic)
(gC) insulin-like growth factor binding protein 1 (IGFBP1)
(gM) protein tyrosine phosphatase, receptor type, N (PTPRN)
(gN) G protein-coupled receptor 87 (GPR87)
(gC) serine/threonine kinase 17a (apoptosis-inducing) (STK17A)
(gC) mitogen-activated protein kinase 3
(gM) G protein-coupled receptor 48 (GPR48)
(gM) protein phosphatase 1, regulatory (inhibitor) subunit 12A (PPP1R12A)
(gC) mitogen-activated protein kinase kinase kinase 7
(gM) protein phosphatase 1, regulatory (inhibitor) subunit 11 (PPP1R11)
(gC) cyclin-dependent kinase (CDC2-like) 10 (CDK10)
(gC) potassium inwardly-rectifying channel, subfamily J, member 2 (KCNJ2)
(gC) mitogen-activated protein kinase kinase 1 (MAP2K1)
(gM) phospholipase C, delta 4 (PLCD4)
(gC) protein kinase C, delta
(gC) protein phosphatase 1, catalytic subunit, alpha isoform (PPP1CA)
(gC) protein kinase C binding protein 1
(gC) protein phosphatase 2 (formerly 2A), catalytic subunit
(gC) serine/threonine kinase 6 (STK6)
(gC) protein tyrosine phosphatase, receptor type, K (PTPRK)
(gC) G protein-coupled receptor kinase 5 (GPRK5)
(gC) protein tyrosine phosphatase, receptor type, N polypeptide 2 (PTPRN2)
Fatty acid metabolism
(gC) fatty acid amide hydrolase (FAAH)
(gC) fatty-acid-Coenzyme A ligase, long-chain 3 (FACL3)
(gC) long-chain fatty-acyl elongase (LCE)
(gC) fatty acid amide hydrolase
(gC) BCL2/adenovirus E1B 19kDa interacting protein 3 (BNIP3)
(gN) caspase 6, apoptosis-related cysteine protease (CASP6)
(gC) acid sphingomyelinase-like phosphodiesterase (ASM3A)
(gC) caspase 8, apoptosis-related cysteine protease (CASP8)
(gC) caspase 10, apoptosis-related cysteine protease (CASP10)
(gF) requiem, apoptosis response zinc finger gene (REQ)
(gC) UDP-glucose ceramide glucosyltransferase
(gC) sphingosine kinase 2 (SPHK2)
(gC) programmed cell death 5 (PDCD5)
(gC) sphingosine-1-phosphate lyase 1
(gC) BCL2-like 13 (apoptosis facilitator) (BCL2L13)
(gM) eukaryotic translation termination factor 1 (ETF1)
(gC) eukaryotic translation initiation factor 2, subunit 2 beta, 38kDa (EIF2S2)
(gC) translational inhibitor protein p14.5 (UK114)
(gC) eukaryotic translation initiation factor 3, subunit 10 theta
(gC) eukaryotic translation initiation factor 4 gamma, 3 (EIF4G3)
(gC) eukaryotic translation initiation factor 2B, subunit 2 beta, 39kDa (EIF2B2)
(gC) translocase of outer mitochondrial membrane 34 (TOMM34)
(gC) translocase of inner mitochondrial membrane 22 homolog (yeast) (TIMM22)
(gM) translocase of outer mitochondrial membrane 20
2. Identifying genes involved in an independent pathway related to cytotoxicity using CICA
GA/PLS determined the frequency with which each gene was selected to predict LDH release. The frequencies and the profile of LDH release were applied as structure and profile constraints respectively in CICA to extract a phenotype-relevant-component from the gene expression profile, see methods for more details. The independent component in this case identified a subset of genes whose profiles corresponded to the profile of LDH release. CICA determined the weights for each gene by minimizing the mutual information between the independent components and maximizing the correlation to the constraints. The weights determined by CICA are in the additional files [see Additional file 1]. Genes that had weights significantly different from zero with a 95% confidence using the Z-test were subjected to BN analysis for pathway reconstruction.
3. Reconstruct pathways related to cytotoxicity using BN
3.1 Perturbation of the reconstructed network
The reconstructed network allowed us to perturb the network in silico to identify which nodes had an impact on modulating LDH release. BN inference was used to predict the effect of perturbing a single gene/node on i) the other gene within the network, and ii) the level of LDH release in the palmitate cultures. The predictions of the simulated perturbations were subsequently confirmed with inhibitor (or activator) experiments. We altered the activity of the nodes by inhibiting (activating) the protein activity. Perturbations of the SCD (see section 3.2) and PKR (section 3.3) were simulated with BN inference to evaluate their effects on cytotoxicity, whereas PKC-δ (section 3.4) was perturbed to predict its effect on NF-κB activation.
3.2 Role of stearoyl-CoA desaturase (SCD) in palmitate-induced cytotoxicity
Simulating genetic perturbation and its effects on LDH release
Probability of LDH Release
3.3 Role of Bcl-2 and PKR in palmitate-induced cytotoxicity
In the reconstructed network, Bcl-2 was connected to the translocase of outer membrane (TOM20) and carnitine palmitoyl transferase (CPT-1). These connections are supported by published literature results. Targeting the Bcl-2 protein to the mitochondria is mediated by the interaction between the C terminus of Bcl-2 and TOM20 . Similarly, a direct interaction between Bcl-2 and CPT-1 has been confirmed through a co-immunoprecipitation study , thus a direct connection between Bcl-2 and CPT-1 found by the model is encouraging.
Simulating down-regulation of PKR and its effects on Bcl-2
Probability of Bcl-2 level
PKR was also found to be connected to apoptosis inhibitor (API5), which is connected to PP2AB56 and apoptotic chromatin condensation inducer (ACINUS), and the latter is connected to eIF2B suggesting that these factors are likely to be involved in the apoptotic signaling pathway. These connections are supported by published results in the literature. Caspase-3 activity was enhanced in the palmitate cultured cells (Figure 3), which may result in enhanced phosphorylation of eIF2-α. PKR can be cleaved by caspase-3, 7, 8 to liberate the eIF2-α kinase domain, which phosphorylates eIF2-α . Phosphorylation of eIF2-α by PKR will inhibit protein synthesis and lead to apoptosis . PKR also can bind to PP2A at the B56 alpha regulatory subunit (PP2AB56) and increase the phosphatase activity of PP2A . PP2A is a major Ser/Thr phosphatase involved in many signal transduction pathways. PP2A can dephosphorylate and inactivate the anti-apoptotic Bcl-2 at Ser-70 .
3.4 Activation of NF-κB by TNF-α is mediated by PKC-δ
Simulating down-regulation of PKC-δ and its effects on NF-κβ.
Probability of NF-κβ activation
With the availability of high dimensional biological data to characterize a cellular state, one of the challenges is the development of robust methods that can integrate various orthogonal datasets to identify the genes and pathways that induce a phenotype. The significance of the TIPS© framework is its ability to extract relevant information, both known and unknown, from high dimensional data. The phenotypic profile was used to guide the information extraction process. Proteins relay information from the genes to execute biological functions, which define the cellular phenotype. Thus, the effects of regulation occurring at the protein level manifest themselves in the phenotype. This paper illustrates that uncovering this information at the protein (i.e., intermediate) level may be achieved by integrating phenotype and gene expression data.
A handful of the connections were selected to illustrate the effectiveness of the framework. The selection was not intended to be comprehensive or exclusive of other potentially valid connections. The connections selected for further analysis and discussion were based upon i) evidence in the literature that a potential relationship may exist, although it may not be known what the exact nature of the relationship is or its relevance to toxicity, and ii) whether materials, e.g. assays or antibodies, are available to allow us to evaluate the connections.
Currently, only one phenotypic profile, e.g., LDH release, was used to identify the active network perturbed by TNF-α and FFA exposure. Metabolic profiles, which characterize the cellular phenotype, may also be used as constraints. Incorporating more metabolic profiles would improve the characterization of the phenotype and in turn the network reconstruction and predictions. An approach to add more constraints would be to apply ICA or PLS to extract several latent variables from the metabolic profiles . In addition, the current TIPS© approach selects genes based upon the data without considering a priori knowledge of the system under investigation. Using a purely data driven approach may result in important genes with modest changes escaping detection by statistical analysis such as ANOVA. Incorporating domain knowledge into the TIPS© approach could improve the selection of relevant genes. The prior knowledge could be incorporated with approaches such as gene set enrichment analysis (GSEA) . GSEA incorporates functional pathway information in the selection of significantly enriched functional gene groups.
Additionally, the current TIPS© approach reconstructs pathways from gene expression and metabolic profiles at a single time point. Dynamic regulatory interactions may be inferred if data from multiple time points are available. Indeed, the underlying biological regulatory mechanism is likely to be dynamic in response to changes in the environmental conditions. A power law model has been applied in continuous dynamic Bayesian network (DBN) analysis to model the connection between genes [45, 46]. The power law model can easily be extended to allow for delayed transcription [45, 46]. Both discrete  and continuous [45, 46] DBN has been applied to model gene regulatory networks. Therefore, to detect the dynamic property of biological networks, we plan to obtain time series data and incorporate a dynamic Bayesian network reconstruction component into the TIPS© framework.
Due to computational limitation as well as limited data, it is not possible to reconstruct a network with high confidence using all the genes across the genome. GA/PLS and ICA provided an approach to identify a relevant subset of genes for further analysis. However, useful information may be missed in the selection process or not identified due to low abundance transcripts. To address the former, an approach would be to use a more targeted array with a smaller subset of genes. To address the latter, methods such as kinetically monitored reverse transcriptase-initiated PCR (kRT-PCR) could be used to measure genes with low abundance transcripts .
In conclusion, we have demonstrated that TIPS© may be applied to reconstruct the active associations from gene expression and phenotypic profiles to help elucidate the pathways involved in regulating palmitate-induced cytotoxicity. The pathways identified and shown in Figure 4 are specific to the cytotoxicity induced by FFA and TNF-α. If other compounds were applied to a cell culture system (HepG2 or another cell type), new microarray and phenotype data would have to be collected and the TIPS© analysis applied to identify a different set of relevant genes specific to those compounds. This is important because many connections are context-specific (i.e. cell type and treatment). For instance, the regulation of Bcl-2 by TNFα is cell dependent. TNF-α suppresses Bcl-2 in FaO rat hepatoma cells  while it induces Bcl-2 in rat hippocampal neuron cells . Similarly, signaling pathways are stimuli specific, for example, TNFα activates IKK activity with a negative feedback through A20 while LPS activates IKK activity with a positive feedback through a different pathway involving NF-κB and IRF3 . Therefore, whether a pathway is activated will depend on the type of cell and the condition under consideration. The advantage of TIPS© is that it allows for these differences and aims to reconstruct these pathways from the context-specific data.
One million Hep G2/C3A cells (ATCC, Manassa, VA) were seeded into each well of 6-well culture plate. The cells were kept in 2 ml of medium containing Dulbecco's modified Eagles medium (Invitrogen, Carlsbad, CA) supplemented with 10% fetal bovine serum (ATCC) and 2% Penicillin-streptomycin (Invitrogen). Cells were incubated at 37°C and in 10% CO2 atmosphere. After cells reached confluence, the medium was replaced with 2ml of treatment of FFA, either palmitate (700 μM), oleate (700 μM), or linoleate (700 μM), and in the presence and absence of TNF-α (0, 20, 100 ng/ml). Fatty-acid-free bovine serum albumin (BSA) was used as a carrier for the FFAs.
For the LDH measurements, cells were cultured in different media for up to 48 hours and the supernatant collected. Cells were washed with phosphate buffered saline (PBS) and kept in 1% triton-X-100 in PBS for 24 h at 37°C. Cell lysate was then collected, vortexed for 15 seconds and centrifuged at 7000 rpm for 5 minutes. Cytotoxicity detection kit (Roche Applied Science, Indianapolis, IN) was used to measure the LDH release. LDH released was normalized to the total LDH (released + lyzed).
Gene expression profiling
Cells were cultured in 10 cm tissue culture plates until confluence and then exposed to different treatments. RNA was isolated with Trizol reagent. The gene expression profiles were obtained with cDNA microarray. Analyses were done at the Van Andel Institute, Grand Rapids, MI. The protocols are available online at . There were two biological replicates for each condition and each replicate was labeled with the Cy3 and Cy5 dyes. The microarray data has been deposited at the GEO website , with a query number of GSE5441.
Inhibitors and activators
1 μM Rottlerin (inhibits PKC-δ, BIOMOL Research Laboratories, Plymouth Meeting, PA), 10 μM HA14-1 (inhibits Bcl-2, BIOMOL Research Laboratories, Plymouth Meeting, PA), 6 μM PKR inhibitor (Calbiochem, EMD Bioscience, CA) were used in the inhibitor experiments. 50 μM of clofibrate and ciprofibrate (Sigma) were used as activators of SCD.
Cells were lysed with 1× SDS sample buffer. Proteins in cell lysates were separated on SDS-polyacrylamide gel electrophoresis (SDS-PAGE) and blotted onto nitrocellulose membrane. Non-specific binding sites were blocked with 5% non-fat milk. The membrane were then treated with anti-SCD (1:1000, Santa-Cruz Biotechnology Inc., CA), anti-PKC-δ (1:1000, Santa-Cruz Biotechnology Inc., CA), anti-Bcl-2 (1:2000, Cell signaling Inc., MA), and anti-NF-κB-P65 (1:1000, Cell signaling Inc., MA) to detect protein level of SCD, PKC-δ, Bcl-2, and NF-κB, respectively. Membranes were also immnoblotted with anti-β-actin (1:1000, Cell signaling Inc., MA) to monitor the loading level in each lane. All antibodies are against the human isoform.
The analysis of variance (ANOVA) was applied to compare the effect of treatment (e.g. FFA, TNF-α) and to determine whether a treatment has a significant effect. We applied a two-way ANOVA test to identify the genes that are affected by FFA, TNF-α or the interaction between FFA and TNF-α. The analysis was performed in MATLAB 6.3 using Stats Toolbox. A two step ANOVA analysis was performed to identify the genes that changed significantly due to FFA or TNF-α exposure. We identified a list of genes from the literature shown in the additional files [see Additional file 2], that are relevant to palmitate-induced cytotoxicity and applied ANOVA with P ≤ 0.05 to this list of genes (which we denote as "supervised" ANOVA). In addition, ANOVA analysis was applied to the entire list of genes with P ≤ 0.01 (which we denote as "unsupervised" ANOVA). The two lists of genes were then combined into one list, eliminating any overlaps between the lists. Using the supervised and unsupervised ANOVA tests, the expression level of 830 genes were found to be significant due to either TNF-α or FFA. This list of genes is in the additional files [see Additional file 3].
Genetic algorithm/partial least square analysis (GA/PLS)
where Met(treatment) and Met(control) are the metabolic function for the treated and control cultures, respectively; Gene(treatment) i and Gene(control) i are the expression level of gene i for the treated and control cultures, respectively.
Constrained Independent Component Analysis (CICA)
ICA is a statistical method that has been applied to reveal "hidden factors" underlying sets of signal measurements . The expression levels of the genes are the recorded signals which are affected by underlying regulatory pathways. We denote the signal measurements Y to be the gene expression data, S to be the independent components (pathways), and A to be the mixing matrix. The ICA model can be expressed as
Y = A S
The gene expression Y is supplied to the ICA model and the mixing matrix A can be uniquely estimated by assuming that the components in S are statistically independent to each other. Y(i,t) represents the expression of gene i in experiment t, A(i,j) represents weight of gene i in independent pathway j, S(j,t) represents the profile of independent pathway j in experiment t. Since the objective here is to identify a LDH release related independent pathway, A was further constrained with the frequency learned by GA/PLS and S was further constrained with the LDH release profile. Let F(i) be the frequency of gene i with respect to LDH release and a (i) be the weight of gene i in the pathway related to LDH release. Thus, a is constrained to have a correlation with F by equation (4), where ρ1 is a threshold value.
Corr1 = aT *diag (F)*a/(aT *a) > ρ1
Let s(t) be the profile of LDH related pathway in experiment t and L(t) be the profile of LDH release in experiment t. Similarly, s is constrained to have a correlation with L by equation (5), where ρ2 is a threshold value.
Corr2 = sT*L*LT*s/(s*sT) >ρ2
Implicit in the use of ICA is i) ICA separates only linear effects, ii) all the genes are assumed to be linearly independent, and iii) can handle only one Gaussian source, which is typically assumed to be the noise in the data, and the independent signal sources are non-Gaussian. The assumption that information flow from genes to proteins is linear is a reasonable first-order approximation since most genes translate to proteins which are not involved in regulation or feedback loops. We addressed the second assumption through pre-whitening the gene expression data with singular value decomposition to remove the principal components with small eigenvalues, i.e. linear dependency. For the third assumption, we evaluated the normality of the independent component and found the extracted independent component is non-Gausssian [see Additional file 4], therefore justifying the application of ICA to the gene data.
Bayesian network analysis
Bayesian networks are directed acyclic graphs (DAG) whose nodes correspond to variables and whose arcs represent the dependencies between variables. The dependencies are determined by the conditional probabilities of each node x i , given its parent node p a , Pr(x i | p a (x i )). A Bayesian network i) assumes conditional independence, such that each node is independent to its non-descendants, given it parents. For example, B and C are conditionally independent to each other given A in a network, then
Pr(B | C,A) = Pr(B|A)
In the example above:
Pr (A, B, C) = Pr (A) Pr (B | A) Pr (C | A)
In Phase 3 each arc is examined using the CI test and arcs are removed if two genes linked by an arc are conditionally independent. The LDH node was assigned as a leaf node in the learning process, which enabled the determination of direction in the network. BN PowerConstructor downloaded from Cheng Jie's group at  was used in the constraint based network reconstruction. Both the score and search based and the constraint based approaches have their advantages and disadvantages in different respects. For example, Heckerman et al.  showed that the scoring-based method is advantageous over the constraint-based methods when modeling a probability distribution of observations. However, Friedman et al.  showed theoretically that the general scoring-based method may result in poor prediction accuracy, which was also confirmed by Greiner et al. . Score and search method often are computationally more costly . Constraint based method, on the other hand, is able to discriminate between models with and without hidden variables and can indicate the presence of a hidden common cause between two variables . Therefore, we applied both approaches and compared the network learned using our data. We found that fewer connections were identified using the LibB approach and the LDH node was not connected to any genes (the full network is shown in the additional files [see Additional file 5]). Based upon this comparison, we decided to use the constraint based approach to generate the network for subsequent analyses.
Bayesian network inference
Bayesian network inference was used to predict the probabilities of a phenotype e.g. LDH or a gene will take a certain value with the other genes in the network at controlled levels. The posterior probability that the class node will take on a certain value given the values of the other nodes is determined based upon conditional probability. Suppose node A is the target node and b1 and c1 are the known values of evidence nodes B and C, respectively, we can predict the posterior probability Pr(A| xi, xj) according to the Bayesian rule:
Pr(A=a1|B=b1, C=c1)=Pr(B=b1,C=c1|A=a1)*Pr(A=a1)/Pr(B=b1,C=c1) (9)
However, applying this exact inference to a large network is computationally costly . Therefore, we applied an approximate inference algorithm, logic sampling , to infer the posterior probabilities. Briefly, logic sampling generates a case by randomly assigning values to each node weighted by the probability of that value occurring. To estimate the posterior probability Pr(X|E) where X is target node and E is the evidence node, we compute the ratio of the number of cases where both E and X are true to the number of cases where just E is true, i.e. Pr(X=x|E=e) = Pr(X=x, E=e)/Pr(E=e). The inference process was conducted with Genie .
Availability and requirements
Project name: Three Stage Integrative Pathway Search (TIPS © )
Project home page: www.egr.msu.edu/tips
Operating system(s): Platform independent
Programming language: Matlab
License: GNU GPL for academics, license needed for non-academic users
AMP-activated protein kinase
free fatty acid
genetic algorithm coupled partial least square analysis
independent component analysis
nuclear factor kappa B
protein kinase C delta
double-stranded-RNA-dependent protein kinase
tumor necrosis factor alpha
The work was supported by the National Science Foundation (BES 0331297 and BES 0425821, SBIR 0610784), the National Institute of Health (1R01GM079688-01), the MSU Foundation and the Center for Systems Biology, the Environmental Protection Agency (RD83184701) and the Whitaker Foundation.
- Ideker T, Ozier O, Schwikowski B, Siegel AF: Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics. 2002, 18 Suppl 1: S233-40.View ArticlePubMedGoogle Scholar
- Yeang CH, Mak HC, McCuine S, Workman C, Jaakkola T, Ideker T: Validation and refinement of gene-regulatory pathways on a network of physical interactions. Genome Biol. 2005, 6 (7): R62-10.1186/gb-2005-6-7-r62.PubMed CentralView ArticlePubMedGoogle Scholar
- di Bernardo D, Thompson MJ, Gardner TS, Chobot SE, Eastwood EL, Wojtovich AP, Elliott SJ, Schaus SE, Collins JJ: Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks. Nat Biotechnol. 2005, 23 (3): 377-383. 10.1038/nbt1075.View ArticlePubMedGoogle Scholar
- Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A: Reverse engineering of regulatory networks in human B cells. Nat Genet. 2005, 37 (4): 382-390. 10.1038/ng1532.View ArticlePubMedGoogle Scholar
- Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006, 7 Suppl 1: S7-10.1186/1471-2105-7-S1-S7.View ArticlePubMedGoogle Scholar
- Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. J Comput Biol. 2000, 7 (3-4): 601-620. 10.1089/106652700750050961.View ArticlePubMedGoogle Scholar
- Schafer J, Strimmer K: An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics. 2005, 21 (6): 754-764. 10.1093/bioinformatics/bti062.View ArticlePubMedGoogle Scholar
- Husmeier D: Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics. 2003, 19 (17): 2271-2282. 10.1093/bioinformatics/btg313.View ArticlePubMedGoogle Scholar
- Li Z, Chan C: Integrating gene expression and metabolic profiles. J Biol Chem. 2004, 279 (26): 27124-27137. 10.1074/jbc.M403494200.View ArticlePubMedGoogle Scholar
- Lin SM: Using functional genomic units to corroborate user experiments with the Rosetta compendium. Methods of Microarray Data Analysis II, Papers from CAMDA '01, Durham, NC, United States, Oct 15-16, 2001. 2002, 123-137.Google Scholar
- Li Z, Chan C: Inferring pathways and networks with a Bayesian framework. Faseb J. 2004, 18 (6): 746-748.PubMedGoogle Scholar
- Friedman N: Inferring cellular networks using probabilistic graphical models. Science. 2004, 303 (5659): 799-805. 10.1126/science.1094068.View ArticlePubMedGoogle Scholar
- Sachs K, Perez O, Pe'er D, Lauffenburger DA, Nolan GP: Causal protein-signaling networks derived from multiparameter single-cell data. Science. 2005, 308 (5721): 523-529. 10.1126/science.1105809.View ArticlePubMedGoogle Scholar
- Navarro VJ, R. J: Senior.Drug-related hepatoxicity. NEJM. 2006, 354: 731-739. 10.1056/NEJMra052270.View ArticlePubMedGoogle Scholar
- Felber JP, Golay A: Pathways from obesity to diabetes. Int J Obes Relat Metab Disord. 2002, 26 Suppl 2: S39-45. 10.1038/sj.ijo.0802126.View ArticlePubMedGoogle Scholar
- Kobayashi M: Molecular mechanism of insulin resistance. Saishin Igaku. 1998, 53 (6): 1210-1216.Google Scholar
- Tilg H: Cytokines and liver diseases. Can J Gastroenterol. 2001, 15 (10): 661-668.PubMedGoogle Scholar
- Watada H, Kawamori R: Insulin resistance and NASH. BIO Clinica. 2003, 18 (10): 874-879.Google Scholar
- Liebermeister W: Linear modes of gene expression determined by independent component analysis. Bioinformatics. 2002, 18 (1): 51-60. 10.1093/bioinformatics/18.1.51.View ArticlePubMedGoogle Scholar
- Martoglio AM, Miskin JW, Smith SK, MacKay DJ: A decomposition model to track gene expression signatures: preview on observer-independent classification of ovarian cancer. Bioinformatics. 2002, 18 (12): 1617-1624. 10.1093/bioinformatics/18.12.1617.View ArticlePubMedGoogle Scholar
- Srivastava S, Chan C: Hydrogen peroxide and hydroxyl radicals mediate palmitate-induced cytotoxicity to hepatoma cells: relation to mitochondrial permeability transition. Free Radic Res. 2007, 41 (1): 38-49. 10.1080/10715760600943900.View ArticlePubMedGoogle Scholar
- Hoppins SC, Nargang FE: The Tim8-Tim13 complex of Neurospora crassa functions in the assembly of proteins into both mitochondrial membranes. J Biol Chem. 2004, 279 (13): 12396-12405. 10.1074/jbc.M313037200.View ArticlePubMedGoogle Scholar
- Dobrzyn P, Dobrzyn A, Miyazaki M, Cohen P, Asilmaz E, Hardie DG, Friedman JM, Ntambi JM: Stearoyl-CoA desaturase 1 deficiency increases fatty acid oxidation by activating AMP-activated protein kinase in liver. Proc Natl Acad Sci U S A. 2004, 101 (17): 6409-6414. 10.1073/pnas.0401627101.PubMed CentralView ArticlePubMedGoogle Scholar
- Kerner J, Hoppel C: Fatty acid import into mitochondria. Biochim Biophys Acta. 2000, 1486 (1): 1-17.View ArticlePubMedGoogle Scholar
- Sampath H, Miyazaki M, Dobrzyn A, Ntambi JM: Stearoyl-CoA desaturase-1 mediates the pro-lipogenic effects of dietary saturated fat. J Biol Chem. 2007, 282 (4): 2483-2493. 10.1074/jbc.M610158200.View ArticlePubMedGoogle Scholar
- Listenberger LL, Ory DS, Schaffer JE: Palmitate-induced apoptosis can occur through a ceramide-independent pathway. J Biol Chem. 2001, 276 (18): 14890-14895. 10.1074/jbc.M010286200.View ArticlePubMedGoogle Scholar
- Ntambi JM: The regulation of stearoyl-CoA desaturase (SCD). Prog Lipid Res. 1995, 34 (2): 139-150. 10.1016/0163-7827(94)00010-J.View ArticlePubMedGoogle Scholar
- Rodriguez C, Cabrero A, Roglans N, Adzet T, Sanchez RM, Vazquez M, Ciudad CJ, Laguna JC: Differential induction of stearoyl-CoA desaturase and acyl-CoA oxidase genes by fibrates in HepG2 cells. Biochem Pharmacol. 2001, 61 (3): 357-364. 10.1016/S0006-2952(00)00557-8.View ArticlePubMedGoogle Scholar
- Miller CW, Ntambi JM: Peroxisome proliferators induce mouse liver stearoyl-CoA desaturase 1 gene expression. Proc Natl Acad Sci U S A. 1996, 93 (18): 9443-9448. 10.1073/pnas.93.18.9443.PubMed CentralView ArticlePubMedGoogle Scholar
- de Pablo MA, Susin SA, Jacotot E, Larochette N, Costantini P, Ravagnan L, Zamzami N, Kroemer G: Palmitate induces apoptosis via a direct effect on mitochondria. Apoptosis. 1999, 4 (2): 81-87. 10.1023/A:1009694124241.View ArticlePubMedGoogle Scholar
- Motz C, Martin H, Krimmer T, Rassow J: Bcl-2 and porin follow different pathways of TOM-dependent insertion into the mitochondrial outer membrane. J Mol Biol. 2002, 323 (4): 729-738. 10.1016/S0022-2836(02)00995-6.View ArticlePubMedGoogle Scholar
- Haddad JJ: On the antioxidant mechanisms of Bcl-2: a retrospective of NF-kappaB signaling and oxidative stress. Biochem Biophys Res Commun. 2004, 322 (2): 355-363. 10.1016/j.bbrc.2004.07.138.View ArticlePubMedGoogle Scholar
- Kim SH, Forman AP, Mathews MB, Gunnery S: Human breast cancer cells contain elevated levels and activity of the protein kinase, PKR. Oncogene. 2000, 19 (27): 3086-3094. 10.1038/sj.onc.1203632.View ArticlePubMedGoogle Scholar
- Hiasa Y, Kamegaya Y, Nuriya H, Onji M, Kohara M, Schmidt EV, Chung RT: Protein kinase R is increased and is functional in hepatitis C virus-related hepatocellular carcinoma. Am J Gastroenterol. 2003, 98 (11): 2528-2534.PubMedGoogle Scholar
- Saelens X, Kalai M, Vandenabeele P: Translation inhibition in apoptosis: caspase-dependent PKR activation and eIF2-alpha phosphorylation. J Biol Chem. 2001, 276 (45): 41620-41628. 10.1074/jbc.M103674200.View ArticlePubMedGoogle Scholar
- Xu Z, Williams BR: The B56alpha regulatory subunit of protein phosphatase 2A is a target for regulation by double-stranded RNA-dependent protein kinase PKR. Mol Cell Biol. 2000, 20 (14): 5285-5299. 10.1128/MCB.20.14.5285-5299.2000.PubMed CentralView ArticlePubMedGoogle Scholar
- Deng X, Ito T, Carr B, Mumby M, May WS: Reversible phosphorylation of Bcl2 following interleukin 3 or bryostatin 1 is mediated by direct interaction with protein phosphatase 2A. J Biol Chem. 1998, 273 (51): 34157-34163. 10.1074/jbc.273.51.34157.View ArticlePubMedGoogle Scholar
- Haddad JJ: Oxygen-sensing mechanisms and the regulation of redox-responsive transcription factors in development and pathophysiology. Respir Res. 2002, 3: 26-10.1186/rr190.PubMed CentralView ArticlePubMedGoogle Scholar
- Kilpatrick LE, Lee JY, Haines KM, Campbell DE, Sullivan KE, Korchak HM: A role for PKC-delta and PI 3-kinase in TNF-alpha-mediated antiapoptotic signaling in the human neutrophil. Am J Physiol Cell Physiol. 2002, 283 (1): C48-57.View ArticlePubMedGoogle Scholar
- Satoh A, Gukovskaya AS, Nieto JM, Cheng JH, Gukovsky I, Reeve JR, Shimosegawa T, Pandol SJ: PKC-delta and -epsilon regulate NF-kappaB activation induced by cholecystokinin and TNF-alpha in pancreatic acinar cells. Am J Physiol Gastrointest Liver Physiol. 2004, 287 (3): G582-91. 10.1152/ajpgi.00087.2004.View ArticlePubMedGoogle Scholar
- Dallot E, Mehats C, Oger S, Leroy MJ, Breuiller-Fouche M: A role for PKCzeta in the LPS-induced translocation NF-kappaB p65 subunit in cultured myometrial cells. Biochimie. 2005, 87 (6): 513-521. 10.1016/j.biochi.2005.02.009.View ArticlePubMedGoogle Scholar
- Sakurai H, Suzuki S, Kawasaki N, Nakano H, Okazaki T, Chino A, Doi T, Saiki I: Tumor necrosis factor-alpha-induced IKK phosphorylation of NF-kappaB p65 on serine 536 is mediated through the TRAF2, TRAF5, and TAK1 signaling pathway. J Biol Chem. 2003, 278 (38): 36916-36923. 10.1074/jbc.M301598200.View ArticlePubMedGoogle Scholar
- Griffin JL, Bonney SA, Mann C, Hebbachi AM, Gibbons GF, Nicholson JK, Shoulders CC, Scott J: An integrated reverse functional genomic and metabolic approach to understanding orotic acid-induced fatty liver. Physiol Genomics. 2004, 17 (2): 140-149. 10.1152/physiolgenomics.00158.2003.View ArticlePubMedGoogle Scholar
- Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC: PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003, 34 (3): 267-273. 10.1038/ng1180.View ArticlePubMedGoogle Scholar
- Li SP, Tsend JJ, Wang SC: Reconstructing gene regulatory networks from time-series microarray data. Physica A. 2005, 350: 63-69. 10.1016/j.physa.2004.11.032.View ArticleGoogle Scholar
- Kim S, Imoto S, Miyano S: Dynamic Bayesian network and nonparametric regression for nonlinear modeling of gene networks from time series gene expression data. Biosystems. 2004, 75 (1-3): 57-65. 10.1016/j.biosystems.2004.03.004.View ArticlePubMedGoogle Scholar
- Yu J, Smith VA, Wang PP, Hartemink AJ, Jarvis ED: Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics. 2004, 20 (18): 3594-3603. 10.1093/bioinformatics/bth448.View ArticlePubMedGoogle Scholar
- Holland MJ: Transcript abundance in yeast varies over six orders of magnitude. J Biol Chem. 2002, 277 (17): 14363-14366. 10.1074/jbc.C200101200.View ArticlePubMedGoogle Scholar
- Kim BC, Kim HT, Mamura M, Ambudkar IS, Choi KS, Kim SJ: Tumor necrosis factor induces apoptosis in hepatoma cells by increasing Ca(2+) release from the endoplasmic reticulum and suppressing Bcl-2 expression. J Biol Chem. 2002, 277 (35): 31381-31389. 10.1074/jbc.M203465200.View ArticlePubMedGoogle Scholar
- Tamatani M, Che YH, Matsuzaki H, Ogawa S, Okado H, Miyake S, Mizuno T, Tohyama M: Tumor necrosis factor induces Bcl-2 and Bcl-x expression through NFkappaB activation in primary hippocampal neurons. J Biol Chem. 1999, 274 (13): 8531-8538. 10.1074/jbc.274.13.8531.View ArticlePubMedGoogle Scholar
- Werner SL, Barken D, Hoffmann A: Stimulus specificity of gene expression programs determined by temporal control of IKK activity. Science. 2005, 309 (5742): 1857-1861. 10.1126/science.1113319.View ArticlePubMedGoogle Scholar
- cDNA microarry protocol at Van Andel Institute . [http://www.vai.org/Research/Services/LMT/Protocols.aspx]
- GEO website . [http://www.ncbi.nlm.nih.gov/geo/]
- Ni TC, Savageau MA: Model assessment and refinement using strategies from biochemical systems theory: application to metabolism in human red blood cells. J Theor Biol. 1996, 179 (4): 329-368. 10.1006/jtbi.1996.0072.View ArticlePubMedGoogle Scholar
- Ni TC, Savageau MA: Application of biochemical systems theory to metabolism in human red blood cells. Signal propagation and accuracy of representation. J Biol Chem. 1996, 271 (14): 7927-7941. 10.1074/jbc.271.14.7927.View ArticlePubMedGoogle Scholar
- Savageau MA: Biochemical systems analysis. II. The steady-state solutions for an n-pool system using a power-law approximation. J Theor Biol. 1969, 25 (3): 370-379. 10.1016/S0022-5193(69)80027-5.View ArticlePubMedGoogle Scholar
- Moles CG, Mendes P, Banga JR: Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Res. 2003, 13 (11): 2467-2474. 10.1101/gr.1262503.PubMed CentralView ArticlePubMedGoogle Scholar
- Cheng J, Kelly J, Bell DA, Liu W: Learning belief networks from data: An information theory based approach. Artificial Intelligence Journal. 2002, 137: 43-90. 10.1016/S0004-3702(02)00191-1.View ArticleGoogle Scholar
- LibB website . [http://www.cs.huji.ac.il/~nir/]
- BN Powerconstructor website . [http://www.cs.ualberta.ca/~jcheng/bnsoft.htm]
- Heckerman D, Meek C, Cooper G: A Bayesian approach to causal discovery. Technical Report MSR-TR-97-05. 1997, Microsoft ResearchGoogle Scholar
- Friedman N, Geiger D, Goldszmidt M: Bayesian Network Classifiers. MachineLearning. 1997, 131-161.Google Scholar
- Greiner RG, Schuurmans D: Learning Bayesian nets that perform well. Proceedings of UAI-97. 1997Google Scholar
- Sprites P, Glymour C, Scheines R: Causation, Prediction, and Search. 1993, New York , Springer- VerlagView ArticleGoogle Scholar
- Cooper GF: The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence. 1990, 42 (2-3): 393-405. 10.1016/0004-3702(90)90060-D.View ArticleGoogle Scholar
- Henrion M: Propagating uncertainty in Bayesian networks by probabilistic logic sampling. Uncertainty in Artificial Intellgience 2. 1998, New York, N. Y. , Elsevier Science Publishing Company, Inc, 149-163.Google Scholar
- BN inference tool Genie website . [http://genie.sis.pitt.edu/]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.