In silico identification of NF-kappaB-regulated genes in pancreatic beta-cells
© Naamane et al; licensee BioMed Central Ltd. 2007
Received: 05 October 2006
Accepted: 15 February 2007
Published: 15 February 2007
Pancreatic beta-cells are the target of an autoimmune attack in type 1 diabetes mellitus (T1DM). This is mediated in part by cytokines, such as interleukin (IL)-1β and interferon (IFN)-γ. These cytokines modify the expression of hundreds of genes, leading to beta-cell dysfunction and death by apoptosis. Several of these cytokine-induced genes are potentially regulated by the IL-1β-activated transcription factor (TF) nuclear factor (NF)-κB, and previous studies by our group have shown that cytokine-induced NF-κB activation is pro-apoptotic in beta-cells. To identify NF-κB-regulated gene networks in beta-cells we presently used a discriminant analysis-based approach to predict NF-κB responding genes on the basis of putative regulatory elements.
The performance of linear and quadratic discriminant analysis (LDA, QDA) in identifying NF-κB-responding genes was examined on a dataset of 240 positive and negative examples of NF-κB regulation, using stratified cross-validation with an internal leave-one-out cross-validation (LOOCV) loop for automated feature selection and noise reduction. LDA performed slightly better than QDA, achieving 61% sensitivity, 91% specificity and 87% positive predictive value, and allowing the identification of 231, 251 and 580 NF-κB putative target genes in insulin-producing INS-1E cells, primary rat beta-cells and human pancreatic islets, respectively. Predicted NF-κB targets had a significant enrichment in genes regulated by cytokines (IL-1β or IL-1β + IFN-γ) and double stranded RNA (dsRNA), as compared to genes not regulated by these NF-κB-dependent stimuli. We increased the confidence of the predictions by selecting only evolutionary stable genes, i.e. genes with homologs predicted as NF-κB targets in rat, mouse, human and chimpanzee.
The present in silico analysis allowed us to identify novel regulatory targets of NF-κB using a supervised classification method based on putative binding motifs. This provides new insights into the gene networks regulating cytokine-induced beta-cell dysfunction and death.
Pancreatic insulin-producing beta-cells are selectively destroyed by the immune system in type 1 diabetes mellitus (T1DM). The autoimmune assault causes beta-cell dysfunction and death via direct contact with activated immune cells, such as macrophages and lymphocytes, and/or by exposure to soluble mediators secreted by these cells, such as pro-inflammatory cytokines, oxygen free radicals and nitric oxide (NO). The cytokines interleukin (IL)-1β, interferon (IFN)-γ and tumor necrosis factor (TNF)-α induce beta-cell death mainly by apoptosis in rodent and human islets of Langerhans . Beta-cell apoptosis is a complex and highly regulated process that depends on the expression of a large number of pro- and anti-apoptotic genes .
Using microarray analyses, we have identified diverse beta-cell gene networks regulated by IL-1β and IFN-γ [3–7]. Cytokines induce stress response genes that are either protective or deleterious for beta-cell survival, whereas genes related to differentiated beta-cell functions are down-regulated. Several of the cytokine effects in beta-cells depend on the activation of the transcription factor (TF) nuclear factor (NF)-κB [2, 3]. NF-κB is a homo- or hetero-dimeric complex of proteins from the Rel/NF-κB family, which includes p65, c-rel, relB, p50/p105 and p52/p100. In non-stimulated cells NF-κB is located in the cytoplasm as an inactive protein associated with the inhibitor of NF-κB (IκB)α. When cells are stimulated by agonists such as cytokines, bacterial products or viruses, IκBα is phosphorylated on serines 32 and 36 by an IκB kinase complex and degraded in the proteasome. This allows NF-κB to translocate to the nucleus where it binds to a set of related DNA target sites (κB-sites) and regulates gene expression.
Depending on the cell type and stimulatory cue NF-κB can exert anti- or pro-apoptotic functions [8, 9]. Inhibition of cytokine-induced NF-κB activation protects pancreatic beta-cells in vitro  and in vivo  against apoptosis, suggesting that NF-κB is mostly pro-apoptotic in beta-cells. To identify cytokine-regulated and NF-κB-dependent beta-cell gene networks, we performed a microarray analysis in cytokine-treated rat beta-cells in which NF-κB activation was blocked by an NF-κB super-repressor (IκB(SA)2). By this approach, 66 cytokine-modified and NF-κB regulated genes were identified, including genes coding for cytokines and chemokines and several TFs such as c-Myc, C/EBPβ and C/EBPδ . NF-κB was also found to control, via induction of inducible nitric oxide synthase (iNOS) and NO production, the expression of other TFs such as growth arrest and DNA damage (Gadd)153 and pancreatic duodenal homeobox (PDX)-1. This study was, however, limited to a single time point (24 h), and was based on an array with capacity to detect only ~8,000 probes; thus it did not allow a broad detection of the different genes regulated by NF-κB in beta-cells.
Detailed knowledge of the patterns of gene expression involved in beta-cell death, together with a better understanding of their regulation, is crucial to understand and prevent beta-cell loss in T1DM. Microarray technology allows robust, large-scale measurement of gene expression, and we have employed this tool with success for the initial studies on beta-cell gene networks [3–7]. Discovering gene networks, however, requires frequent usage of microarrays at different time points, with and without blockers of specific transcription factors. This demands large amounts of cells, posing a major problem when dealing with rare cells such as primary beta-cells. Moreover, since there is cross talk between different networks, blocking transcription factors is seldom specific. Validation of molecular regulation of beta-cell gene expression has been done by molecular biology techniques such as gel shift assay, transient transfection assay and chromatin immunoprecipitation. These techniques are time consuming (1–2 years of work per gene) and only allow the study of transcriptional regulation of one gene at a time [12–15]. Clearly, novel approaches are required to elucidate the nature of large regulatory systems organized as networks.
To obtain comprehensive information on the NF-κB-regulated gene networks in beta-cells, we presently utilized a bioinformatics approach to predict potential NF-κB-responsive genes. An increasing number of studies have used in silico analysis of regulatory sequences to assist the laboratory-based approaches in the search for TF targets. Some DNA sequence-based approaches used to decipher regulatory networks rely on prior knowledge of transcription factor binding site (TFBS) preferences (which can be modelled as a position-specific scoring matrix (PSSM)), whereas others discover new binding sites without prior consideration of the identity of the binding factor.
Predicting TFBSs in gene promoter regions using PSSMs is limited by the high number of false matches due to the low information content of the often short and degenerate TFBSs . Consequently, it is necessary to use additional information on gene regulation to improve the correlation between in silico predictions and in vivo functional binding sites. TFs are often part of cis-regulatory modules (CRMs) , and the presence of multiple binding sites for a particular TF in the upstream region of a gene increases the likelihood that the TF truly binds the gene . Moreover, regulatory sequences are often preserved through evolution by selective pressure . Thus, conserved TFBSs between different species are more likely to be functional. Against this background, we incorporated these three biological properties of gene regulation to increase the accuracy of our predictions using discriminant analysis.
Discriminant analysis is a powerful statistical pattern recognition method widely applied for data analysis in biomedical research [24, 25]. It has been successfully utilized to identify yeast genes involved in methionine and phosphate metabolism on the basis of upstream regulatory motifs . Discriminant analysis uses a training set to learn how to recognize targets for a given TF based on putative regulatory elements present in their promoter regions. We presently utilized this classification method for the first time in a mammalian system to identify new genes potentially regulated by NF-κB in pancreatic beta-cells. For this purpose, the initial analysis searched a set of 120 known NF-κB target genes (positive examples) for TFBSs over-representation using TFM-Explorer tool . The top matching scores of the most significant over-represented PSSMs in this positive control set were then used to describe the 1 kb upstream sequences from the transcription start site (TSS) of both positive and negative examples of NF-κB regulation (120 genes each). This dataset was then used to train and test two alternative methods for discrimination of NF-κB target genes, namely linear and quadratic discriminant analysis (LDA and QDA). Following these preliminary steps, a large group of human and rat beta-cell genes, detected in our previous array analysis [3–7] was then searched for potential NF-κB target genes. To further increase the reliability of the predictions, we performed a conservation-based filter taking into account the number of homologous upstream regions that are predicted as NF-κB targets in other genomes, namely rat, mouse, human and chimpanzee. Validation of the in silico analysis was achieved by comparison with our previous microarray gene expression data obtained from beta-cells exposed to different NF-κB-dependent stimuli.
Over-represented TFBSs in the upstream sequences of NF-κB regulated genes
Locally over-represented PSSMs in the upstream sequences of 120 known NF-κB-regulated genes
These over-represented matrices were used to scan and characterise the upstream sequences of beta-cell expressed genes by the patser program (see Methods). Since the presence of multiple high matrix-scores for a given TF in the upstream region of a gene increases probability of binding, each TFBS was represented by its five top matching scores. Subsequently, each gene upstream sequence was characterized by a 30-element matrix matching score vector. Note that the matrices used for scanning the sequences are partly redundant, since several of them represent the binding specificity of NF-κB. This type of redundancy is however efficiently treated by discriminant analysis.
Performance of discriminant analysis
Prediction of NF-κB target genes in beta-cells
Phylogeny-based filtering of predicted NF-κB target genes
Comparison of predicted NF-κB-regulated genes in different species (table columns: Targets in 1, Targets in 2, Total # of genes)
Comparison of in silico analysis against microarray data
When comparing the present in silico findings against mRNAs that are up- or down-regulated by 6 or 24 h exposure of primary purified rat beta-cells to IL-1β and/or IFN-γ, we found that 27 of 190 cytokine (IL-1β + IFN-γ)-stimulated genes (14.2%) and 15 of 99 IL-1β-regulated genes (15.2%) were predicted as NF-κB target genes in rat beta-cells (see Additional file 7: Table S7). In both cases there was a nearly 2-fold enrichment for putative NF-κB target genes as compared to non-cytokine regulated genes (p < 0.005; Fisher's exact test). We also checked the enrichment of predicted NF-κB target genes in a list of 84 dsRNA-stimulated genes. Nineteen of the genes in this list (21%) (see Additional file 7: Table S7) and only 234 of 3488 non-dsRNA-induced genes (6.7%) were predicted as NF-κB target genes, with a 3-fold enrichment (p < 10⁻⁵; Fisher's exact test).
A combination of NF-κB blocking with microarray analysis has identified 66 cytokine-induced and NF-κB-regulated (direct or indirect targets) genes in primary rat beta-cells. Of note, this study used only one late time point, namely 24 h. Of the 66 genes, 53 were present in the set of beta-cell-expressed genes. To render the comparison more reliable, NO-regulated genes were removed from this list. It has been shown by a time course microarray analysis that cytokines induce a late NO production which indirectly modifies the expression of nearly 50% of the cytokine-affected mRNAs after 12 h. Among the 53 NF-κB-regulated genes, 17 are NO-independent (putative direct targets), and 6 of these cytokine-induced, NF-κB-regulated and NO-independent genes (see Additional file 7: Table S7) were predicted as putative NF-κB target genes by the LDA classifier (35.3%, p < 0.0007; Fisher's exact test).
Functional classes and temporal gene expression clusters enriched in predicted NF-κB target genes
Distribution of functional classes and temporal gene expression clusters between predicted NF-κB target and non-target genes in INS-1 cells.
Hormones and growth factors
Apoptosis ER stress
List of cytokine-regulated and NO-independent genes predicted as NF-κB targets in primary rat beta-cells.
Ensembl gene ID
Nitric oxide synthase, inducible
Small inducible cytokine B10 precursor
Macrophage inflammatory protein 2 precursor
NF-κB inhibitor alpha
H-2 class II histocompatibility antigen, gamma chain
RT1 class Ia, locus A1
RT1 class II, locus Ba
Protein kinase C, delta type
Growth regulated alpha protein precursor
ATP-sensitive inward rectifier potassium channel 11
RT1 class I, CE4
Lipid phosphate phosphohydrolase 1
Multidrug resistance protein 1
Superoxide dismutase 2, mitochondrial
Tumor necrosis factor precursor
CD166 antigen precursor
RT1 class Ib gene RT1-M3
S-100 protein, alpha chain.
Interferon regulatory factor 1
Voltage-dependent anion-selective channel protein 1
DNA-binding protein inhibitor ID-2
Synaptonemal complex protein SC65.
Transcription factor GATA-4
Hydroxymethylglutaryl-CoA synthase, cytoplasmic
Identification of TF target genes by computational approaches poses more difficulties in higher eukaryotes than in organisms with smaller genomes such as yeast. Scanning of large mammalian genomes for PSSM matches is done in an enormous sequence space, as compared to the short size of DNA motifs recognized by TFs. This leads to a poor accuracy in TF target prediction, since only a small fraction of the predicted binding sites will have a functional role. To reduce false predictions, the present study was restricted to the 1 kb upstream of the TSS for each gene. In line with other studies [34, 35], we have previously observed that most PSSM-predicted NF-κB binding sites are located within 1 kb upstream of the TSS. This restriction, however, may decrease sensitivity and contribute to the fact that some known target genes escaped detection.
PSSM-based approaches for TFBS prediction often rely on broad and sometimes inaccurate assumptions, and do not take into consideration putative combinatorial interactions between TFs that recognize multiple sites. To incorporate such biological annotation into the prediction of NF-κB target genes, we first searched for common regulatory elements in the upstream sequences of a set of known NF-κB target genes. In addition to the PSSMs of the different members of the Rel/NF-κB family, one PSSM corresponding to C/EBP was also over-represented in the set of positive controls (Table 1). NF-κB and C/EBP are known to interact, and their binding sites combine and form regulatory modules for several genes [37, 38]. Moreover, matrix-based methods have already been used to predict genes with composite NF-κB:C/EBP regulatory sites. To account for the frequent presence of multiple binding sites for the same TF in a given regulatory region, we detected multiple hits from each locally over-represented PSSM to characterize individual upstream sequences, and used them as input to train the classifier.
Alignment-based phylogenetic footprinting methods are widely used to improve the specificity of TF target gene prediction [28, 40]. These methods, however, rely on the assumption that the regulatory regions are sufficiently conserved to be aligned. Instead of searching for putative TFBSs situated in regions conserved in alignments between orthologous sequences, we used our classifier to screen each gene in parallel with a set of its homologs in other species: a given gene was considered as an NF-κB target only if its homologs were also classified as NF-κB regulated genes. This improved the accuracy of our predictions (several of the genes identified by this approach have been previously shown in other tissues to be NF-κB-dependent [41, 42]), but led to a 70–80% decrease in the number of hits. In other words, although this approach apparently decreases false positives, it leads to a large increase in false negatives. Thus, manganese superoxide dismutase (MnSOD) and c-Myc, known NF-κB dependent genes [14, 43], were lost following this step.
The validation of computational methods is crucial to assess the significance of bioinformatics predictions. One of the possible ways for validation is the use of global gene expression intersection . Thus, we utilized microarray datasets from our group reporting the transcriptional response of beta-cells to different putative NF-κB-dependent stimuli [3–6]. By comparing them to our in silico predictions we observed that, in general, genes regulated by putative NF-κB-dependent stimuli had a 2–3-fold higher probability to be predicted as NF-κB targets than non-responsive genes (p < 0.005). Considering that these microarray experiments included few time points and a limited set of genes, and that gene expression can also be regulated by variables such as chromatin configuration, which is not detected by in silico approaches, the 20–30% agreement between our predictions and actual gene expression data is reasonable. Functional classes such as cytokines and chemokines, MHC-related genes and adhesion molecules (Table 3), whose expression is known to be regulated by NF-κB in other tissues [45, 46], were significantly enriched in a set of 32 manually annotated genes predicted as NF-κB targets. In agreement with these observations, GO analysis indicated that categories such as "immune response" and "antigen presentation and processing" are over-represented in putative NF-κB-dependent genes. In addition, the temporal expression profile described by the enriched temporal "cluster 1" (see ) is consistent with NF-κB regulation.
List of primary rat beta-cell genes predicted as NF-κB targets in both rat and a primate species (top 30 genes).
Ensembl gene ID
Nitric oxide synthase, inducible
Small inducible cytokine B10 precursor
RAB5A, member RAS oncogene family
Macrophage inflammatory protein 2 precursor
Sulfonylurea receptor 1
PREDICTED: vascular endothelial growth factor B
NF-κB inhibitor alpha
cAMP response element binding protein
Small inducible cytokine B5 precursor
Mucosal addressin cell adhesion molecule 1 precursor
Platelet-activating factor acetylhydrolase IB gamma subunit
RT1 class I, CE7
RT1 class I, CE5 isoform 2
RT1 class I, A3
RT1 class Ia, locus A1
Neuroendocrine convertase 2 precursor
Pro-epidermal growth factor precursor
Protein kinase C, delta type
Interferon beta precursor
hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid delta-isomerase 7
RT1 class I, CE4
Serine/threonine-protein kinase PCTAIRE-2
START domain containing 3
myeloid-associated differentiation marker
Improvement of in silico analysis may be achieved by a more efficient integration of other types of genomic data. For instance, adding gene expression profiles to the matrix score vectors describing the upstream sequences would provide an important discriminative criterion to the classifier. Expression profiles of genes regulated by the same TF are often highly correlated, and addition of this information to the classifier may improve prediction specificity.
The sequencing of the human, rat and mouse genomes [34, 49–51] allows the development of new approaches to determine global cellular regulatory mechanisms by in silico sequence analysis. In the present work discriminant analysis has been successfully applied to identify novel NF-κB-regulated genes in pancreatic beta-cells. The discriminant classifier was developed based on the matrix score profiles of putative TFBSs in the upstream sequences of NF-κB-regulated and non-regulated genes and showed reasonable predictive power. The results obtained provide new insights into the modeling of gene networks regulating cytokine-induced beta-cell dysfunction and death, and open several new avenues for research. In future work, the method will be improved and applied for detecting the regulatory targets of other TFs, such as STAT-1, that also regulate key beta-cell genes implicated in beta-cell death .
Microarray gene expression data, in combination with the present and future in silico sequence analysis, will hopefully provide valuable tools to unravel the architecture of key beta-cell gene networks. The in silico work will help to characterize gene clusters regulated by similar transcription factors, and will focus the laborious promoter studies on selected genes. This combined approach will identify the genetic network structure of beta-cells and might generate new targets for drug design and imaging. For instance, and based on this approach, we have already developed in vivo approaches to prevent experimental diabetes by blocking NF-κB  and STAT-1  and identified several interesting targets for beta-cell imaging (Flamez D, Kutlu B, Goodman N and Eizirik DL, unpublished data).
In conclusion, the present approach constitutes a "proof of principle" for the integrated use of functional genomics [3–7] and bioinformatics (; present study) in the detailed molecular characterization of a relevant cell type for human pathology. By following this integrated approach, we expect to fully map the interacting networks of genes and proteins downstream of the pro-apoptotic signals leading to beta-cell death in T1DM. This will allow us to move the search for a cure for T1DM from an empiric and often blind approach to one that is really mechanistically driven – the ultimate outcome being the development of logical and targeted therapies to prevent the disease.
To train and evaluate the discriminant analysis methods used in this study, we acquired a calibration dataset consisting of putative promoter sequences for positive and negative examples of NF-κB regulation. The set of positive examples was extracted from a compilation of genes known to contain functional NF-κB binding sites from diverse tissues of human, mouse and rat . From this collection we selected 96 human, 17 mouse and 7 rat genes with a strong experimental evidence for NF-κB binding. As negative examples we selected 120 genes with the least significant changes in expression in a microarray analysis where rat beta-cells were stimulated by cytokines . These genes are supposed not to be regulated by NF-κB.
Upstream sequence collections
We analyzed the promoters of sets of genes expressed in rat primary beta-cells (3575 genes) [3–5], insulin-producing INS-1E cells (3068 genes)  or human islets (9443 genes) . For each gene the 1 kb upstream sequence, starting from the TSS, was retrieved from the ENSEMBL database (release 35, ) and analyzed as explained in the next section. The choice of the 1 kb limit for upstream sequence was based on findings on rodent genomes indicating that most annotated TFBSs are located at this position .
PSSM selection and binding site scoring
The web application TFM-Explorer was used to determine PSSMs enriched in NF-κB target genes. This program identifies all potential TFBSs in the set of promoter sequences using all available vertebrate matrices of the matrix library collected in the TRANSFAC database. It reports statistically significant regions where predicted binding sites show local over-representation. The top six significant matrices discovered by this method were used to scan both the upstream sequences present in the calibration dataset and those corresponding to genes expressed in primary rat beta-cells, INS-1E cells and human pancreatic islets. This scanning step was performed using the pattern-matching program patser. For a given PSSM of width w, the patser program slid a window of length w along both strands of the sequence and assigned a score to each position; the top five matching scores were retrieved for each analyzed upstream sequence. Each upstream sequence was thus represented by a 30-element TFBS matrix score vector (5 top scores × 6 matrices).
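As an illustration of the scanning step described above, the following Python sketch scores every window of width w on both strands and keeps the five best scores per matrix. The matrix values, the sequence and all function names are hypothetical, and the sketch assumes upper-case sequences containing only A, C, G and T. It does not reproduce patser's own scoring scheme or options.

```python
# Toy log-odds PSSM (hypothetical values, not a TRANSFAC matrix):
# one dictionary of per-base scores for each motif position.
TOY_PSSM = [
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": 0.5, "T": -1.0},
    {"A": 1.0, "C": -0.5, "G": -1.0, "T": 0.5},
]

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def window_scores(seq, pssm):
    """Score every window of width w = len(pssm) on one strand."""
    w = len(pssm)
    return [
        sum(pssm[i][seq[start + i]] for i in range(w))
        for start in range(len(seq) - w + 1)
    ]


def top_scores(seq, pssm, k=5):
    """Return the k best window scores over both strands, best first."""
    scores = window_scores(seq, pssm)
    scores += window_scores(reverse_complement(seq), pssm)
    return sorted(scores, reverse=True)[:k]


def score_vector(seq, pssms, k=5):
    """Concatenate the top-k scores of each matrix into one feature vector."""
    vector = []
    for pssm in pssms:
        vector.extend(top_scores(seq, pssm, k))
    return vector


upstream = "ACGGAATTCCGTTGGAAACC"  # hypothetical upstream fragment
features = score_vector(upstream, [TOY_PSSM] * 6)  # six matrices, as in the text
print(len(features))  # 6 matrices x 5 scores = 30
```

With six distinct matrices in place of the repeated toy matrix, the same call would yield the 30-element description of one upstream sequence.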
Discriminant analysis seeks to find a rule for accurately predicting a categorical response (i.e. regulated vs. not regulated) based on a set of measured variables (i.e. TFBS matrix-scores) . Our selected dataset was used to train two discrimination methods, LDA and QDA, for recognizing NF-κB target genes according to the observed matrix matching scores in their promoter regions. The ultimate goal was to allocate a gene to a regulation group, NF-κB target or non target genes, based on the 30-element vector of TFBS matrix scores. In addition to assigning each element to a group (regulated or not), discriminant analysis estimates posterior probabilities, indicating the probability for this element to belong to the respective groups and classifying the gene as belonging to the group with the highest posterior probability. Discriminant analysis also allows specifying prior probabilities to estimate the fraction of elements expected in the different groups. LDA and QDA differ in that LDA is based on the assumption that the variables are multivariate normally distributed in each group, with different mean vectors but identical covariance matrices, whereas QDA is based on the assumption of group-specific covariance matrices.
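For reference, the standard textbook forms of the two discriminant rules can be written as follows. The symbols are introduced here for illustration only: x is the 30-element score vector, μ_k and Σ_k are the mean vector and covariance matrix of group k, and π_k is its prior probability.

```latex
% QDA: group-specific covariance matrices \Sigma_k
\delta_k^{\mathrm{QDA}}(x) = -\tfrac{1}{2}\ln|\Sigma_k|
  - \tfrac{1}{2}(x-\mu_k)^{\top}\Sigma_k^{-1}(x-\mu_k) + \ln\pi_k

% LDA: common covariance matrix \Sigma; terms shared by all groups are dropped
\delta_k^{\mathrm{LDA}}(x) = x^{\top}\Sigma^{-1}\mu_k
  - \tfrac{1}{2}\mu_k^{\top}\Sigma^{-1}\mu_k + \ln\pi_k

% Posterior probability of group k and the resulting allocation rule
P(k \mid x) = \frac{\exp\delta_k(x)}{\sum_{j}\exp\delta_j(x)},
\qquad \hat{k}(x) = \arg\max_k \delta_k(x)
```

Because LDA estimates a single covariance matrix for both groups whereas QDA estimates one per group, QDA has more parameters to fit from the same training examples.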
Cross validation, variable selection and noise reduction
Stratified 5-fold cross-validation
To evaluate the accuracy of the classification methods utilized in this work, the above described dataset was first divided into five subsets of equal size, with the positive and negative examples of NF-κB regulation represented by the same number of genes. In each experiment four subsets were used for training and the remaining one for testing. Performance statistics were then averaged over the five test folds. Three different statistics were used to evaluate the predictive performance of LDA and QDA. Sensitivity is Sn = TP/(TP+FN), specificity is Sp = TN/(TN+FP) and positive predictive value is PPV = TP/(TP+FP), where TP, TN, FP and FN refer to the number of True Positives, True Negatives, False Positives and False Negatives, respectively.
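The fold construction and the three statistics defined above can be sketched in Python as follows. The function names, the random seed and the confusion-matrix counts are illustrative assumptions, not values from this study.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split example indices into k folds with balanced class counts.

    Class counts per fold are exactly equal when each class size is
    divisible by k (as with 120 positive and 120 negative examples).
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin into the folds.
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def performance(tp, tn, fp, fn):
    """Sensitivity, specificity and positive predictive value."""
    return {
        "Sn": tp / (tp + fn),
        "Sp": tn / (tn + fp),
        "PPV": tp / (tp + fp),
    }


labels = [1] * 120 + [0] * 120          # 120 positive, 120 negative examples
folds = stratified_folds(labels)
print([len(fold) for fold in folds])    # five folds of 48 genes each

# Hypothetical counts for one test fold, for illustration only.
stats = performance(tp=30, tn=45, fp=5, fn=20)
print(stats)
```

In each of the five experiments one fold serves as the test set and the other four are pooled for training; the statistics are then averaged over the five test folds.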
Variable selection is a crucial step in machine learning. Due to the problem of over-fitting, many classification methods perform poorly when taking into consideration large numbers of variables. The problem of over-dimensionality is particularly acute in QDA, since this method considers each pairwise combination of variables. To reduce the number of variables we presently applied a forward stepwise procedure, which starts from an empty set of variables and adds at each step the single variable that produces the greatest improvement in the performance of the classifier. Within each training phase, the forward stepwise variable selection procedure was performed using an internal leave-one-out cross-validation (LOOCV) [55, 56]. LOOCV is the extreme case of the k-fold cross validation procedure: if one has N data examples, N experiments will be performed with N-1 training cases and 1 test case. The performance statistics are then averaged over the N test folds.
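A minimal Python sketch of forward stepwise selection driven by an internal LOOCV is given below. Several points are assumptions made for brevity: a nearest-centroid rule stands in for LDA or QDA, LOOCV accuracy is used as the performance criterion, selection stops when no candidate variable improves that accuracy, and the toy data are hypothetical.

```python
def predict(train_x, train_y, x, features):
    """Nearest-centroid stand-in for the discriminant classifier:
    return the class whose mean (over the selected features) is closest."""
    best_label, best_dist = None, None
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        means = [sum(r[f] for r in rows) / len(rows) for f in features]
        dist = sum((x[f] - m) ** 2 for f, m in zip(features, means))
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(xs, ys, features):
    """Leave-one-out accuracy: N rounds, each trained on N-1 examples."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i], features) == ys[i]:
            hits += 1
    return hits / len(xs)


def forward_stepwise(xs, ys):
    """Start from no variables and add, at each step, the variable giving
    the largest LOOCV accuracy; stop when no candidate improves it."""
    selected, best = [], 0.0
    remaining = list(range(len(xs[0])))
    while remaining:
        scored = [(loocv_accuracy(xs, ys, selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected, best


# Hypothetical toy data: variable 0 separates the groups, variable 1 is noise.
xs = [[0.0, 5.0], [0.2, 1.0], [0.1, 3.0], [1.0, 4.9], [1.2, 1.1], [0.9, 3.1]]
ys = [0, 0, 0, 1, 1, 1]
selected, accuracy = forward_stepwise(xs, ys)
print(selected, accuracy)
```

On this toy set only the informative variable is retained, because adding the noise variable cannot raise the LOOCV accuracy further.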
Noise reduction by iterative procedure
Given the protocol for building the calibration dataset, the training groups may themselves contain errors. In particular, the negative control set might contain genes which would be regulated by NF-κB under different conditions than those presently tested. In addition, the positive training set might contain genes for which the binding sites are outside the 1 kb sequence considered in this analysis. Such erroneous training examples should be removed as they can affect the performance of the discriminant analysis. To address this issue, an iterative procedure was performed within each training phase to reduce noise. The classifier was first trained using the original training subset; then, in the next round of the procedure, the elements which were misclassified during the internal LOOCV were removed, and the remaining examples were used as a pre-processed training subset. The procedure was iterated until the training subset no longer changed.
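The iterative clean-up can be sketched in Python as below. As an assumption for illustration, a nearest-centroid rule replaces the discriminant classifier, and the example data (including the deliberately mislabelled point) are hypothetical.

```python
def nearest_centroid(train, x):
    """Stand-in classifier: label of the class whose mean vector is closest
    to x. `train` is a list of (vector, label) pairs."""
    best = None
    for label in sorted({y for _, y in train}):
        rows = [v for v, y in train if y == label]
        mean = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, mean))
        if best is None or dist < best[0]:
            best = (dist, label)
    return best[1]


def loocv_misclassified(examples):
    """Indices of examples that are misclassified when held out in turn."""
    wrong = []
    for i, (x, y) in enumerate(examples):
        train = examples[:i] + examples[i + 1:]
        if nearest_centroid(train, x) != y:
            wrong.append(i)
    return wrong


def reduce_noise(examples):
    """Drop LOOCV-misclassified examples repeatedly until the training
    subset no longer changes."""
    current = list(examples)
    while len(current) > 1:
        wrong = set(loocv_misclassified(current))
        if not wrong:
            break
        current = [e for i, e in enumerate(current) if i not in wrong]
    return current


# Hypothetical one-variable data; ([1.15], 0) plays the mislabelled example.
data = [
    ([0.0], 0), ([0.1], 0), ([0.2], 0), ([0.3], 0), ([1.15], 0),
    ([1.0], 1), ([1.1], 1), ([1.2], 1), ([1.3], 1),
]
cleaned = reduce_noise(data)
print(len(data), len(cleaned))
```

The mislabelled example is removed in the first round, and the second round finds nothing further to remove, so the loop stops.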
Phylogeny-based filtering of predicted NF-κB target genes
Rat, human, mouse and chimpanzee homologs from ENSEMBL database were retrieved for each gene expressed in rat primary beta-cells, insulin producing INS-1E cells or human pancreatic islets. The discriminant procedure took as input these sets of homologs, predicted NF-κB regulation and returned a posterior probability for each gene.
The classification was initially performed for each organism separately. Then, the overlap between predictions (number of genes predicted in both organisms) was computed for each pair of organisms and tested for significance by the hypergeometric distribution using the compare-classes program from the Regulatory Sequence Analysis Tools (RSAT) suite. The program compare-classes compares two classifications (clustering results, functional classes, etc.) and assesses the statistical significance of common members between each pair of classes by calculating the hypergeometric probability. For the final prediction, we required homologs predicted as NF-κB-regulated genes to be present in at least one rodent and one primate species.
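The final filtering rule and the pairwise overlap counts can be expressed compactly in Python. The data structure (a mapping from a gene-family identifier to the set of species in which a homolog was predicted as a target) and the identifiers are hypothetical, chosen only to illustrate the rule.

```python
RODENTS = {"rat", "mouse"}
PRIMATES = {"human", "chimpanzee"}


def conserved_targets(predictions):
    """Keep gene families predicted as NF-kB targets in at least one
    rodent and at least one primate species."""
    return sorted(
        family
        for family, species in predictions.items()
        if species & RODENTS and species & PRIMATES
    )


def pairwise_overlap(predictions, species_a, species_b):
    """Number of gene families predicted as targets in both species."""
    return sum(
        1
        for species in predictions.values()
        if species_a in species and species_b in species
    )


# Hypothetical predictions for four gene families.
predictions = {
    "family_1": {"rat", "human"},
    "family_2": {"rat", "mouse"},
    "family_3": {"human", "chimpanzee"},
    "family_4": {"mouse", "chimpanzee", "human"},
}
kept = conserved_targets(predictions)
print(kept)
print(pairwise_overlap(predictions, "human", "chimpanzee"))
```

Families predicted only within rodents or only within primates are discarded, which is the source of the drop in the number of hits after filtering.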
Comparison of in silico analysis against microarray data
The microarray data utilized for the comparison against the present in silico analysis were obtained using the rat genome Affymetrix U34-A Gene Chips containing ~8,000 probes or the human genome U133-A arrays containing 22,000 probe sets corresponding to 14,500 distinct genes. Using these arrays, we detected 3575, 3068 and 9443 genes expressed in primary rat beta-cells [3–5], insulin-producing rat INS-1E cells and primary human islets, respectively. Integrated information on these genes is available at the "Beta-Cell Gene Expression Bank".
To assess the statistical significance of the over-representation of predicted NF-κB-regulated genes we used the Fisher exact probability test for 2 × 2 contingency tables implemented in the statistical package R .
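For readers without R at hand, the test can be sketched in plain Python. This is a one-sided version (enrichment only); the text does not state which alternative hypothesis was used, so that choice is an assumption, as is the function name. The second call uses the dsRNA counts reported in the Results (19 of 84 versus 234 of 3488).

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins,
    i.e. the probability of at least `a` predicted targets in the first row.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, row1)
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


# Small check case: [[3, 1], [1, 3]] gives (16 + 1) / 70.
p_small = fisher_exact_greater(3, 1, 1, 3)
print(p_small)

# Rows: dsRNA-regulated / not dsRNA-regulated.
# Columns: predicted NF-kB target / not predicted.
p_dsrna = fisher_exact_greater(19, 84 - 19, 234, 3488 - 234)
print(p_dsrna)
```

Exact integer arithmetic via `math.comb` avoids overflow in the binomial coefficients; only the final ratio is converted to a floating-point number.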
From nearly 500 genes described as cytokine-regulated , 225 NO-independent genes were retrieved and mapped to the list of 3068 genes expressed in INS-1E cells, resulting in 32 genes predicted as NF-κB-target. The program compare-classes was used to detect significant overlaps between annotated classes (14 different functional classes and 15 temporal gene expression clusters, described in ), and the subsets of genes predicted as NF-κB targets (32 genes) or not (193 genes).
The FatiGO (Fast Assignment and Transference of Information using Gene Ontology (GO)) web tool , available at , was used to search for significant differences in distributions of GO:Biological Process (GO:BP) categories between predicted NF-κB-regulated and non-regulated groups of genes. GO:BP categories that were statistically over- or under-represented in the predicted sets of NF-κB target genes were identified by a Fisher's exact test (adjusted p < 0.05) that accounts for multiple testing. Adjusted p-values returned by FatiGO were calculated using the false discovery rate (FDR).
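The text does not specify which FDR procedure was applied. As one common possibility, the Benjamini-Hochberg step-up adjustment is sketched below in Python; the choice of procedure, the function name and the example p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four GO categories.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [i for i, p in enumerate(adj) if p < 0.05]
print(adj, significant)
```

A category is then reported as over- or under-represented when its adjusted p-value falls below the chosen threshold (0.05 in the text).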
This work was supported by a grant from the European Union (Integrated Project EuroDia LSHM-CT-2006-518153 in the Framework Program 6 of the European Community), the Fonds National de la Recherche Scientifique (FNRS), Actions de Recherche Concertées de la Communauté Française, Belgium (DLE) and EFSD/Pfizer – Resource Awards for European Diabetes Research. We are grateful to Dr. Burak Kutlu for helpful discussions in the initial part of the study.
- Eizirik DL, Mandrup-Poulsen T: A choice of death – the signal-transduction of immune-mediated beta-cell apoptosis. Diabetologia 2001, 44: 2115–2133. 10.1007/s001250100021
- Cnop M, Welsh N, Jonas JC, Jorns A, Lenzen S, Eizirik DL: Mechanisms of pancreatic beta-cell death in type 1 and type 2 diabetes: many differences, few similarities. Diabetes 2005, 54(Suppl 2):S97–107.
- Cardozo AK, Kruhoffer M, Leeman R, Orntoft T, Eizirik DL: Identification of novel cytokine-induced genes in pancreatic β-cells by high-density oligonucleotide arrays. Diabetes 2001, 50: 909–920.
- Cardozo AK, Heimberg H, Heremans Y, Leeman R, Kutlu B, Kruhoffer M, Orntoft T, Eizirik DL: A comprehensive analysis of cytokine-induced and nuclear factor-κB-dependent genes in primary rat pancreatic β-cells. J Biol Chem 2001, 276: 48879–48886. 10.1074/jbc.M108658200
- Rasschaert J, Liu D, Kutlu B, Cardozo AK, Kruhoffer M, Orntoft TF, Eizirik DL: Global profiling of double stranded RNA- and IFN-γ-induced genes in rat pancreatic beta cells. Diabetologia 2003, 46: 1641–1657. 10.1007/s00125-003-1245-y
- Kutlu B, Cardozo AK, Darville MI, Kruhoffer M, Magnusson N, Orntoft T, Eizirik DL: Discovery of gene networks regulating cytokine-induced dysfunction and apoptosis in insulin-producing INS-1 cells. Diabetes 2003, 52: 2701–2719.
- Ylipaasto P, Kutlu B, Rasilainen S, Rasschaert J, Salmela K, Teerijoki H, Korsgren O, Lahesmaa R, Hovi T, Eizirik DL, Otonkoski T, Roivainen M: Global profiling of coxsackievirus- and cytokine-induced gene expression in human pancreatic islets. Diabetologia 2005, 48: 1510–1522. 10.1007/s00125-005-1839-7
- Hayden MS, Ghosh S: Signaling to NF-κB. Genes Dev 2004, 18: 2195–224. 10.1101/gad.1228704
- Ortis F, Cardozo AK, Crispim D, Storling J, Mandrup-Poulsen T, Eizirik DL: Cytokine-induced proapoptotic gene expression in insulin-producing cells is related to rapid, sustained, and nonoscillatory nuclear factor-κB activation. Mol Endocrinol 2006, 20: 1867–1879. 10.1210/me.2005-0268
- Heimberg H, Heremans Y, Jobin C, Leemans R, Cardozo AK, Darville M, Eizirik DL: Inhibition of cytokine-induced NF-κB activation by adenovirus-mediated expression of a NF-κB super-repressor prevents β-cell apoptosis. Diabetes 2001, 50: 2219–2224.
- Eldor R, Yeffet A, Baum K, Doviner V, Amar D, Ben-Neriah Y, Christofori G, Peled A, Carel JC, Boitard C, Klein T, Serup P, Eizirik DL, Melloul D: Conditional and specific NF-κB blockade protects pancreatic beta cells from diabetogenic agents. Proc Natl Acad Sci USA 2006, 103: 5072–5077. 10.1073/pnas.0508166103
- Darville MI, Eizirik DL: Cytokine induction of Fas gene expression in insulin-producing cells requires the transcription factors NF-κB and C/EBP. Diabetes 2001, 50: 1741–1748.
- Darville MI, Eizirik DL: Regulation by cytokines of the inducible nitric oxide synthase promoter in insulin-producing cells. Diabetologia 1998, 41: 1101–1108. 10.1007/s001250051036
- Darville MI, Ho YS, Eizirik DL: NF-κB is required for cytokine-induced manganese superoxide dismutase expression in insulin-producing cells. Endocrinology 2000, 141: 153–162. 10.1210/en.141.1.153
- Kutlu B, Darville MI, Cardozo AK, Eizirik DL: Molecular regulation of monocyte chemoattractant protein-1 expression in pancreatic β-cells. Diabetes 2003, 52: 348–355.
- Davidson EH, McClay DR, Hood L: Regulatory gene networks and the properties of the developmental process. Proc Natl Acad Sci USA 2003, 100: 1475–1480. 10.1073/pnas.0437746100
- Gonze D, Pinloche S, Gascuel O, van Helden J: Discrimination of yeast genes involved in methionine and phosphate metabolism on the basis of upstream motifs. Bioinformatics 2005, 21: 3490–3500. 10.1093/bioinformatics/bti558
- Blais A, Dynlacht BD: Constructing transcriptional regulatory networks. Genes Dev 2005, 19: 1499–1511. 10.1101/gad.1325605
- Marchal K, De Keersmaecker S, Monsieurs P, van Boxel N, Lemmens K, Thijs G, Vanderleyden J, De Moor B: In silico identification and experimental validation of PmrAB targets in Salmonella typhimurium by regulatory motif detection. Genome Biol 2004, 5: R9. 10.1186/gb-2004-5-2-r9
- Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E, Favorov AV, Frith MC, Fu Y, Kent WJ, Makeev VJ, Mironov AA, Noble WS, Pavesi G, Pesole G, Regnier M, Simonis N, Sinha S, Thijs G, van Helden J, Vandenbogaert M, Weng Z, Workman C, Ye C, Zhu Z: Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol 2005, 23: 137–144. 10.1038/nbt1053
- Wasserman WW, Sandelin A: Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 2004, 5: 276–287. 10.1038/nrg1315
- Holloway DT, Kon M, DeLisi C: Integrating genomic data to predict transcription factor binding. Genome Inform 2005, 16: 83–94.
- Dermitzakis ET, Clark AG: Evolution of transcription factor binding sites in Mammalian gene regulatory regions: conservation and turnover. Mol Biol Evol 2002, 19: 1114–1121.
- Grouven U, Bergel F, Schultz A: Implementation of linear and quadratic discriminant analysis incorporating costs of misclassification. Comput Methods Programs Biomed 1996, 49: 55–60. 10.1016/0169-2607(95)01705-4
- Zhang MQ: Discriminant analysis and its application in DNA sequence motif recognition. Brief Bioinform 2000, 1: 331–342. 10.1093/bib/1.4.331
- Defrance M, Touzet H: Predicting transcription factor binding sites using local over-representation and comparative genomics. BMC Bioinformatics 2006, 7: 396. 10.1186/1471-2105-7-396
- Stormo GD: DNA binding sites: representation and discovery. Bioinformatics 2000, 16: 16–23. 10.1093/bioinformatics/16.1.16
- Bulyk ML: Computational prediction of transcription-factor binding site locations. Genome Biol 2003, 5: 201. 10.1186/gb-2003-5-1-201
- Wingender E, Chen X, Hehl R, Karas H, Liebich I, Matys V, Meinhardt T, Pruss M, Reuter I, Schacherer F: TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 2000, 28: 316–319. 10.1093/nar/28.1.316
- Takahata N, Satta Y: Evolution of the primate lineage leading to modern humans: phylogenetic and demographic inferences from DNA sequences. Proc Natl Acad Sci USA 1997, 94: 4811–4815. 10.1073/pnas.94.9.4811
- Springer MS, Murphy WJ, Eizirik E, O'Brien SJ: Placental mammal diversification and the Cretaceous-Tertiary boundary. Proc Natl Acad Sci USA 2003, 100: 1056–1061. 10.1073/pnas.0334222100
- Zhu Z, Pilpel Y, Church GM: Computational identification of transcription factor binding sites via a transcription-factor-centric clustering (TFCC) algorithm. J Mol Biol 2002, 318: 71–81. 10.1016/S0022-2836(02)00026-8
- Holstege FC, Clevers H: Transcription factor target practice. Cell 2006, 124: 21–23. 10.1016/j.cell.2005.12.026
- Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, et al.: Initial sequencing and comparative analysis of the mouse genome. Nature 2002, 420: 520–562. 10.1038/nature01262
- Elkon R, Linhart C, Sharan R, Shamir R, Shiloh Y: Genome-wide in silico identification of transcriptional regulators controlling the cell cycle in human cells. Genome Res 2003, 13: 773–780. 10.1101/gr.947203
- Kutlu B, Naamane N, Berthou L, Eizirik DL: New approaches for in silico identification of cytokine-modified beta cell gene networks. Ann N Y Acad Sci 2004, 1037: 41–58. 10.1196/annals.1337.007
- Betts JC, Cheshire JK, Akira S, Kishimoto T, Woo P: The role of NF-κB and NF-IL6 transactivating factors in the synergistic activation of human serum amyloid A gene expression by interleukin-1 and interleukin-6. J Biol Chem 1993, 268: 25624–25631.
- Matsusaka T, Fujikawa K, Nishio Y, Mukaida N, Matsushima K, Kishimoto T, Akira S: Transcription factors NF-IL6 and NF-κB synergistically activate transcription of the inflammatory cytokines, interleukin 6 and interleukin 8. Proc Natl Acad Sci USA 1993, 90: 10193–10197. 10.1073/pnas.90.21.10193
- Shelest E, Kel AE, Goessling E, Wingender E: Prediction of potential C/EBP/NF-κB composite elements using matrix-based search methods. In Silico Biol 2003, 3: 71–79.
- Qiu P: Recent advances in computational promoter analysis in understanding the transcriptional regulatory network. Biochem Biophys Res Commun 2003, 309: 495–501. 10.1016/j.bbrc.2003.08.052
- Pahl HL: Activators and target genes of Rel/NF-κB transcription factors. Oncogene 1999, 18: 6853–6866. 10.1038/sj.onc.1203239
- Rel/NF-κB Transcription Factors [http://people.bu.edu/gilmore/nf-kb/target/]
- Duyao MP, Buckler AJ, Sonenshein GE: Interaction of an NF-κB-like factor with a site upstream of the c-myc promoter. Proc Natl Acad Sci USA 1990, 87: 4727–4731. 10.1073/pnas.87.12.4727
- Wasserman WW, Krivan W: In silico identification of metazoan transcriptional regulatory regions. Naturwissenschaften 2003, 90: 156–166.
- Radhakrishnan SK, Kamalakaran S: Pro-apoptotic role of NF-κB: Implications for cancer therapy. Biochim Biophys Acta 2006, 1766: 53–62.
- NF-κB target genes [http://bioinfo.lifl.fr/NF-KB/]
- Lee KW, Lee Y, Kwon HJ, Kim DS: Sp1-associated activation of macrophage inflammatory protein-2 promoter by CpG-oligodeoxynucleotide and lipopolysaccharide. Cell Mol Life Sci 2005, 62: 188–198. 10.1007/s00018-004-4399-y
- Ohmori Y, Hamilton TA: Cooperative interaction between interferon (IFN) stimulus response element and κB sequence motifs controls IFN γ- and lipopolysaccharide-stimulated transcription from the murine IP-10 promoter. J Biol Chem 1993, 268: 6677–6688.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, et al.: Initial sequencing and analysis of the human genome. Nature 2001, 409: 860–921. 10.1038/35057062
- Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, et al.: The sequence of the human genome. Science 2001, 291: 1304–51. 10.1126/science.1058040
- Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, Scott G, Steffen D, Worley KC, Burch PE, Okwuonu G, Hines S, Lewis L, DeRamo C, Delgado O, Dugan-Rocha S, Miner G, Morgan M, Hawes A, Gill R, Celera, Holt RA, Adams MD, Amanatides PG, Baden-Tillson H, Barnstead M, Chin S, Evans CA, Ferriera S, Fosler C, et al.: Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 2004, 428: 493–521. 10.1038/nature02426
- Gysemans CA, Ladriere L, Callewaert H, Rasschaert J, Flamez D, Levy DE, Matthys P, Eizirik DL, Mathieu C: Disruption of the γ-interferon signaling pathway at the level of signal transducer and activator of transcription-1 prevents immune destruction of beta-cells. Diabetes 2005, 54: 2396–2403.
- Hertz GZ, Stormo GD: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 1999, 15: 563–577. 10.1093/bioinformatics/15.7.563
- Huberty C: Applied Discriminant Analysis. New York: John Wiley & Sons; 1994.
- Levner I: Feature selection and nearest centroid classification for protein mass spectrometry. BMC Bioinformatics 2005, 6: 68. 10.1186/1471-2105-6-68
- Lorena AC, de Carvalho AC: Evaluation of noise reduction techniques in the splice junction recognition problem. Genetics and Molecular Biology 2004, 27: 665–672. 10.1590/S1415-47572004000400031
- van Helden J: Regulatory sequence analysis tools. Nucleic Acids Res 2003, 31: 3593–3596. 10.1093/nar/gkg567
- Beta-Cell Gene Expression Bank [http://t1dbase.org/cgi-bin/enter_bcgb.cgi]
- Statistical package R [http://cran.r-project.org]
- Al-Shahrour F, Diaz-Uriarte R, Dopazo J: FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 2004, 20: 578–580. 10.1093/bioinformatics/btg455
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B 1995, 57: 289–300.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.