Skip to main content

Advertisement

Functional and genetic analysis of the colon cancer network

Article metrics

Abstract

Cancer is a complex disease that has proven to be difficult to understand on the single-gene level. For this reason a functional elucidation needs to take interactions among genes on a systems-level into account. In this study, we infer a colon cancer network from a large-scale gene expression data set by using the method BC3Net. We provide a structural and a functional analysis of this network and also connect its molecular interaction structure with the chromosomal locations of the genes enabling the definition of cis- and trans-interactions. Furthermore, we investigate the interaction of genes that can be found in close neighborhoods on the chromosomes to gain insight into regulatory mechanisms. To our knowledge this is the first study analyzing the genome-scale colon cancer network.

Background

Colon cancer is one of the leading causes of cancer related mortality in the western world [1]. It is a complex disease that is thought to mainly arise from polypoid lesions in the intestines as a result of inherited or somatic genetic alterations. These precursor lesions acquire further aberrations as they progress from adenoma to adenocarcinoma to metastatic disease, which in a simplified view can be described as a successive cascade of genetic changes [2, 3]. The most common gene mutations occurring in colorectal cancer effect APC (tumor supressor), MLH1, TP53, SMAD4, KRAS and BRAF [4]. While significant progress has recently been made in characterizing the heterogeneity of the resulting disease subtypes and the effects of different combinations of these common mutations, a better understanding of the underlying gene networks is required, particularly, since the identification of general biomarkers has been unsuccessful as the disease stages and forms are highly specific to individuals. One reason for this observation is that genes are organized in non-linear overlapping pathways and act in a complex cellular network. Such an organizational structure allows alternative regulatory mechanisms to differentially control similar biological processes. Hence, multiple combinations of genes can result in similar phenotypic outcomes. As a result, cancer can be considered a pathway disease, which cannot be well characterized by individual marker genes [5, 6]. For example, in colorectal cancer, activation of Wnt signaling is observed in nearly all tumors. However this can be mediated by inactivating mutation of the APC gene or hyper-activation of beta-catenin, or through mutation of genes with functions analogous to APC [7].

Due to experimental limitations, our knowledge of the underlying network in the cancer specific context is limited. Rather gene regulatory networks are inferred from large-scale gene expression data and provide a description of the mutual dependency structure between individual genes. The relationships represent different interaction types within the gene network that involve transcriptional regulatory interactions, (e.g. transcription factor target gene interactions); protein-protein interactions (e.g. between units of a protein complex) or more transient protein modifying interactions (e.g. phosphorylation events).

There are many factors that are thought to influence the regulation and explain changes of gene expression or signaling pathways that govern growth and differentiation processes. In sporadic colon cancer chromosomal instability [8] and microsatellite instability have been well described as phenotypes associated with subclasses of tumor types. In addition, epigenetic alterations such as methylation that affect gene expression of genes responsible for processes related to cancer progression have been shown to play important roles in disease development and progression [9]. Consequently, genetic and epigenetic events can lead to deregulation of multiple adjacent genes. For example, overexpression of multiple genes on Chromosome 13q is frequently observed in colorectal cancer [1014].

In our study, we perform a systems analysis of the colon cancer gene regulatory network with respect to functional properties of the network structure and known cancer genes. To this end, we infer a BC3Net [15] gene regulatory network from a large-scale colon cancer gene expression data set (GSE2109) provided by the International Genomics Consortium (IGC). Furthermore, we explore the role of interactions between genes co-located on the same or on different chromosomes. We call these different interaction types cis- and trans-interactions. Finally, we study close neighborhoods on the chromosomes with respect to the connectivity of genes they contain as well as their biological function. The goal of our study is to identify and analyze co-regulated subnetworks that may allow to identify regions under major regulatory programs on the chromosome level that could help to understand the general principles of colon cancer.

This paper is organized as follows: In the next section, we describe all methods and data we are using for our analysis. In the 'Results' section, we present our findings and in the section 'Discussion' we interpret our results. The paper finishes with the section 'Conclusions' with a summary.

Methods

Gene expression data set

For our study, we use gene expression data from colon cancer tissue samples from the Expression Project for Oncology (expO) (http://www.intgen.org/expo/) microarray database maintained by the International Genomics Consortium (IGC). The data are obtained from the GEO NCBI repository (GSE2109 ) [16] containing a total of 289 Affymetrix samples in CEL format from the platform hgu133plus2. The 289 samples correspond to a number of different histologies, as shown in Table 1, and 149 samples are from female and 139 are from male patients.

Table 1 Overview of the histologies of the 289 colon cancer samples provided by Expression Project for Oncology (expO).

Preprocessing and normalization of the data

We normalize the microarray samples for the selected tissue types using RMA and quantile normalization [17] using log2 expression intensities for each probe set. Because a gene can be represented by more than one probe set, we use the median expression value as summary statistic for different probe sets. Entrez gene ID to Affymetrix probe set annotation is obtained from the "hgu133plus2.db" R package. If a probe set is unmapped, we exclude it from our analysis. After these preprocessing steps, we have 19, 738 genes and 289 samples we use for our analysis.

Inference of the colon cancer gene regulatory network

In recent years many network inference methods have been introduced [1821]. In this paper, for inferring the colon cancer network from gene expression data, we use the BC3Net algorithm [15], because it has been demonstrated that BC3Net does not only lead to meaningful biological results but it possess also a favorable computational complexity making a large-scale analysis feasible [15, 22].

Briefly, BC3Net is a bagging version of C3Net [23, 24] that generates from one dataset, D, an ensemble of B independent bootstrap datasets, { D k b } k = 1 B , by sampling from D with replacement by using a non-parametric bootstrap with B = 100. Then, for each generated data set D k b in the ensemble, a network G k b is inferred by using C3Net [23, 24]. From the ensemble of networks { G k b } k = 1 B we construct one aggregate network, G w b , which is used to determine the statistical significance of the connection between gene pairs. Then we test the significance of each edge using a binomial test. This results in the final network BC 3Net

Census cancer and colon cancer specific genes

The Cancer Gene Census (CGC) [25] (Version 2011 − 03 − 22) (http://www.sanger.ac.uk/genetics/CGP/Census/) provides information about genes that are frequently observed within tumors of different types of cancer. The CGC list comprises a total of 457 cancer genes, from these 457 genes, 440 are present in the colon cancer gene expression data set.

CSPNN: Connected shortest path neighbor network

In order to analyze subnetworks of the whole colon cancer gene regulatory network, we extract a connected shortest path neighbor network (CSPNN) in the following way. First, we define a set of genes, L1, e.g., by using cancer genes. Then we determine all shortest paths between these genes using the Dijkstra distance [26]. This results in a second set of genes that contains all genes on these shortest paths, including the genes in L1, we call L2. Mapping L2 onto the network BC 3Net gives us a connected subnetwork. To thissubnetwork we add all next neighbors of the genes in L1 resulting in the CSPNN.

GPEA: Gene pair enrichment analysis

It has been shown that genes that cluster together in a co-expression network share a common biological function [27]. We extend this analysis to take the connectivity structure of a gene regulatory network into more detailed account. Specifically, for testing the statistical enrichment of GO-terms in the inferred colon cancer network, we are applying a hypergeometric test that is based on 'interactions' (edges). Due to the fact that 'interactions' always involve a 'pair of genes' this test is called g ene p air e nrichment a nalysis (GPEA) [15, 28]. For our analysis, we obtain information from the Gene Ontology database for entrez IDs of genes from the Bioconductor [29] annotation packages org.Hs.eg.db (v2.9.0) and GO.db (v2.9.0).

In the following, we briefly describe a GPEA. In this description, we use the terms 'interaction', 'edge' and 'gene pair' synonymously. For p genes there is a total of N = p(p − 1)/2 different gene pairs. If there are p GO genes for a particular GO-term then the total number of gene pairs for this GO-term is m GO = p GO (p GO − 1)/2. Furthermore, if we suppose that the inferred colon cancer network BC 3Net contains n interactions, of which k interactions are among genes from the given GO-term, then a p-value for the enrichment of gene pairs of this GO-term can be calculated from the following hypergeometric distribution

p ( k |GO - term ) = i = k m G O P ( X = i |GO-term ) = i = k m G O m G O i N - m G O n - i N n
(1)

This p-value gives an estimate for the probability to observe k or more interactions between genes from the given GO-term.

Chromosome cooperativity analysis

For analyzing the 'cooperativity' among chromosomes, we define a statistical test that estimates if there are chromosome pairs that contain a statistically significant number of interactions between them [30]. For instance, for chromosome i and j we calculate the number of interactions, s i,j , from the colon cancer network BC 3Net and apply a statistical hypothesis test to see if this number is larger than expected by chance, i.e., s rand | i , j

We obtain the sampling distribution for the null hypothesis

H 0 : s i , j = s rand| i , j for  i , j { 1 , 2 , , X , Y }
(2)

from gene label randomizations in the colon cancer network. For our analysis we used E = 100, 000.

For each randomization, e E, we calculate the number of interactions s i , j e between each chromosome pair ( i , j { 1 , 2 , , 22 , X , Y } from which we estimate the p-values by

p i , j = e = 1 E I ( s i , j e > s i , j ) E
(3)

Here, I(), is the indicator function that gives a value of '1' if its argument is true and '0' otherwise. We would like to emphasize that by utilizing the connectivity structure of the colon cancer network BC 3Net in combination with a gene label resampling will conserve not only the total number of interactions among genes, but also the structural properties of the network. Also the uneven number of genes on the 24 chromosomes is accommodated by our resampling procedure. In total, we perform 300 = (242 − 24)/2 + 24 tests and adjust for multiple testing by applying a Benjamini & Hochberg [31] correction controlling the FDR for a significance level of α = 0.05. This guarantees a false discovery rate of FDR ≤ α [32].

Results

Colon cancer gene regulatory network

Using the gene expression data set from expO and the BC3Nnet algorithm, we infer a colon cancer gene regulatory network (GRN), briefly denoted as BC 3Net.This regulatory network consists of 19, 738 genes and contains 135, 194 interactions (edges) among these genes. With the exception of 14 genes the overall colon cancer network is connected. Technically, this means that the giant connected component (GCC) [33] of our colon cancer network has a size of 19, 724 genes. For this network, we find an average shortest path length of 4.52 (measured with the Dijkstra distance [34]) and an edge density of =6.91 0 - 4 . The degree distribution of the colon cancer network follows a power law distribution with an exponent of α = 3.22 indicating that the resulting network is scale-free [35], as has been previously found for many different types of biological networks [3638], including GRNs [30, 39].

Functional GPEA of biological processes

We evaluate our colon cancer GRN network based on functional knowledge about genes that are involved in similar biological processes as defined in the Gene Ontology (GO) database [40]. On the assumption that functionally related genes are likely to interact with each other, we sought to identify the functional modules that are most prominently represented in our inferred colon cancer GRN network. For this reason, we perform a GPEA analysis for GO-terms with a term size larger than 2 and less than 1000 genes and a significance level ofα = 0.001 with a Bonferroni multiple testing correction. Furthermore, in order to study the relevance of the identified functional modules for cancer hallmarks, we test for the enrichment of cancer census genes [25].

In total, we test 7, 989 GO-terms from the category Biological Process and find 430 (5.38%) statistically significant terms. The 50 most significant terms of the GPEA analysis are shown in Table 2. The significant GO-terms describe a variety of biological processes such as cell cycle phase (938 edges), translational initiation (155 edges), elongation (156 edges) and termination (130 edges), organelle fission (318 edges), viral transcription (137 edges), cellular respiration (122 edges), type I interferon-mediated signaling pathway (62 edges) and regulation of immune system process (609 edges).

Table 2 Biological Process GPEA analysis showing the 50 most significant terms.

From the 457 defined cancer census genes 440 are present in our colon cancer GRN. In Table 2, we show for each GO-term the number of cancer census genes (column seven - CG). For these, we perform a cancer census gene enrichment analysis using a hypergeometric test with a significance level of α = 0.05 and a Benjamini & Hochberg correction. Overall, from the 50 most significant GO-terms in Table 2, we find 23 to be enriched with cancer genes (indicated in Table 2 by "+"). Overall, the 50 most significant GO-terms comprise in total 4, 197 genes, of which 228 are cancer genes (51.81% = 228/440 of all census genes present in the colon cancer network).

In Additional file 1, we show a table with all 458 significant GO-terms.

Core subnetwork of colon cancer genes

In order to learn about the immediate interactions between well known colon cancer genes, we extract a connected shortest path neighbor network (CSPNN - see 'Methods' section) from our colon cancer network in the following way. For the 6 known colon cancer genes L1 = {APC, MLH1, TP53, SMAD4, KRAS and BRAF}, we determine all shortest paths between these genes in BC 3Net. This results in the gene set L2 containing all genes on these shortest paths. Mapping L2 back onto BC 3Net gives us a connected subnetwork to which we add the next neighbor genes of L1. This results in the CSPNN containing in total 107 genes and 184 interactions. Among the 107 genes are 7 known cancer genes (in addition to the 6 colon cancer genes it contains PRDM16 from the cancer census gene list).

Figure 1 shows a graphical visualization of this network. Its average shortest path length is 4.6 and from a functional GPEA, we find as most significant biological process 'macromolecular complex assembly' (GO:0071363), with a nominal p-value of p nominal = 4.3e − 5. It is interesting to observe the interaction between the tumor supressor APC and the motor protein KIF3B. KIF3B belongs to a microtuble dependent motor protein complex (KIF3A-KIF3B -KAP3 ) that is a suggested transport mechanism of the APC protein along microtubles [41]. The interaction between the tumor supressor TP53 and the SUMO-specific protease SENP3 was reported in [42]. SENP3 is suggested as a regulator of the p53-Mdm2 pathway. We also observe an interaction between SMAD2 and SMAD4. SMAD2 and SMAD4 are both members of the SMAD protein complex [43]. Further, SMAD4 shows a direct connection to CEACAM8. CEACAM8 belongs to the CEA gene family and is involved in cell adhesion and migration. The measurement of CEA levels in serum is used in the clinic for monitoring the recurrence of colorectal cancer [44].

Figure 1
figure1

CSPNN for the 6 colon cancer genes APC, MLH1, TP53, SMAD4, KRAS and BRAF (red). Genes on shortest paths and next neighbor genes are shown in gray besides if they are present in the census cancer gene list (PRDM16 (blue)). In total, this network contains 107 genes, including 7 census cancer genes, and 184 interactions.

Linking interactions in the colon cancer network with their genetic origin

Next, we study the relation between the genetic context and the structural connectivity of our colon cancer network BC 3Net in the following way. Interactions between genes on separate or the same chromosome can be seen as trans-interactions and cis-interactions, analogous to the trans- and cis-regulation of genes [45]. However, we would like to emphasize that there is a crucial difference between both types of connections. For 'regulation', the transcription of a gene is controlled by a cis- or trans-acting transcription factor, whereas an 'interaction' means any type of biochemical binding, not limited to transcription regulation, but also including protein-protein interaction, phosphorylation, ubiquitination or others. For our colon cancer network, we find that in total 27, 345(21.01%) interactions are cis-interactions and 102, 806(78.99%) edges correspond to trans-interactions.

In the following, we study three questions that address different chromosomal levels. First, we study the cooperativity of chromosomes in form of the enhancement of their interactions. This identifies pairs of chromosomes that are more cooperative with each other. Second, we study the inferrability of interactions in the colon cancer network with respect to their cis- or trans-acting role. This allows to us to learn about the heterogeneity of these interaction types. Third, we investigate chromosomal neighborhoods with respect to their functional enrichment of GO-terms of the structural connectivity in the colon cancer network.

Chromosome cooperativity

To enhance insight about the chromosome cooperativity, we conduct a statistical test as described in the Methods section 'Chromosome cooperativity analysis'. As a result, we find that 4 of the 300 chromosome pairs are statistically significant, shown in the table in Figure 2B. It is interesting to note that chromosome 22 is involved in two of these four connections. This is highlighted in Figure 2A by the link color green for Chr 22.

Figure 2
figure2

A: Statistically significant chromosome cooperations are highlighted by a link. B: The table shows the Benjamini & Hochberg (BH) adjusted p-values for these links.

Our analysis also sheds light on the cooperation of genes as measured by the prevalence of significant interactions between chromosome pairs. From this perspective, visualized in Figure 2A, one sees that only a rather limited number of chromosomes contribute to this cooperation on the chromosome level.

Heterogeneity of cis- and trans-interactions

To investigate the heterogeneity of cis- and trans-interactions in the colon cancer network, we utilize a measure called the ensemble consensus rate (ECR). Specifically, the colon cancer network inferred by BC3Net is aggregated from a bootstrap ensemble of individual networks { G k b } k = 1 B ; see Figure 3A. This aggregation step is based on the ensemble consensus rate (ECR) that measures how often an interaction is observed in the individual networks in the bootstrap ensemble. Formally, the ensemble consensus rate, ecr(i, j), is estimated for each potential interaction between gene i and gene j, as the following probability,

Figure 3
figure3

A: Connection between the ensemble consensus rate and BC3Net. B: Integrated ensemble consensus rate (ECR) for cis-interactions (red) and trans-interactions (blue). C: Median values of the individual ensemble consensus sets ECSmn for m,n { 1 , X , Y } .

ecr ( i , j ) =Pr finding an interaction between genes i and j in  { G k b } k = 1 B .
(4)

Due to the symmetry of the mutual information values utilized by C3Net, each of the bootstrap ensemble networks in { G k b } k = 1 B is undirected and it holds, ecr(i, j) = ecr(j, i).

In the following, we want to zoom-in potential effects of the chromosomal position of interacting genes on the structure of the colon cancer network. In order to accomplish this, we utilize the ECR from which this network is inferred. Specifically, for each chromosome, we determine the ECR of cis-interactions, between co-located genes on the same chromosome, and trans-interactions, between genes located on different chromosomes. This means, for each pair of chromosomes, m,n { 1 , 2 , X , Y } , we determine the following set,

EC S m n = { ecr ( i , j ) |gene i is on chromosome m, and gene j is on chromosome n } .
(5)

We call the set ECSmn the ensemble consensus set for chromosome m and n, because it contains all ECR values of the corresponding interacting genes that are located on chromosome m and n. As a consequence of symmetry of the ECR also the ensemble consensus sets are symmetric,

EC S m n = EC S n m .
(6)

For m = n these sets correspond to cis-interactions and for mn to trans-interactions. This means, in total, we have 24 ensemble consensus sets for cis-interactions, { EC S 1 , 1 , EC S 2 , 2 , EC S Y , Y } , and 276 ensemble consensus sets for trans-interactions, { EC S 1 , 2 , EC S 1 , 3 , EC S Y , 22 , EC S Y , X } .

The above separation in cis- and trans-interaction types allows a basic understanding of the wiring of the colon cancer network, conditioned on the chromosomes. We start our analysis by presenting results for integrated ensemble consensus sets, for a simplified overview. Here by integrated we mean an union over chromosomes. For the cis- and trans-interactions that means

E C S c i s = m { 1 , Y } EC S m , m
(7)
EC S t r a n s ( n ) = m { 1 , Y } EC S n , m for  n { 1 , Y }
(8)

In Figure 3B, we show a boxplot of the distributions of the average ECR rates for the 25 ensemble census sets; ECScis in red and the ECStrans(n) in blue. We observe almost a two-fold higher ECR for cis-interactions (median of means value is 0.1695) compared to trans-interactions (median of means value is 0.0993).

For the distribution of the trans-interactions (blue - Figure 3B) the chromosomes exhibit subtle variations. Chromosome 13 shows the largest and chromosome Y shows the smallest median ECR. In order to test, whether this observation is influenced by genes with a large degree, we compared the distribution of the average degree of trans gene pairs between the chromosomes and investigated the location of hub genes. As a result, we found that chromosome 13 has an increased average node degree, compared to all other chromosomes (not shown).

Table 3 shows the 10 major hub genes of the colon cancer network. For each hub gene, we extracted the subnetwork including its direct neighbors. The molecular function of the subnetworks for each hub gene are described by the most significant GO term identified by a Gene Ontology enrichment analysis (FDR = 0.1 and a Benjamini & Hochberg correction). The identified terms for the hub gene subnetworks have functional annotations related to cell adhesion and signaling such as synaptic transmission, detection of stimulus, sensory perception and receptor activity (Table 3).

Table 3 The 10 major hub genes of the colon cancer network.

The major hub gene OR7E104P is located on chromosome 13 with a degree of 458 (Table 3). The ECStrans median of means for chromosome 13 is 0.1108 (Figure 3B) and drops to 0.0953 (not shown) similar to the other chromosomes upon removal of the major hub OR7E104P. Hence, the subtle increase of the ECR for chromosome 13 is a result of the largest hub gene of the colon cancer network.

In Figure 3C, we show results for the 300 individual ensemble consensus sets ECSmn. For reasons of simplicity, we show only the median ensemble consensus rates instead of box plots, to obtain a compressed visualization. Overall, we observe also for the individual ECS higher cis- than trans- consensus rates. Furthermore, chromosome 13 and chromosome Y appear elevated and demeaned (see column colors).

Chromosomal neighborhood-induced GPEA analysis

Finally, we study the connection between chromosomal neighborhoods and interactions between genes, as given by the colon cancer network. Specifically, we want to identify genomic regions with enriched subnet- works of interacting genes that are adjacent, i.e., co-located, on the chromosomes. This analysis is based on a GPEA where the gene sets are defined from a sliding window along the human chromosome, comprising co-located genes within such a window. See Figure 4A for a schematic visualization and the definition of our gene sets. For our analysis, we use a window length of 1 Mb (mega bases) and slide this window in steps of 500 Kb (Kilo bases) along the chromosomes. That means consecutive windows have an overlap of 500 Kb. We perform a GPEA for a total of 3, 987 chromosome window gene sets, whenever a window contains at least 2 genes that are present in the colon cancer GRN.

Figure 4
figure4

A: Analysis procedure for a GPEA. B: Shown are the locations of the largest 146 network components corresponding to gene sets of 1 Mb windows (red dots) along the chromosomes. Blue dots indicate the location of cancer census genes. C: The top ranked largest network component corresponding to the positional gene set on chromosome 8 with 29 genes (red).

From our analysis, we find 260 (6.52%) of the 3, 987 gene sets with a significant enrichment of interactions (α = 0.001 and Bonferroni correction). The 35 most significant genomic regions from this GPEA are shown in Table 4. In this table, each row corresponds to one window gene set and the first column indicates the chromosome, the second the locus and the third the start base pair. Column four and five give the number of genes in the window gene set and the number of edges (interactions) between these genes in the colon cancer network. The p-value in column six corresponds to the result from the GPEA.

Table 4 Chromosomal neighborhood-induced GPEA and GO analysis.

Column seven shows the number of genes in the giant connected component (GCC). For these genes we perform a (conventional) Gene Ontology enrichment analysis to characterize the biological function for each window gene set. In column nine, we show the most significant GO term (α = 0.05 and Benjamini & Hochberg FDR correction) as a result from this analysis. Furthermore, we find that 44/260 of the chromosome window subnetworks have a GCC with more than ≥ 10 genes. The genomic locations of these 44 gene sets are visualized in Figure 4B.

The 260 chromosome windows comprise a total of 4,292/18,307 (23.44%) genes with 93/425 (21.88%) cancer census genes. The identified chromosomal locations describe a variety of biological processes that are involved in regulation transcription, nucleosome assembly, cell adhesion, signaling (e.g., TOR signaling, type-I interferon-mediated signaling pathway), cell cycle and antigen processing and presentation (Table 4).

The most significant chromosome window is located on chromosome 8 at 145-146 Mb, which corresponds to the chromosome band 8q24.3. In the literature genomic aberration in the locus 8q24 are frequently observed in colon cancer e.g., [4648]. Figure 4C shows the corresponding largest connected component on chromosome 8 146-147 Mb with 29 genes including the census cancer gene RECQL4.

Discussion

In this study, we inferred a colon cancer gene regulatory network and investigated its functional and structural meaning. Overall, we found our colon cancer regulatory network consists of 19, 718 genes interconnected by 135, 194 interactions. Within this network, approximately 5% of the gene ontology (GO) terms we studied were enriched and functional annotations for the 50 most significant GO terms (see Table 2) included 11 that denote gene clusters involved in engagement with cellular and molecular inflammatory mediators or infective agents. Thirteen terms are involved in gene transcription, translation and mRNA degradation implicated in generic signaling processes while 10 had clear association with cell cycle regulation or progression. Five terms had functions in processing of subcellular protein complexes and organelles while a further 7 are associated with protein targeting to membranes or other spatial domains. These 12 terms have key functional annotations required for compartmentalized signaling for control of cytoskeletal dynamics in simultaneous subcellular and cellular processes, including vesicle trafficking, endocytosis, cytokinesis, cell migration and morphogenesis [49, 50]. By integration of complex biological information with widely adopted GO terms for major human cancer, this study will enhance the quality and accuracy of functional annotations within emerging GRNs that may be used in predictive cancer science.

The analysis of chromosome cooperativity revealed that there are only very few chromosome pairs (1.3% = 4/300) that have an enhanced number of interactions among the genes located on these chromosomes (see Figure 2) and chromosomes 22 is involved in 2 of the 4 significant connections. An increase for trans-interactions between two chromosomes may result from a spatial proximity of the genes in the nucleus leading to an increased co-regulation of gene expression because the spatial organization of chromosomes and the intermingling between chromosomes (chromosome kissing) in the nucleus is crucial for the regulation of gene activation, gene silencing and the process of genomic translocations [51, 52].

Only by connecting the interaction structure of the colon cancer network with the chromosomal locations of the genes enabled the definition of cis- and trans-interactions. This allowed the analysis of structural properties of the genes in the gene regulatory network with respect to their chromosomal positions. Along these lines, we found that interacting genes that are co-located on the same chromosome were observed to have an almost two-fold higher ensemble consensus rate (ECR) compared to trans-located gene pairs, where the corresponding genes reside on different chromosomes. This result holds for the integrated as well as individual ECRs.

A possible explanation for this observation may be related to the underlying structure of the 'true' gene regulatory network of colon cancer. Specifically, in [53], we found that interactions connecting peripheral genes, i.e., genes with only one or two interactions, are more easy to infer than highly connected genes from the center of a network, e.g., hub genes. Hence, cis-interactions may correspond to interactions between genes in the periphery of the 'true' colon cancer network and trans-interactions connect more densely connected genes. Furthermore, in [53] it was shown that peripheral regions of 'true' gene regulatory networks are enriched for membrane proteins and membrane signaling. Hence, the observed heterogeneity of cis- and trans-interactions in our study may also be related to the known inferential heterogeneity [53] of gene regulatory networks.

From studying the connectivity of chromosomal neighborhoods, we found 260 of such neighborhoods to be statistically significant from a GPEA. Furthermore, we found 44 of these to have ≥ 10 genes. An additional GO enrichment analysis of genes in the GCC of these subnetworks showed that several of these subnetworks are involved in 'DNA dependent transcriptional regulation' (see Table 4). Moreover, 8 significant subnetworks are located on chromosome 17, which had been also identified from our chromosome cooperativity analysis.

A general explanation for the presence of 'DNA dependent transcriptional regulation' among the significant chromosomal neighborhoods is certainly related to the basic coordination of transcription of a cell, because in order to allow the transcription of genes chromatin modifications such as histone acetylations are required to allow the unwinding of DNA and make it accessible for transcriptional activity. Given the complexity of these processes and the energy expended, it is not unsurprising that genes are not randomly distributed on the chromosomes. Instead, it is believed that in a mammalian organism genes involved in regulatory programs can be co-ordinately controlled. For instance, transcriptional analysis of the cell cycle [54] suggests that a quartile of cell cycle regulatory genes are adjacent on the chromosome. Similar results have been found for a cardiac transcriptome [55]. These observations suggest a global regulatory organization of gene expression at the chromosomal level and the location of the chromosome in the nucleus has been shown to exert a major effect on transcriptional activity [56]. Certainly, the simplest form of such co-regulation is that of proximally located genes, typically located within the scale of a few Mb [57].

Co-regulated expression of proximal genes was known for a long time, however, it was assumed that genes are regulated locally, at the level of transcription factors. The first large-scale study of genes expression along chromosomes (Human Transcriptome Map) shed light on the global expression patterns: along human chromosomes, highly expressed genes tend to cluster in large domains, interspersed with domains of weakly expressed genes [58]. Similar spatial patterns of genes expression were found in mouse genome [59] and other model organisms (reviewed in [60]). In the nucleus, clusters of actively transcribed genes tend to co-localize, indicating long-range intrachromosomal interactions [61]. Thus, clustering of highly-expressed genes does not reflect individual gene regulation, but microenviroment of chromosomal domain, defined by chromatin structure and subnuclear localization [62]. Our finding that subnetworks of interacting genes are indeed co-located on the chromosomes indicates that, generally, subnetworks in biological networks have many interesting functional properties, some of them are yet to be discovered.

Conclusions

An interesting future extension would be a comparative analysis of more than one cancer network to learn about commonalities, and differences, of different cancer types with respect to the hallmarks of cancer. For instance, a comparative analysis of these networks could employ similarity or distance measures based on topological indices [63, 64] rather than using classical graph similarity measures [65].

Unfortunately, currently, there are severe practically limitations for such an approach, most notably the lack of a database making such cancer networks available. In this respect, the colon cancer network we inferred in this study can also contribute to such a comparative network analysis, extending its usage significantly beyond a single study.

References

  1. 1.

    Ferlay J, Shin HR, Bray F, Forman D, Mathers C, Parkin DM: Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. International Journal of Cancer. 2010, 127 (12): 2893-2917. 10.1002/ijc.25516.

  2. 2.

    Fearon E, Vogelstein B: A genetic model for colorectal tumorigenesis. Cell. 1990, 61: 759-67. 10.1016/0092-8674(90)90186-I.

  3. 3.

    Bellacosa A: Genetic hits and mutation rate in colorectal tumorigenesis: versatility of Knudson's theory and implications for cancer prevention. Genes Chromosomes Cancer. 2003, 38: 382-8. 10.1002/gcc.10287.

  4. 4.

    Tejpar S, Bertagnolli M, Bosman F, Lenz H, Garraway L, Waldman F, Warren R, Bild A, Collins-Brennan D, Hahn H, Harkin D, Kennedy R, Ilyas M, Morreau H, Proutski V, Swanton C, Tomlinson I, Delorenzi M, Fiocca R, Van Cutsem E, Roth A: Prognostic and predictive biomarkers in resected colon cancer: current status and future perspectives for integrating genomics into biomarker discovery. Oncologist. 2010, 15: 390-404. 10.1634/theoncologist.2009-0233.

  5. 5.

    Hanahan D, Weinberg R: The hallmarks of cancer. Cell. 2000, 100: 57-70. 10.1016/S0092-8674(00)81683-9.

  6. 6.

    Hanahan D, Weinberg R: Hallmarks of cancer: the next generation. Cell. 2011, 144: 646-74. 10.1016/j.cell.2011.02.013.

  7. 7.

    Najdi R, Holcombe R, Waterman M: Wnt signaling and colon carcinogenesis: beyond APC. J Carcinog. 2011, 10: 5-10.4103/1477-3163.78111.

  8. 8.

    Pino M, Chung D: The chromosomal instability pathway in colon cancer. 2010, Gastroenterology, 138: 2059-72.

  9. 9.

    van Engeland M, Derks S, Smits K, Meijer G, Herman J: Colorectal cancer epigenetics: complex simplicity. J Clin Oncol. 2011, 29: 1382-91. 10.1200/JCO.2010.28.2319.

  10. 10.

    Tsafrir D, Bacolod M, Selvanayagam Z, Tsafrir I, Shia J, Zeng Z, Liu H, Krier C, Stengel R, Barany F, Gerald W, Paty P, Domany E, Notterman D: Relationship of gene expression and chromosomal abnormalities in colorectal cancer. Cancer Res. 2006, 66: 2129-37. 10.1158/0008-5472.CAN-05-2569.

  11. 11.

    Platzer P, Upender M, Wilson K, Willis J, Lutterbaugh J, Nosrati A, Willson J, Mack D, Ried T, Markowitz S: Silence of chromosomal amplifications in colon cancer. Cancer Res. 2002, 62: 1134-8.

  12. 12.

    Xiao X, Zhou X, Yan G, Sun M, Du X: Chromosomal alteration in Chinese sporadic colorectal carci- nomas detected by comparative genomic hybridization. Diagn Mol Pathol. 2007, 16: 96-103. 10.1097/PDM.0b013e31803190f2.

  13. 13.

    Andersen C, Wiuf C, Kruhoffer M, Korsgaard M, Laurberg S, Orntoft T: Frequent occurrence of uniparental disomy in colorectal cancer. Carcinogenesis. 2007, 28: 38-48. 10.1093/carcin/bgl086.

  14. 14.

    Neklason D, Tuohy T, Stevens J, Otterud B, Baird L, Kerber R, Samowitz W, Kuwada S, Leppert M, Burt R: Colorectal adenomas and cancer link to chromosome 13q22.1-13q31.3 in a large family with excess colorectal cancer. J Med Genet. 2010, 47: 692-9. 10.1136/jmg.2009.076091.

  15. 15.

    de Matos Simoes R, Emmert-Streib F: Bagging statistical network inference from large-scale gene expression data. 2012, PLoS ONE, 7 (3): e33624-

  16. 16.

    Edgar R, Domrachev M, Lash A: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002, 30: 207-10. 10.1093/nar/30.1.207.

  17. 17.

    Irizarry R, Hobbs B, Collin F, Beazer-Barclay Y, Antonellis K, Scherf U, Speed T: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003, 4: 249-64. 10.1093/biostatistics/4.2.249.

  18. 18.

    Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J: Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles. 2007, PLoS Biol, 5-

  19. 19.

    Meyer P, Lafitte F, Bontempi G: minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information. BMC Bioinformatics. 2008, 9: 461-10.1186/1471-2105-9-461.

  20. 20.

    Emmert-Streib F, Glazko G, Altay G, de Matos Simoes R: Statistical inference and reverse engineering of gene regulatory networks from observational expression data. Frontiers in Genetics. 2012, 3: 8-

  21. 21.

    Fogelberg C, Palade V: DENSE STRUCTURAL EXPECTATION MAXIMISATION WITH PAR- ALLELISATION FOR EFFICIENT LARGE-NETWORK STRUCTURAL INFERENCE. International Journal on Artificial Intelligence Tools. 2013, 22 (03): 1350011-10.1142/S0218213013500115.

  22. 22.

    de Matos Simoes R, Dehmer M, Emmert-Streib F: B-cell lymphoma gene regulatory networks: Biological consistency among inference methods. Front Genet. 2013, 4: 281-

  23. 23.

    Altay G, Emmert-Streib F: Inferring the conservative causal core of gene regulatory networks. BMC Syst Biol. 2010, 4: 132-10.1186/1752-0509-4-132.

  24. 24.

    Altay G, Emmert-Streib F: Structural Influence of gene networks on their inference: Analysis of C3NET. Biology Direct. 2011, 6: 31-10.1186/1745-6150-6-31.

  25. 25.

    Futreal P, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton M: A census of human cancer genes. Nat Rev Cancer. 2004, 4: 177-83. 10.1038/nrc1299.

  26. 26.

    Dijkstra EW: A note on two problems in connexion with graphs. Numerische Mathematik. 1959, 1: 269-271. 10.1007/BF01386390.

  27. 27.

    Lee H, Hsu A, Sajdak J, Qin J, Pavlidis P: Coexpression analysis of human genes across many microarray data sets. Genome Res. 2004, 14: 1085-94. 10.1101/gr.1910904.

  28. 28.

    de Matos Simoes R, Dehmer M, Emmert-Streib F: Interfacing cellular networks of S. cerevisiae and E. coli: Connecting dynamic and genetic information. BMC Genomics. 2013, 14: 324-10.1186/1471-2164-14-324.

  29. 29.

    Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini A, Sawitzki G, Smith C, Smyth G, Tierney L, Yang J, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80-10.1186/gb-2004-5-10-r80.

  30. 30.

    Emmert-Streib F, de Matos Simoes R, Mullan P, Haibe-Kains B, Dehmer M: The gene regulatory network for breast cancer: Integrated regulatory landscape of cancer hallmarks. Front Genet. 2014, 5: 15-

  31. 31.

    Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological). 1995, 57: 125-133.

  32. 32.

    Dudoit S, van der Laan M: Multiple Testing Procedures with Applications to Genomics. 2007, New York; London: Springer

  33. 33.

    Dorogovtesev S, Mendes J: Evolution of Networks: From Biological Nets to the Internet and WWW. 2003, Oxford University Press

  34. 34.

    Dijkstra E: A note on two problems in connection with graphs. Numerische Math. 1959, 1: 269-271. 10.1007/BF01386390.

  35. 35.

    Barabási AL, Albert R: Emergence of scaling in random networks. Science. 1999, 206: 509-512.

  36. 36.

    Albert R: Scale-free networks in cell biology. Journal of Cell Science. 2005, 118 (21): 4947-4957. 10.1242/jcs.02714.

  37. 37.

    Bornholdt S, Schuster H: Handbook of Graphs and Networks: From the Genome to the Internet. 2003, Wiley-VCH

  38. 38.

    van Noort V, Snel B, Huymen MA: The yeast coexpression network has a small-world, scale-free architecture and can be explained by a simple model. EMBO reports. 2004, 5 (3): 280-284. 10.1038/sj.embor.7400090.

  39. 39.

    Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A: Reverse Engineering of Regu- latory Networks in Human B Cells. Nature Genetics. 2005, 37 (4): 382-390. 10.1038/ng1532.

  40. 40.

    Ashburner M, Ball C, Blake J, Botstein D, Butler H: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics. 2000, 25: 25-29. 10.1038/75556.

  41. 41.

    Jimbo T, Kawasaki Y, Koyama R, Sato R, Takada S, Haraguchi K, Akiyama T: Identification of a link between the tumour suppressor APC and the kinesin superfamily. Nat Cell Biol. 2002, 4 (4): 323-7. 10.1038/ncb779.

  42. 42.

    Nishida T, Yamada Y: The nucleolar SUMO-specific protease SMT3IP1/SENP3 attenuates Mdm2- mediated p53 ubiquitination and degradation. Biochem Biophys Res Commun. 2011, 406 (2): 285-91. 10.1016/j.bbrc.2011.02.034.

  43. 43.

    Fleming N, Jorissen R, Mouradov D, Christie M, Sakthianandeswaren A, Palmieri M, Day F, Li S, Tsui C, Lipton L, Desai J, Jones I, McLaughlin S, Ward R, Hawkins N, Ruszkiewicz A, Moore J, Zhu H, Mariadason J, Burgess A, Busam D, Zhao Q, Strausberg R, Gibbs P, Sieber O: SMAD2, SMAD3 and SMAD4 mutations in colorectal cancer. Cancer Res. 2013, 73 (2): 725-35. 10.1158/0008-5472.CAN-12-2706.

  44. 44.

    Duffy M: Carcinoembryonic antigen as a marker for colorectal cancer: is it clinically useful?. Clin Chem. 2001, 47 (4): 624-30.

  45. 45.

    Cheung VG, Nayak RR, Wang IX, Elwyn S, Cousins SM, Morley M, Spielman RS: Polymorphic cis- and trans-regulation of human gene expression. PLoS biology. 2010, 8 (9):

  46. 46.

    Ghadimi BM, Grade M, Liersch T, Langer C, Siemer A, Füzesi L, Becker H: Gain of chromosome 8q23-24 is a predictive marker for lymph node positivity in colorectal cancer. Clin Cancer Res. 2003, 9 (5): 1808-1814.

  47. 47.

    Tomlinson I, Webb E, Carvajal-Carmona L, Broderick P, Kemp Z, Spain S, Penegar S, Chandler I, Gorman M, Wood W, Barclay E, Lubbe S, Martin L, Sellick G, Jaeger E, Hubner R, Wild R, Rowan A, Fielding S, Howarth K, Silver A, Atkin W, Muir K, Logan R, Kerr D, Johnstone E, Sieber O, Gray R, Thomas H, Peto J, Cazier JB, Houlston R: A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nature Genetics. 2007, 39 (8): 984-988. 10.1038/ng2085.

  48. 48.

    Zanke B, Greenwood C, Rangrej J, Kustra R, Tenesa A, Farrington S, Prendergast J, Olschwang S, Chiang T, Crowdy E, Ferretti V, Laflamme P, Sundararajan S, Roumy S, Olivier J, Robidoux F, Sladek R, Montpetit A, Campbell P, Bezieau S, O'Shea A, Zogopoulos G, Cotterchio M, Newcomb P, McLaughlin J, Younghusband B, Green R, Green J, Porteous M, Campbell H, Blanche H, Sahbatou M, Tubacher E, Bonaiti-Pellie C, Buecher B, Riboli E, Kury S, Chanock S, Potter J, Thomas G, Gallinger S, Hudson T, Dunlop M: Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat Genet. 2007, 39: 989-94. 10.1038/ng2089.

  49. 49.

    Gowrishankar K, Ghosh S, Saha S, C R, Mayor S, Rao M: Active Remodeling of Cortical Actin Regulates Spatiotemporal Organization of Cell Surface Molecules. Cell. 2012, 149 (6): 1353-1367. 10.1016/j.cell.2012.05.008.

  50. 50.

    Pertz O: Spatio-temporal Rho GTPase signaling - where are we now?. Journal of Cell Science. 2010, 123 (11): 1841-1850. 10.1242/jcs.064345.

  51. 51.

    Branco MR, Pombo A: Intermingling of chromosome territories in interphase suggests role in translocations and transcription-dependent associations. PLoS Biol. 2006, 4 (5): e138-10.1371/journal.pbio.0040138.

  52. 52.

    Cavalli G: Chromosome kissing. Curr Opin Genet Dev. 2007, 17 (5): 443-450. 10.1016/j.gde.2007.08.013.

  53. 53.

    de Matos Simoes R, Emmert-Streib F: Influence of Statistical Estimators of Mutual Information and Data Heterogeneity on the Inference of Gene Regulatory Networks. PLoS ONE. 2011, 6 (12): e29279-10.1371/journal.pone.0029279.

  54. 54.

    Cho R, Campbell M, Winzeler E, Steinmetz L, Conway A, Wodicka L, Wolfsberg T, Gabrielian A, Landsman D, Lockhart D, Davis R: A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell. 1998, 2: 65-73. 10.1016/S1097-2765(00)80114-8.

  55. 55.

    Vogel J, von Heydebreck A, Purmann A, Sperling S: Chromosomal clustering of a human transcriptome reveals regulatory background. BMC Bioinformatics. 2005, 6: 230-10.1186/1471-2105-6-230.

  56. 56.

    Boyle S, Gilchrist S, Bridger J, Mahy N, Ellis J, Bickmore W: The spatial organization of human chromosomes within the nuclei of normal and emerin-mutant cells. 2001, Hum Mol Genet, 10: 211-9.

  57. 57.

    Hurst L, Pal C, Lercher M: The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 2004, 5: 299-310. 10.1038/nrg1319.

  58. 58.

    Caron H, Schaik Bv, Mee Mvd, Baas F, Riggins G, Sluis Pv, Hermus MC, Asperen Rv, Boon K, Voute PA, Heis- terkamp S, Kampen Av, Versteeg R: The Human Transcriptome Map: Clustering of Highly Expressed Genes in Chromosomal Domains. Science. 2001, 291 (5507): 1289-1292. 10.1126/science.1056794.

  59. 59.

    Singer GAC, Lloyd AT, Huminiecki LB, Wolfe KH: Clusters of Co-expressed Genes in Mammalian Genomes Are Conserved by Natural Selection. Molecular Biology and Evolution. 2005, 22 (3): 767-775.

  60. 60.

    Hurst LD, Pal C, Lercher MJ: The evolutionary dynamics of eukaryotic gene order. Nature reviews Genetics. 2004, 5 (4): 299-310. 10.1038/nrg1319.

  61. 61.

    Fraser P, Bickmore W: Nuclear organization of the genome and the potential for gene regulation. Nature. 2007, 447 (7143): 413-417. 10.1038/nature05916.

  62. 62.

    Hanin L, Awadalla SS, Cox P, Glazko G, Yakovlev A: Chromosome-specific spatial periodicities in gene expression revealed by spectral analysis. Journal of Theoretical Biology. 2009, 256 (3): 333-342. 10.1016/j.jtbi.2008.10.015.

  63. 63.

    Mueller L, Kugler K, Graber A, Emmert-Streib F, Dehmer M: Structural Measures for Network Biology Using QuACN. BMC Bioinformatics. 2011, 12: 492-10.1186/1471-2105-12-492.

  64. 64.

    Dehmer M, Grabner M, Mowshowitz A, Emmert-Streib F: An efficient heuristic approach to detecting graph isomorphism based on combinations of highly discriminating invariants. Advances in Computational Mathematics. 2013, 39 (2): 311-325. 10.1007/s10444-012-9281-0.

  65. 65.

    Bunke H: What is the distance between graphs?. Bulletin of the EATCS. 1983, 20: 35-39.

  66. 66.

    Team R: A Language and Environment for Statistical Computing. R Development Core [ISBN 3-900051-07-0]. 2008, R Foundation for Statistical Computing, Vienna, Austria

  67. 67.

    Csardi G, Nepusz T: The igraph software package for complex network research. InterJournal. 2006, Complex Systems, 1695-[http://igraph.sf.net]

Download references

Acknowledgements

We would like to thank the International Genomics Consortium (IGC) for making the expO data set available. Furthermore, we would like to thank Shailesh Tripathi for fruitful discussions. For our numerical simulations we used R [66] and for the visualization of networks igraph [67]. Finally, we thank the administrators of the DELL computer cluster at the Queen's University Belfast.

Declarations

MD thanks the Austrian Science Funds for supporting this work (project P26142).

This article has been published as part of BMC Bioinformatics Volume 15 Supplement 6, 2014: Knowledge Discovery and Interactive Data Mining in Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/15/S6.

Author information

Correspondence to Frank Emmert-Streib.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

FES conceived the study. RDMS and FES analyzed the data. FES, RDMS, GG, SMD, BHK, AH, MD and FCC interpreted the results and wrote the paper. All authors read and approved the final manuscript.

Frank Emmert-Streib, Ricardo de Matos Simoes contributed equally to this work.

Electronic supplementary material

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Emmert-Streib, F., de Matos Simoes, R., Glazko, G. et al. Functional and genetic analysis of the colon cancer network. BMC Bioinformatics 15, S6 (2014) doi:10.1186/1471-2105-15-S6-S6

Download citation

Keywords

  • Colon Cancer
  • Gene Ontology
  • Gene Pair
  • Gene Regulatory Network
  • Average Short Path Length