Bioinformatics approaches for cross-species liver cancer analysis based on microarray gene expression profiling

Background The completion of the sequencing of the human, mouse and rat genomes and knowledge of cross-species gene homologies enable studies of differential gene expression in animal models. These types of studies have the potential to greatly enhance our understanding of diseases such as liver cancer in humans. Genes co-expressed across multiple species are most likely to have conserved functions. We have used various bioinformatics approaches to examine microarray expression profiles from liver neoplasms that arise in albumin-SV40 transgenic rats to elucidate genes, chromosome aberrations and pathways that might be associated with human liver cancer.
Results In this study, we first identified 2223 differentially expressed genes by comparing gene expression profiles for two control, two adenoma and two carcinoma samples using an F-test. These genes were subsequently mapped to the rat chromosomes using a novel visualization tool, the Chromosome Plot. Using the same plot, we further mapped the significant genes to orthologous chromosomal locations in human and mouse. Many genes expressed in rat 1q, a region that is amplified in rat liver cancer, map to human chromosomes 10, 11 and 19 and to mouse chromosomes 7, 17 and 19, which have been implicated in studies of human and mouse liver cancer. Using Comparative Genomic Microarray Analysis (CGMA), we identified regions of potential aberrations in human. Lastly, a pathway analysis was conducted to predict altered human pathways based on statistical analysis and extrapolation from the rat data. All of the identified pathways are known to be important in the etiology of human liver cancer, including cell cycle control, cell growth and differentiation, apoptosis, transcriptional regulation, and protein metabolism.
Conclusion The study demonstrates that the hepatic gene expression profiles from the albumin-SV40 transgenic rat model revealed genes, pathways and chromosome alterations consistent with experimental and clinical research in human liver cancer. The bioinformatics tools presented in this paper are essential for cross-species extrapolation and mapping of microarray data, and for its analysis and interpretation.


Background
For decades, classical toxicology has used risk assessments based on animal studies for regulatory decisions. The underlying assumption is that important biological functions are often conserved across species. In continuation of this paradigm, the effort in toxicogenomics is placed on studying rodents and other surrogates using advanced genomics technologies, such as DNA microarrays. Microarray studies enable simultaneous measurement of the expression of large numbers of genes. Given the completion of the DNA sequence of the human, mouse and rat genomes [1][2][3], genes identified in microarray studies can be readily compared across species with respect to the gene orthologs [4,5]. This assumes that genes co-expressed across multiple species are likely to have conserved functions [6][7][8]. Thus, microarray analysis offers the possibility of furthering our understanding of cross-species commonalities and differences, which could lead to more effective use of animal models to understand the cause and progression of diseases in human at the mechanistic level.
Hepatocellular carcinoma (HCC) is a leading cause of death worldwide and, like most cancers, is a genetic disease caused by the accumulation of genetic and epigenetic cell alterations. The progression of hepatic neoplasia is characterized by increasing genetic instability, including duplication and deletion of parts of chromosomes and an increasing proliferative growth advantage of the affected cells. Molecular cytogenetic techniques, such as Comparative genomic hybridization (CGH) and Spectral karyotyping (SKY) [9][10][11], have allowed evaluation of chromosomal aberrations in HCC. More recently, Crawley [12] has demonstrated the ability of comparative genomic microarray analysis (CGMA) to elucidate alteration of specific genes together with the genetic changes at the chromosome level based on microarray data. Thus, microarray analysis provides an unprecedented opportunity to further the understanding of the etiology and progression of liver cancer.
Bioinformatics methods and tools are essential to analyze and interpret data from microarrays. The critical and urgent task is to associate altered patterns of gene expression with disease. Interpreting microarray data in the context of signaling and regulatory pathways is a particularly effective bioinformatics approach to transform data into biological meaning and to generate hypotheses for further research. Using pathways, disease mechanisms can be interpreted as disturbances of the intricate interconnections among genes, molecules and cells. Most reported pathway analysis of microarray data has examined the role of differentially expressed genes in pathways selected with a priori knowledge. Alternatively, significant pathways can be identified based on statistical analysis, potentially leading to new discoveries and a more complete interpretation of microarray data in the context of biological processes at the mechanistic level.
The primary approach for studying HCC experimentally is the administration of carcinogenic agents. A number of model systems have been developed to understand the pathogenesis of primary liver cancer [13][14][15]. Additionally, the development of transgenic models permits the analysis of the genetic basis for the induction and progression of HCC [16][17][18][19]. The albumin-Simian virus 40 (SV40) T antigen transgenic rat contains the mouse albumin promoter/enhancer linked to the coding region of the SV40 large T antigen (SV40 Tag). SV40 T antigen inactivates both p53 and Rb, resulting in spontaneous development of hepatic neoplasms (adenoma and carcinoma) within 6-9 months. Thus, the albumin-SV40 T antigen transgenic rat can be used to examine liver cancer development and maintenance [20][21][22].
In this manuscript, we describe a bioinformatics process in which microarray data from the SV40 transgenic rat were examined for application to the study of HCC in human. We first used a novel visualization tool to investigate liver cancer by mapping the chromosomal locations of differentially expressed genes from the rat model to the chromosomal regions of their human orthologs. Then, CGMA was used to relate gene expression bias patterns to cytogenetic aberration profiles in human. Lastly, a statistical approach was used to identify several pathways involved in human HCC based on the rat microarray data. The pathway analysis reveals the expected involvement of apoptosis, cell cycle, growth and differentiation, genetic stability and methionine metabolism in cancer development, maintenance and progression. The results indicate that the gene expression profiles of the transgenic rat model may be useful in the study of human liver cancer.

Microarray experiment and results
The details of the microarray experimental procedure are reported elsewhere [21]. Briefly, RNA was isolated from six rat liver tissue samples: two controls, two adenomas and two carcinomas. The laser capture microdissected samples were amplified prior to microarray hybridization. An NCI cDNA array (IncyteGem2) was used that contains 10238 probes representing 9984 unique genes. Gene expression profiles were produced for all six samples with dye flip, which resulted in a total of 12 arrays.
The log2 ratio-based mean global normalization was first applied and the normalized ratios of the swapped dye labels were then averaged. A total of 9150 genes remained for further analysis after removing genes that did not hybridize (low intensity). Significantly differentially expressed genes were determined using an F-test with P < 0.05.
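The preprocessing and testing steps described above can be illustrated with a minimal Python sketch. It is not the ArrayTrack implementation: the function names are hypothetical, the sign flip applied to the dye-swapped array is an assumption about how the two labelings were combined, and only the one-way ANOVA F statistic is computed (the P value would come from the F distribution with k - 1 and n - k degrees of freedom, for example via a statistics library).

```python
from statistics import mean


def global_mean_normalize(log2_ratios):
    """Shift one array's log2 ratios so that their mean is zero."""
    center = mean(log2_ratios)
    return [r - center for r in log2_ratios]


def average_dye_swap(forward, swapped):
    """Average a forward array with its dye-flipped replicate.

    Assumption: the flipped array measures the inverse ratio, so its
    log2 values are negated before averaging.
    """
    return [(f - s) / 2.0 for f, s in zip(forward, swapped)]


def f_statistic(groups):
    """One-way ANOVA F statistic for one gene across sample groups."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of samples
    grand_mean = mean([v for g in groups for v in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical values for one gene: control, adenoma, carcinoma (two samples each).
example_groups = [[0.0, 0.2], [1.0, 1.2], [2.0, 2.2]]
example_f = f_statistic(example_groups)
```

With two samples per group, as in this study, the F statistic has 2 and 3 degrees of freedom, so the within-group variance estimate rests on very few observations.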

Data analysis using ArrayTrack
Most analyses were conducted using the in-house software ArrayTrack (http://edkb.fda.gov/webstart/arraytrack/). ArrayTrack is bioinformatics software in which data management, analysis and interpretation are fully integrated [37]. ArrayTrack consists of three components: (1) MicroarrayDB for storing microarray data; (2) LIBRARY for data interpretation, which contains many types of functional information about genes, proteins and pathways; and (3) TOOL, which provides functionality for microarray data analysis. LIBRARY contains many sub-libraries, and the data in these sub-libraries are extracted from different biological databases in the public domain (e.g., NCBI bioinformatics resources) [38]. In this project, information for orthology analysis, chromosome-based analysis and pathway analysis was retrieved from LIBRARY. More specifically,
• Gene orthology analysis - The human and mouse orthologs to rat were obtained from the Orthologene Library in ArrayTrack. The contents of the Orthologene Library are mainly derived from the NCBI HomoloGene database [4,5]. ArrayTrack allows rapid matching of a large number of genes across human, mouse and rat for the gene orthology analysis.
• Chromosome-based analysis - The cytogenetic locations of genes were exported directly from the Gene Library of ArrayTrack. A novel visualization tool, the Chromosome Plot, was developed to study the effect of a gene expression pattern on liver cancer by identifying the altered cytogenetic regions of each chromosome. Figure 1 shows a bar-chart depiction in which the 20 rat chromosomes are represented by vertical bars arranged along the x-axis, with the y-axis giving the cytogenetic location along each chromosome. This kind of plot has two uses. First, it depicts rat genes at their cytogenetic locations on each chromosome, with expression color coded as red for up-regulated, green for down-regulated and grey for unaffected genes (e.g., Figure 1). Thus, the plot provides, for a specific species, a compact visual display of the cytogenetic blocks and/or chromosomes that are altered. Second, the genes can be mapped to the chromosomal locations of another species and color coded according to the chromosome of the experimental species.
• Pathway analysis - The pathway data were obtained from the Pathway Library in ArrayTrack. The Pathway Library contains pathways from both the Kyoto Encyclopedia of Genes and Genomes (KEGG) [35] and PathArt (Jubilant Biosys Ltd, Columbia, MD 21045), which can be searched separately or in combination in ArrayTrack. The Fisher Exact Test [39] was used to estimate the statistical significance of pathway i:
$$p = \frac{m!\,(N-m)!\,n_i!\,(N-n_i)!}{N!\,m_i!\,(m-m_i)!\,(n_i-m_i)!\,(N-n_i-m+m_i)!}$$
where N is the total number of genes on the chip (i.e., 9150), m is the number of differentially expressed genes identified using the F-test (i.e., 2223), n_i is the number of genes out of N that belong to pathway i, and m_i is the number of genes out of the m differentially expressed genes that belong to pathway i. A two-sided Fisher's Exact Test p-value of less than 0.05 suggests that the number of significant genes in the pathway is unlikely to be due to chance alone.

Comparative Genomic Microarray Analysis (CGMA)
CGMA identifies cytogenetic regions containing unidirectional gene expression biases. The biased regions possibly indicate chromosomal gains and losses [12]. Of the total 9150 genes, 3414 had a HomoloGene ID, and GenBank accession numbers (RefSeq in NCBI) for the human orthologs of the rodent genes were obtained for 2925 of these using the Orthologene Library and Gene Library in ArrayTrack; ESTs and genes that may be unique to rodent were excluded. A two-tailed z-statistic was then computed to test whether chromosomal regions exhibited gene-expression biases [12]. CGMA was done for each of the two adenoma samples and each of the two carcinoma samples.
Figure 1 Expressed genes in microarray mapped to the rat cytogenetic location and chromosome. The genes were obtained from an ANOVA among two control, two adenoma and two carcinoma samples of the transgenic rat. The cytogenetic location of genes is on the y-axis for each of the 20 rat chromosomes, which are displayed as separate bars along the x-axis. Red and green areas are the significant genes that are up- or down-regulated, respectively, and grey represents those genes not differentially expressed.
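The region-level bias test used in CGMA can be illustrated with a short Python sketch. The exact statistic of [12] is not reproduced in this paper, so the code assumes one common form, a sign-test z-statistic in which up- and down-regulated genes in a region are equally likely under the null hypothesis; the function names, the region labels and the 0.05 threshold are hypothetical.

```python
from math import erfc, sqrt


def expression_bias_z(n_up, n_down):
    """Sign-test z-statistic for a region with n_up up-regulated and
    n_down down-regulated genes (assumed form, see lead-in)."""
    n = n_up + n_down
    if n == 0:
        raise ValueError("region contains no changed genes")
    return (n_up - n_down) / sqrt(n)


def two_tailed_p(z):
    """Two-tailed p-value of z under the standard normal distribution."""
    return erfc(abs(z) / sqrt(2.0))


def biased_regions(region_counts, alpha=0.05):
    """Return {region: (z, p)} for regions whose two-tailed p is below alpha."""
    hits = {}
    for region, (n_up, n_down) in region_counts.items():
        z = expression_bias_z(n_up, n_down)
        p = two_tailed_p(z)
        if p < alpha:
            hits[region] = (z, p)
    return hits


# Hypothetical (up, down) gene counts for two chromosome arms.
example_counts = {"armA": (30, 10), "armB": (12, 11)}
example_hits = biased_regions(example_counts)
```

A positive z for a region would be read as a predicted gain and a negative z as a predicted loss, in line with the interpretation of expression biases given above.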

Results
A total of 2223 differentially expressed genes were identified across three groups of samples (i.e., normal, adenoma and carcinoma) based on an F-test. The differentially expressed genes were first mapped to the rat chromosomes. As depicted in Figure 1, the differentially expressed genes occurred primarily in several chromosomes, indicating that these chromosomes were altered in rats with neoplasms compared to normal rats. Specifically, the large number of up-regulated genes mapping to rat chromosome 1q is consistent with previous findings of high amplification of this region in rat liver cancer [22].
To investigate the cross-species extrapolation based on the results from the transgenic rat, the differentially expressed genes were first mapped to the orthologous chromosomal locations on the mouse chromosomes. As depicted in Figure 2, the majority of the differentially expressed genes from rat 1q, which are known to be important for rat liver cancer development, appear mainly on mouse chromosomes 7, 17 and 19 (displayed as the orange band in Figure 2). A comparison of rat to human shows that the differentially expressed genes from rat 1q appear primarily on human chromosomes 10, 11 and 19 (Figure 3).
The results suggest that the mouse chromosomes (7, 17 and 19) and human chromosomes (10, 11 and 19) might be important in liver cancer for these species. The findings are supported by a number of reports [12,23,24]. Table 1 lists the cytogenetic location of the differentially expressed genes from the rat 1q and location of the orthologous gene in human and mouse. There are seven groups of significantly expressed genes (called gene blocks); genes in each group are consecutive to each other and across species. The genes in the same blocks could be coordinately expressed to perform similar transcriptional programs or physiological processes across species in liver cancer development and maintenance. For example, the human gene blocks 10q24-26, 11p15.5, 11q13-15 and 19q13.2 have corresponding blocks on rat 1q, and corresponding blocks on mouse chromosomes 7 and 19. These blocks are associated with several cancer-related processes and functions, including apoptosis, M phase, cell communication and nuclear division as seen in a statistical analysis based on Gene Ontology (results not shown).
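The cross-species mapping described above amounts to a lookup from rat genes to the chromosomes of their orthologs, followed by a tally per target chromosome. The Python sketch below shows the idea under stated assumptions: the function name is hypothetical, and the gene identifiers and annotations are made-up toy values, not entries from the Orthologene Library or from Table 1.

```python
from collections import Counter


def ortholog_chromosome_counts(de_genes, rat_arm, ortholog_chrom, arm="1q"):
    """Count, per target-species chromosome, the orthologs of the
    differentially expressed genes located on one rat chromosome arm."""
    counts = Counter()
    for gene in de_genes:
        if rat_arm.get(gene) != arm:
            continue                      # gene is not on the arm of interest
        target = ortholog_chrom.get(gene)
        if target is not None:            # skip genes without a known ortholog
            counts[target] += 1
    return counts


# Toy annotations (hypothetical): rat arm and human chromosome per gene.
toy_rat_arm = {"geneA": "1q", "geneB": "1q", "geneC": "1q",
               "geneD": "2q", "geneE": "1q"}
toy_human_chrom = {"geneA": "11", "geneB": "19", "geneC": "11", "geneD": "5"}

example_counts = ortholog_chromosome_counts(
    ["geneA", "geneB", "geneC", "geneD", "geneE"], toy_rat_arm, toy_human_chrom
)
```

In this toy input, geneD is ignored because it is not on 1q and geneE is ignored because it has no ortholog entry, which mirrors the exclusion of genes lacking orthologs in the analysis.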
To further confirm the validity of cross-species extrapolation, we investigated chromosomal aberration in human based on the differentially expressed genes from the rat model using CGMA.
Figure 2 Genes significantly differentially expressed in rat (shown in Figure 1) mapped to the orthologous genes on chromosomes of mouse.
Figure 3 Genes significantly differentially expressed in rat (shown in Figure 1) mapped to the orthologous genes on chromosomes of human.
... up-regulation and most of these are associated with adenoma. The CGMA results were further compared with karyotype results in the Cancer Genome Anatomy Project (CGAP) in the Mitelman Database [25]. Of 15 affected chromosomal regions identified from the rat gene expression data, 10 regions are also reported in CGAP. This is shown in the last column of Table 2, which lists both the number of citations and the number of patients in CGAP.
We also investigated which pathways in human were significantly affected based on the differentially expressed genes identified in the transgenic rat model. Pathway analysis is a particularly effective way to examine how the findings in the rat model relate to human in the context of biological functions. Table 3 summarizes the results of the pathway analysis. Fifteen pathways were significantly altered in a Fisher's Exact Test with p < 0.05. They predominantly involve cell cycle, cell growth and differentiation. Most identified pathways are confirmed by a large literature to be associated with many cancer types [26,27]. Examples are 1) the p53 pathway involved in response to DNA damage, 2) the Rb pathway involved in the control of cell cycle, and 3) the transforming growth factor-beta (TGF-beta) pathway involved in growth inhibition. In addition, the altered methionine metabolism pathway and the regulation of p27 during cell cycle progression are known to be critical for cancer progression [28].

Discussion
This study investigates the implications of using microarray results from the albumin-SV40 transgenic rat for the study of human liver cancer. We demonstrated the importance of bioinformatics for interpreting microarray data in cross-species comparisons. Specifically, two in-house bioinformatics tools are of importance for the analysis, the Chromosome Plot and ArrayTrack. The Chromosome Plot not only provides a visual presentation of the gene expression pattern at the level of gene order across chromosomes (e.g., Figure 1), but also can be used to map the chromosome and cytogenetic location of differentially expressed genes from one species to another (e.g., Figures 2 and 3). ArrayTrack, which integrates data from public repositories, was used to identify the cross-species orthologous genes, their chromosomal locations and, most importantly, the pathways that may be related to liver cancer. In addition, CGMA was performed to investigate the variability of the multiple chromosome aberration patterns based on gene expression data, and the results were compared with those presented in CGAP.

Implications of orthologs and chromosome-based analysis
The recently completed sequencing of the rat genome provides a basis for future research to elucidate how differences and commonalities affect the ability of rat models to predict human disease. The rat genome project reported that almost all human genes known to be associated with disease have orthologous genes in the rat genome, and that human, mouse and rat genomes are approximately 90% orthologous [1]. We also analyzed orthologous genes between human, rat and mouse among the 9150 genes on the chip using ArrayTrack. According to the Orthologene Library in ArrayTrack, the chip contains 3414 human, 3365 mouse and 1950 rat genes, with the rest of the genes being either EST tags or Riken genes (about 1500).
The results showed that 92% of human genes are orthologous to either rat or mouse.
Although a large number of genes were identified as differentially expressed in the rat model, some of these changes may be a consequence of the cancer rather than causally related to it. In addition, the function of a specific gene and its involvement in cancer might not be conserved across species. Thus, as important as structural and functional homology of specific genes is, the conservation of function of blocks of genes is likely to be more important in cross-species comparison. We found seven distinct blocks of significantly differentially expressed genes within different cytogenetic regions of rat 1q with homologous chromosomal segments in human and mouse (Table 1). Human, mouse and rat have very different chromosomal arrangements. Nevertheless, the genes in these blocks appear consecutively in contiguous cytogenetic regions, irrespective of species and chromosomal location. This finding is not surprising considering the close evolutionary distance between the species: 278 orthologous segments are reported to be shared between human and rat, and 280 segments are reported to be shared between human and mouse [1].
We propose that these seven blocks of genes may be of significance for liver cancer development, maintenance and progression across human, rat and mouse. For example, genes in the blocks may be coordinately expressed to share transcription programs or to respond to the genomic instability observed in liver cancer. Several genes in Table 1 show large fold changes and are implicated in cancer development and maintenance. For example, Rps16, Rps19 and Rps3 code for ribosomal proteins, and their altered expression has been associated with liver and other tumors [29,30]. Insulin-like growth factor 2 receptor (Igf2r) is mutated in many human HCC tumors, and the gene's haploid insufficiency has been suggested as an early event in human hepatocarcinogenesis [31]. Cyclin D1 (Ccnd1) and cyclin-dependent kinase inhibitor 1C (Cdkn1c = p57) are critical for the cell cycle, including G1 progression and the G1/S transition. Cyclin D1 has been shown to be amplified in 10 to 20% of HCCs.

Implication of CGMA analysis
Chromosomal aberrations are common in cancers, particularly in advanced stages. CGH has been employed to determine gross DNA gains and losses at chromosomal and sub-chromosomal levels [10]. CGH, however, is time-consuming and lacks the resolution and sensitivity to detect changes at the gene level; for example, CGH is unable to detect copy number changes within narrow regions of chromosomes (alterations of <1 Mb). It therefore fails to identify putative tumor-suppressor genes or oncogenes [32]. These limitations might be overcome by using CGMA [12]. CGMA identifies cytogenetic regions containing unidirectional gene expression biases. Such region-dependent expression change may be the result of allelic imbalances commonly found in liver and other cancers. Evidence shows that DNA copy number alterations (deletion and low-, mid- and high-level amplification) with an average 2-fold change in DNA copy number correspond to about a 1.5-fold change in mRNA level [33]. Therefore, CGMA results based on microarray data measuring mRNA levels could be related to changes at the DNA level.
Using CGMA, we identified 15 out of 46 (23 p and 23 q arms) human chromosomal regions that could be involved in liver cancer development, maintenance and progression. These chromosomal aberrations are consistent with the karyotypes reported in CGAP for 10 out of the 15 chromosome regions (Table 2). Although the CGAP database (Table 2) cites no evidence of involvement of chromosome 19 in human liver cancer, we found that genes in both chromosome 19q and 19p are significantly down-regulated in three out of four tumor samples. In addition, there is a block of genes in 19q that corresponds to rat 1q (Table 1), while a large number of differentially expressed genes also occur in 19p according to the Chromosome Plot (figure not shown). Analysis of both human 19p and 19q suggests the possible relevance of the chromosome in human liver cancer. The genes significantly altered in the rat microarray that correspond to human 19p13 are JunB, Rab8a (Mel), Tnfsf9 and Dnmt1. Further investigation is required to confirm their relevance to human HCC. Compared with the findings of Crawley et al. [12], who carried out CGMA with human HCC gene expression arrays, we predicted five (12q, 16, 17q, 19p and 20q) of the eight chromosomal gains they reported. Both our analysis and that of Crawley suggest the importance of 19p in human liver cancer, a region of aberration not previously discovered with CGH analysis.
These results indicate human 19p as a region of aberration not previously discovered with CGH analysis.

Implications of pathway analysis
Pathways are the best vehicle for interpretation of biological functions of genes. An important goal of modern biology is to identify the interaction and regulatory networks among biological molecules. A logical approach is to analyze the gene expression changes in the context of known biological pathways [34,35]. A number of human pathways were found to be significantly altered using the Fisher's Exact Test by comparing the number of genes with altered expression in a pathway to the number of genes on the chip in the same pathway.
We inferred which pathways are involved in human liver cancer from the differentially expressed genes in the transgenic rat liver cancer model. It is important to point out that the statistically significant pathways identified in this process were solely based on the analysis of microarray data together with the Orthologene and Pathway Libraries in ArrayTrack, and thus required no a priori knowledge regarding cancer genes and the pathways that they control. The results of the pathway analysis given in Table 3 include those involved in apoptosis, cell cycle, cell growth and differentiation and others that are significant in liver cancer.
Most of the altered pathways are involved in cell cycle regulation. Cell cycle regulation is accomplished by coordinating the activity of cyclin-dependent kinases, checkpoint controls and DNA repair pathways; in cancer, perturbation of these processes leads to uncontrolled cell growth [27]. Not surprisingly, our results indicate that the p53 and Rb signaling pathways, as well as cyclin-mediated pathways including the G1-S checkpoint, are altered. Both p53 and Rb are tumor-suppressor genes, and their products are transcription factors that respond to a variety of stress signals and are often associated with the progression of neoplastic diseases. The transgenic model implies that both the p53 and Rb signaling pathways could be disrupted in human liver cancer. Without p53 and Rb, cell cycle arrest and/or programmed cell death (apoptosis) are inhibited, leading to accumulation of mutations and genetic instability. With p53 inactivated in this transgenic model, we also observed that the pathway that influences Ras and Rho proteins during the G1 to S transition is altered. Ras is a proto-oncogene that is involved in multiple signal transduction pathways transmitting pro-proliferative signals to the nucleus, while Rho proteins are members of the extended Ras family that modulate gene expression, cell cycle progression, and cell proliferation and survival.
Other categories of altered pathways are associated with growth and differentiation. Genes in the ECM (extracellular matrix) and integrin-mediated signaling pathway have been reported to be over-expressed in human HCC, though the mechanism is not fully understood [36]. In addition, an excess of TGF-beta is thought to overwhelm the cell in two ways. First, it promotes the overgrowth of blood vessels. Second, excess TGF-beta suppresses T cells and other components of the immune system that would normally attack aberrant cells.
The human-relevant liver cancer pathways based on the SV40 transgenic rat liver model are confirmed by reports on human liver cancer models. Therefore, the pathway analysis using Fisher's Exact Test is novel and efficient.

Conclusion
We presented several bioinformatics approaches to extrapolate microarray data involving rat liver cancer to human. Microarrays have been widely used in many fields of medical and biological research. The current challenge in microarray bioinformatics is no longer to identify a list of differentially expressed genes, but to develop effective bioinformatics processes and tools for data interpretation and knowledge discovery. In this study, we first developed a Chromosome Plot that provides a compact visual summary of gene expression data at the level of chromosomal location for identification of altered chromosomal regions. This tool facilitates cross-species comparison. The information available in ArrayTrack on gene ontology, gene orthologs and gene pathways was then used to interpret the microarray data. Finally, the CGMA bioinformatics tool was used to infer HCC chromosomal aberrations in human based on microarray data from rat. An important lesson of this study is that bioinformatics resources and statistical methods can be used to filter the information and present an unbiased, statistically based interpretation of microarray results with respect to genes, pathways, chromosomes and functions. Based on a thorough bioinformatics analysis, we found that the albumin-SV40 transgenic rat is a useful animal model for prediction of human liver cancer.