CHESS (CgHExpreSS): A comprehensive analysis tool for the analysis of genomic alterations and their effects on the expression profile of the genome
© Lee and Kim; licensee BioMed Central Ltd. 2009
Received: 23 July 2009
Accepted: 16 December 2009
Published: 16 December 2009
Genomic alterations frequently occur in many cancer patients and play important mechanistic roles in the pathogenesis of cancer. Furthermore, they can modify the expression level of genes due to altered copy number in the corresponding region of the chromosome. An accumulating body of evidence supports the possibility that a strong genome-wide correlation exists between DNA content and gene expression. Therefore, more comprehensive analysis is needed to quantify the relationship between genomic alteration and gene expression. A well-designed bioinformatics tool is essential to perform this kind of integrative analysis. A few programs have already been introduced for integrative analysis. However, published software remains limited in its capacity for comprehensive integrated analysis because of restrictions in the implemented algorithms and visualization modules.
To address this issue, we have implemented the Java-based program CHESS to allow integrative analysis of two experimental data sets: genomic alteration and genome-wide expression profile. CHESS is composed of a genomic alteration analysis module and an integrative analysis module. The genomic alteration analysis module detects genomic alterations by applying a threshold-based method or the SW-ARRAY algorithm and investigates whether each detected alteration is phenotype-specific. The integrative analysis module, in turn, measures the influence of genomic alteration on gene expression. It is divided into two separate parts. The first part calculates the overall correlation between comparative genomic hybridization ratio and gene expression level by applying the following three statistical methods: simple linear regression, Spearman rank correlation and Pearson's correlation. In the second part, CHESS detects the genes that are differentially expressed according to the genomic alteration pattern with three alternative statistical approaches: Student's t-test, Fisher's exact test and the chi-square test. By running the two modules in succession, users can clarify how gene expression levels are affected by phenotype-specific genomic alterations. As CHESS was developed in both Java application and web environments, it can be run on a web browser or a local machine. It also supports all experimental platforms, provided that a properly formatted text file is supplied that includes the chromosomal position of the probes and their gene identifiers.
CHESS is a user-friendly tool for investigating disease-specific genomic alterations and the quantitative relationships between those genomic alterations and genome-wide gene expression profiles.
It is well known that genomic alterations frequently occur in many cancer patients and play important mechanistic roles in the pathogenesis of cancer. Recently, comparative genomic hybridization (CGH) technology has been extensively applied to various types of cancer in order to identify regions of genomic alteration. CGH is a molecular cytogenetic method that detects gain or loss of genomic DNA content in an individual by measuring the ratio between the intensity of test DNA and that of reference DNA. As array-based CGH technology has advanced, the resulting unprecedentedly detailed examination of chromosomal regions has led to efforts to discover genomic alterations that can serve as genetic markers for various diseases [2, 3]. Those efforts have emphasized that genomic alterations play important roles in particular diseases. Furthermore, genomic alterations can modify the expression level of genes due to changed copy number in the relevant chromosomal regions. Recent studies have sought to verify the existence of a strong genome-wide correlation between DNA content and gene expression [4, 5]. These important CGH-based biological discoveries have spurred the widespread use of the technique, which has in turn created the need for a genomic alteration analysis tool.
CHESS's main features
Built-in GA detection
Frequent GA region definition
Enrichment analysis for GO and KEGG
Integrative analysis of genomic CGH and gene expression
Free for academic use
We chose Java as the programming language because it is publicly available and ensures cross-platform compatibility. Moreover, CHESS was developed in both Java application and Java Web Start (webstart) environments, so it can be run from a web browser or on a local machine. When CHESS is launched through a web browser, no uploaded data is transmitted anywhere because all analyses are performed locally through webstart. CHESS can deal with high-density arrays on commonly used desktop computers. For example, we were able to load 30 Agilent 244K arrays in 3 min on a computer with 3 GB of memory and a 2.4 GHz processor.
CHESS is composed of two primary modules: a genomic alteration analysis module and an integrative analysis module. The first module detects genomic alteration regions and investigates whether the detected regions are phenotype-specific. Genes located in the altered regions are automatically listed, and biological information about them is supplied by a built-in annotation module. The integrative analysis module performs a combined analysis of genomic alteration and gene expression. To give users a rapid overview of the complete results, CHESS provides a summary figure on a whole-chromosome scale. CHESS also presents the results on a single-chromosome scale for more detailed analysis.
CHESS was developed to support all experimental platforms, provided that a properly formatted text file is supplied containing general information such as probe identification, chromosomal location and normalized log-transformed ratio values for each probe. CHESS handles two sets of experimental data: signal ratio values from a CGH experiment and those from a gene expression experiment. First, the CGH ratio file acquired from the CGH experiment should define header information in the first four lines: total number of samples, total number of probes, sample names and their clinical information. The test/reference ratio values should be written from the fourth line, in which the first five fields record probe identification, probe name or alias, chromosome number, start position and stop position, separated by tabs. The remaining columns are treated as the ratio values for each sample. When the user successfully loads the CGH ratio file and finishes detecting the genomic alteration regions, a GA (Genomic Alteration) file is automatically generated that has exactly the same format as the CGH ratio file. However, instead of the continuous signal values of the CGH ratio file, the GA file contains discrete values representing gain (+1), loss (-1) and no genomic alteration (0). To support the output of other detection algorithms such as aCGH and DNAcopy, CHESS allows direct loading of the GA file. Second, the gene expression file is also a tab-delimited text file, which records the level of mRNA expression. It can have two kinds of data formats according to the experimental method: single channel array and dual channel array. The single channel array hybridizes two samples on separate arrays and produces two separate files that record intensity values for the test and reference samples.
On the other hand, the dual channel array hybridizes two samples on the same array and creates one ratio file that records the ratio of test to reference intensity values. CHESS handles both kinds of data, which share the same file format. The first two lines give the data dimensions: the number of samples and the number of probes. The third line lists the sample names separated by tabs. The remaining lines record ratio values or intensity values for every probe: the first three columns list the probe name, the corresponding gene symbol and the chromosome number, and the subsequent columns record the actual expression values. Finally, the gene mapping file is needed to match each probe used in CGH to its corresponding gene for biological interpretation. It has a very simple format, in which the gene symbol is followed by the CGH probe identification, separated by a tab.
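The CGH ratio file layout described above can be made concrete with a small reader. The following Python sketch is illustrative only and is not taken from CHESS itself. It assumes four header lines (sample count, probe count, tab-separated sample names, tab-separated clinical labels) followed directly by the probe rows, with no label column in the header lines; the text leaves the exact starting line of the data rows ambiguous, so this is one possible reading. The names `CghProbe`, `CghRatioFile` and `parse_cgh_ratio_text` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CghProbe:
    probe_id: str
    name: str
    chromosome: str
    start: int
    stop: int
    ratios: List[float]  # one log-transformed ratio per sample


@dataclass
class CghRatioFile:
    sample_names: List[str]
    clinical_info: List[str]
    probes: List[CghProbe]


def parse_cgh_ratio_text(text: str) -> CghRatioFile:
    """Parse a CGH ratio file laid out as assumed in the lead-in."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_samples = int(lines[0].strip())
    n_probes = int(lines[1].strip())
    sample_names = lines[2].split("\t")
    clinical_info = lines[3].split("\t")
    if len(sample_names) != n_samples or len(clinical_info) != n_samples:
        raise ValueError("header does not match the declared sample count")

    probes = []
    for line in lines[4:]:  # assumption: data rows follow the four header lines
        fields = line.split("\t")
        if len(fields) != 5 + n_samples:
            raise ValueError(f"expected {5 + n_samples} fields, got {len(fields)}")
        probes.append(CghProbe(
            probe_id=fields[0], name=fields[1], chromosome=fields[2],
            start=int(fields[3]), stop=int(fields[4]),
            ratios=[float(v) for v in fields[5:]],
        ))
    if len(probes) != n_probes:
        raise ValueError("probe rows do not match the declared probe count")
    return CghRatioFile(sample_names, clinical_info, probes)


# Made-up two-sample, two-probe file used only to exercise the parser.
example = (
    "2\n"
    "2\n"
    "S1\tS2\n"
    "tumor\tnormal\n"
    "P1\tprobeA\t1\t1000\t1060\t0.42\t-0.03\n"
    "P2\tprobeB\tX\t5000\t5060\t-0.51\t0.02\n"
)
parsed = parse_cgh_ratio_text(example)
print(parsed.sample_names, parsed.probes[1].chromosome)
```

Because the GA file is described as sharing this format, the same reader would accept it under these assumptions, with the ratio columns holding +1, -1 and 0 instead of continuous values.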
Definition of genomic alteration
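As an illustration of the threshold-based option, the Python sketch below converts log-transformed CGH ratios into the discrete gain (+1), loss (-1) and no alteration (0) values that the GA file is described as storing. It is a minimal sketch rather than the CHESS implementation: the symmetric cutoff, its default value of 0.25 and the function names `call_alterations` and `alteration_frequency` are all assumptions made for the example.

```python
from typing import Dict, List


def call_alterations(log_ratios: List[float], threshold: float = 0.25) -> List[int]:
    """Map each log ratio to +1 (gain), -1 (loss) or 0 (no alteration)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    calls = []
    for ratio in log_ratios:
        if ratio >= threshold:
            calls.append(1)
        elif ratio <= -threshold:
            calls.append(-1)
        else:
            calls.append(0)
    return calls


def alteration_frequency(calls_per_sample: List[int]) -> Dict[str, float]:
    """Fraction of samples showing gain and loss at a single probe."""
    n = len(calls_per_sample)
    if n == 0:
        raise ValueError("at least one sample is required")
    return {
        "gain": sum(1 for c in calls_per_sample if c == 1) / n,
        "loss": sum(1 for c in calls_per_sample if c == -1) / n,
    }


# Ratios of one probe across five hypothetical samples.
probe_ratios = [0.40, 0.10, -0.30, 0.26, -0.05]
calls = call_alterations(probe_ratios)
frequency = alteration_frequency(calls)
print(calls, frequency)
```

A per-probe frequency of this kind is one simple way to flag frequently altered regions across a sample set; the cutoff would need to be tuned to the platform and normalization in use.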
Identification of phenotype specific genomic alteration regions
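One way to ask whether an alteration is phenotype-specific is to cross-tabulate alteration calls against clinical labels and test the resulting 2x2 table. The surviving text does not state which test CHESS applies at this step, so the Python sketch below uses Fisher's exact test (one of the tests listed for the integrative module) purely as an illustration. The function names and the example data are hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def contingency_table(calls: Sequence[int], phenotypes: Sequence[str],
                      target: str, state: int = 1) -> Tuple[int, int, int, int]:
    """Count samples by phenotype (target vs. rest) and alteration state."""
    a = b = c = d = 0
    for call, phenotype in zip(calls, phenotypes):
        altered = call == state
        if phenotype == target:
            a, b = a + altered, b + (not altered)
        else:
            c, d = c + altered, d + (not altered)
    return a, b, c, d


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x altered samples in the target group.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# Gain calls (+1) at one probe for four tumor and four normal samples.
calls = [1, 1, 1, 0, 0, 0, 0, 1]
phenotypes = ["tumor"] * 4 + ["normal"] * 4
table = contingency_table(calls, phenotypes, target="tumor")
p_value = fisher_exact_two_sided(*table)
print(table, round(p_value, 4))
```

In a genome-wide setting this test would be repeated for every probe or region, so some form of multiple-testing correction would normally be applied to the resulting p-values.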
Integrative analysis of genomic alteration and gene expression data
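The two parts of the integrative analysis can be sketched for a single gene as follows: first the correlation between CGH ratio and expression level across samples (Pearson and Spearman), then a Student's t statistic comparing expression between samples with and without an alteration. This Python sketch uses textbook formulas and is not the CHESS code; the function names and the example values are assumptions, and no p-value is computed for the t statistic.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson's correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def student_t_statistic(group_a: Sequence[float],
                        group_b: Sequence[float]) -> Tuple[float, int]:
    """Pooled-variance t statistic and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    ssa = sum((v - ma) ** 2 for v in group_a)
    ssb = sum((v - mb) ** 2 for v in group_b)
    pooled_var = (ssa + ssb) / (na + nb - 2)
    return (ma - mb) / sqrt(pooled_var * (1 / na + 1 / nb)), na + nb - 2


# One gene across five hypothetical samples.
cgh_ratio = [0.1, 0.5, -0.4, 0.8, 0.0]
expression = [0.2, 0.9, -0.6, 1.1, 0.1]
r = pearson(cgh_ratio, expression)
rho = spearman(cgh_ratio, expression)
# Expression in samples called as gain vs. samples with no alteration.
t_stat, dof = student_t_statistic([3.0, 5.0], [1.0, 3.0])
print(round(r, 3), rho, round(t_stat, 3), dof)
```

Spearman correlation depends only on rank order, so it is less sensitive to outlying ratios than Pearson's coefficient, which may be why both are offered as alternatives.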
Case study of CHESS using colorectal cancer data set
We have implemented a Java-based program named CHESS for the comprehensive analysis of genomic alteration. Functionally, CHESS can be divided into two parts. The first part detects genomic alteration regions from the CGH data and investigates the relationship between the detected alterations and a particular phenotype. The second part statistically analyzes the influence of genomic alteration on gene expression profiles. CHESS provides several optional statistical methods for these analyses, which enables users to choose the algorithm best suited to their own data. Additionally, the detailed visualization module of CHESS helps users understand massive data sets easily and intuitively. Finally, CHESS can serve as an essential tool for researchers who study genomic alteration as a molecular marker and characterize its underlying role in downstream mechanism(s) in the pathogenesis of a disease.
Availability and requirements
Project name: CgHExpreSS
Project homepage: http://biostone.khu.ac.kr/CHESS/
Operating systems: Windows and Linux
Programming language: Java
Other requirements: JRE 6 or higher (Java Runtime Environment)
License: free non-commercial research use license
Any restrictions to use by non-academics: none
This research was funded by KyungHee University (KHU-20060424).
- Lockwood WW, Chari R, Chi B, Lam WL: Recent advances in array comparative genomic hybridization technologies and their applications in human genetics. Eur J Hum Genet 2006, 14(2):139–148. 10.1038/sj.ejhg.5201531
- van Beers EH, Nederlof PM: Array-CGH and breast cancer. Breast Cancer Res 2006, 8(3):210. 10.1186/bcr1510
- Diep CB, Kleivi K, Ribeiro FR, Teixeira MR, Lindgjaerde OC, Lothe RA: The order of genetic events associated with colorectal cancer progression inferred from meta-analysis of copy number changes. Genes Chromosomes Cancer 2006, 45(1):31–41. 10.1002/gcc.20261
- Hyman E, Kauraniemi P, Hautaniemi S, Wolf M, Mousses S, Rozenblum E, Ringner M, Sauter G, Monni O, Elkahloun A, Kallioniemi O, Kallioniemi A: Impact of DNA amplification on gene expression patterns in breast cancer. Cancer Res 2002, 62(21):6240–6245.
- Tsafrir D, Bacolod M, Selvanayagam Z, Tsafrir I, Shia J, Zeng Z, Liu H, Krier C, Stengel RF, Barany F, Gerald WL, Paty PB, Domany E, Notterman DA: Relationship of gene expression and chromosomal abnormalities in colorectal cancer. Cancer Res 2006, 66(4):2129–2137. 10.1158/0008-5472.CAN-05-2569
- Kim TM, Jung YC, Rhyu MG, Jung MH, Chung YJ: GEAR: genomic enrichment analysis of regional DNA copy number changes. Bioinformatics 2008, 24(3):420–421. 10.1093/bioinformatics/btm582
- Shankar G, Rossi MR, McQuaid DE, Conroy JM, Gaile DG, Cowell JK, Nowak NJ, Liang P: aCGHViewer: A Generic Visualization Tool For aCGH data. Cancer Inform 2006, 2: 36–43.
- Liva S, Hupe P, Neuvial P, Brito I, Viara E, La Rosa P, Barillot E: CAPweb: a bioinformatics CGH array analysis platform. Nucleic Acids Res 2006, (34 Web Server):W477–481. 10.1093/nar/gkl215
- van Wieringen WN, Belien JA, Vosse SJ, Achame EM, Ylstra B: ACE-it: a tool for genome-wide integration of gene dosage and RNA expression data. Bioinformatics 2006, 22(15):1919–1920. 10.1093/bioinformatics/btl269
- Conde L, Montaner D, Burguet-Castell J, Tarraga J, Medina I, Al-Shahrour F, Dopazo J: ISACGH: a web-based environment for the analysis of Array CGH and gene expression which includes functional profiling. Nucleic Acids Res 2007, (35 Web Server):W81–85. 10.1093/nar/gkm257
- La Rosa P, Viara E, Hupe P, Pierron G, Liva S, Neuvial P, Brito I, Lair S, Servant N, Robine N, Manie E, Brennetot C, Janoueix-Lerosey I, Raynal V, Gruel N, Rouveirol C, Stransky N, Stern MH, Delattre O, Aurias A, Radvanyi F, Barillot E: VAMP: visualization and analysis of array-CGH, transcriptome and other molecular profiles. Bioinformatics 2006, 22(17):2066–2073. 10.1093/bioinformatics/btl359
- Chari R, Coe BP, Wedseltoft C, Benetti M, Wilson IM, Vucic EA, MacAulay C, Ng RT, Lam WL: SIGMA2: a system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes. BMC Bioinformatics 2008, 9: 422. 10.1186/1471-2105-9-422
- Fridlyand J, Snijders AM, Pinkel D, Albertson DG, Jain A: Hidden Markov models approach to the analysis of array CGH data. J Multivariate Analysis 2004, 90: 132–153. 10.1016/j.jmva.2004.02.008
- Olshen AB, Venkatraman ES, Lucito R, Wigler M: Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 2004, 5(4):557–572. 10.1093/biostatistics/kxh008
- Price TS, Regan R, Mott R, Hedman A, Honey B, Daniels RJ, Smith L, Greenfield A, Tiganescu A, Buckle V, Ventress N, Ayyub H, Salhan A, Pedraza-Diaz S, Broxholme J, Ragoussis J, Higgs DR, Flint J, Knight SJ: SW-ARRAY: a dynamic programming solution for the identification of copy-number changes in genomic DNA using array comparative genome hybridization data. Nucleic Acids Res 2005, 33(11):3455–3464. 10.1093/nar/gki643
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.