Genovar: a detection and visualization tool for genomic variants

Background: Along with single nucleotide polymorphisms (SNPs), copy number variation (CNV) is considered an important source of genetic variation associated with disease susceptibility. Despite the importance of CNV, the tools currently available for its analysis often produce false positive results because of limitations such as the low resolution of array platforms, platform specificity, and the type of CNV. To resolve this problem, spurious signals must be separated from true signals by visual inspection. None of the previously reported CNV analysis tools supports this function together with the simultaneous visualization of array comparative genomic hybridization (aCGH) data and sequence alignments. The purpose of the present study was to develop a program for the efficient detection and visualization of CNV regions that enables the manual exclusion of erroneous signals.

Results: A Java-based stand-alone program called Genovar was developed. To ascertain whether a detected CNV region is a novel variant, Genovar compares the detected CNV regions with previously reported CNV regions using the Database of Genomic Variants (DGV, http://projects.tcag.ca/variation) and the Single Nucleotide Polymorphism Database (dbSNP). The current version of Genovar can visualize genomic data from sources such as aCGH data files and sequence alignment format files.

Conclusions: Genovar is freely accessible and provides a user-friendly graphical user interface (GUI) to facilitate the detection of CNV regions. The program also provides comprehensive information to help eliminate spurious signals by visual inspection, making Genovar a valuable tool for reducing false positive CNV results.

Availability: http://genovar.sourceforge.net/.

 Annotation of the discovered CNV regions as known or unknown using DGV.
 Graphical representation of CNV regions and aCGH information for multiple samples.
 Intuitive and fast navigation along chromosomes for retrieving reads by locus range query.
 Graphical display of sequence alignment results and read information in BAM (binary sequence alignment/map) format.
 Calculation of read depth and allele frequency at each locus in the alignment area specified by the user, and identification of known SNPs and CNVs from dbSNP and DGV, respectively.
 Comparison of sequence alignment results between different samples.

System Inputs
The Array CGH input format contains the probe number and name, chromosome number, probe start and end loci, and a series of log2 intensity values corresponding to one or more samples. Columns are delimited by '

After loading an Array CGH input file, Genovar displays the CGH values of the first sample in the file in a whole-chromosome context (Fig 2).
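As a rough illustration of the Array CGH input layout (probe number and name, chromosome, start and end locus, then one log2 value per sample), the sketch below parses one probe per line into a small record. The tab delimiter, the absence of a header row, and the names `Probe` and `parse_acgh` are assumptions made for this example; they are not taken from Genovar itself.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Probe:
    number: int
    name: str
    chromosome: str
    start: int
    end: int
    log2_ratios: List[float]  # one value per sample, in file order


def parse_acgh(lines: Iterable[str], delimiter: str = "\t") -> List[Probe]:
    """Parse aCGH rows. The default delimiter is an assumption, not a documented fact."""
    probes = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(delimiter)
        if len(fields) < 6:
            raise ValueError(
                f"line {line_no}: expected at least 6 columns, got {len(fields)}"
            )
        number, name, chromosome, start, end, *values = fields
        probes.append(
            Probe(int(number), name, chromosome, int(start), int(end),
                  [float(v) for v in values])
        )
    return probes


# Made-up rows with two samples per probe.
rows = [
    "1\tprobe_A\t1\t1000\t1060\t0.12\t-0.40",
    "2\tprobe_B\t1\t5000\t5060\t0.95\t0.03",
]
probes = parse_acgh(rows)
first_sample = [p.log2_ratios[0] for p in probes]
```

Here `first_sample` holds the values that a first-sample view would plot; selecting another sample amounts to choosing a different index into `log2_ratios`.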
Users can choose other samples from a toolbar menu. In the resulting view, hyper- and hypo-expressed regions are shown in green and red, respectively. For further, more detailed analysis, Genovar also provides a single-chromosome view in which the log ratio values for a specific chromosome are given in table form. A plot of each log ratio value is also offered, along with a cytoband view with zoom-in and zoom-out functions (Fig 2).

Clicking a specific position in the cytoband view automatically scrolls the log ratio table to that position. If mapping information between genes and probes has also been loaded, a gene name is given for each record instead of just a chromosome number. When the user double-clicks a gene name in the table, an NCBI gene search is performed if an internet connection is available. A statistical summary of the region of interest (Fig 4) is also available in the single-chromosome view through a pop-up menu. This summary includes the mean, median, maximum, minimum, sum, and variance for a particular region, as well as an outlier filter that discards user-defined high or low values as outliers.

Genovar detects copy number variant regions using the Smith-Waterman Array (SW-ARRAY) algorithm. To start the algorithm, users enter parameters such as the median absolute deviation (MAD) and the island block size (Fig 7). Setting higher MAD values and island block sizes results in stricter CNV region detection. As a result, CNV regions are reported on the whole-chromosome scale (Fig 7), and users can choose specific chromosome regions for further detailed analysis using a pop-up menu.

Genomic comparison with multiple array CGH samples

Copy Number Detection and Reporting of Known CNV Regions
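The SW-ARRAY detection step can be pictured with a simplified, single-pass sketch of Smith-Waterman-style island scoring. This is not Genovar's implementation: the way the MAD parameter enters the threshold (median plus a multiple of the MAD), the interpretation of island block size as a minimum number of probes, and the names `find_gain_islands`, `mad_multiplier`, and `min_block` are all assumptions made for illustration. Only gains are scanned; losses could be found by negating the input.

```python
from statistics import median
from typing import List, Sequence, Tuple


def mad(values: Sequence[float]) -> float:
    """Median absolute deviation around the median."""
    center = median(values)
    return median(abs(v - center) for v in values)


def find_gain_islands(
    log_ratios: Sequence[float],
    mad_multiplier: float = 1.0,  # assumed role of the MAD parameter
    min_block: int = 2,           # assumed role of the island block size
) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) per island; indices are inclusive."""
    threshold = median(log_ratios) + mad_multiplier * mad(log_ratios)
    islands = []
    score, start, peak_score, peak_end = 0.0, None, 0.0, None
    for i, value in enumerate(log_ratios):
        candidate = score + (value - threshold)
        if candidate > 0:
            if start is None:
                start = i  # a new positive excursion begins here
            score = candidate
            if score > peak_score:
                peak_score, peak_end = score, i
        else:
            # Excursion ended: keep it from its start to its peak if long enough.
            if start is not None and peak_end - start + 1 >= min_block:
                islands.append((start, peak_end, peak_score))
            score, start, peak_score, peak_end = 0.0, None, 0.0, None
    if start is not None and peak_end - start + 1 >= min_block:
        islands.append((start, peak_end, peak_score))
    return islands


# Made-up log2 ratios with a three-probe gain at indices 4..6.
ratios = [-0.2, 0.2, -0.2, 0.2, 1.0, 1.0, 1.0, -0.2, 0.2, -0.2, 0.2]
islands = find_gain_islands(ratios, mad_multiplier=1.0, min_block=2)
strict = find_gain_islands(ratios, mad_multiplier=1.0, min_block=4)
```

In this sketch, raising either `mad_multiplier` or `min_block` reports fewer islands, which mirrors the stricter detection described for higher MAD values and island block sizes. A full implementation would typically re-scan after extracting each island rather than rely on one pass.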
Another useful function of Genovar is the comparison of CNV regions (Fig 11) or CGH values between samples (Fig 13). In the view shown in Fig 11, the user can query details regarding a particular region; detailed CNV regions with absolute loci are obtained by assigning starting and ending positions using a pop-up menu.

Comparison of Array CGH log ratio intensities among samples (Fig 14) is another common analysis task. To handle this, our system supports a view of CGH values for a given sample compared with those for other samples.

Genovar shows multiple sequence alignment results simultaneously (Fig 19). This function is useful because differences between two alignment results, for example in SNP and read mapping information, are displayed intuitively. Allele frequency calculation and SNP calling are performed separately for each sample, and differences in SNPs between samples are shown directly. In Fig 19, the user has queried dbSNP to survey already-reported SNPs, and Genovar has identified the known SNPs. In this way, Genovar shows the user which SNPs in a particular sequence alignment result are not yet reported. The dbSNP versions supported by Genovar are dbSNP132 and dbSNP131 for Human Genome 19 (GRCh37), and dbSNP130, dbSNP129, dbSNP128, and dbSNP126 for Human Genome 18 (build 36).
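The per-sample allele frequency calculation and the comparison against dbSNP can be sketched as follows. The source does not describe Genovar's SNP calling rule, so the frequency threshold used here is an assumption, as are the function names, the toy pileup strings, and the set of known positions.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def allele_frequencies(bases: str) -> Tuple[int, Dict[str, float]]:
    """Return (read depth, allele -> frequency) for the bases seen at one locus."""
    counts = Counter(bases.upper())
    depth = sum(counts.values())
    freqs = {allele: n / depth for allele, n in counts.items()} if depth else {}
    return depth, freqs


def call_snps(
    pileup: Dict[int, str],
    reference: Dict[int, str],
    min_freq: float = 0.2,  # assumed threshold, not Genovar's documented rule
) -> Set[int]:
    """Positions where some non-reference allele reaches min_freq."""
    called = set()
    for pos, bases in pileup.items():
        _, freqs = allele_frequencies(bases)
        if any(a != reference[pos] and f >= min_freq for a, f in freqs.items()):
            called.add(pos)
    return called


# Made-up data: three loci, eight reads per locus, two samples.
reference = {100: "A", 101: "C", 102: "G"}
sample_a = {100: "AAAAGGGG", 101: "CCCCCCCC", 102: "GGGGGGTT"}
sample_b = {100: "AAAAAAAA", 101: "CCCCTTTT", 102: "GGGGTTTT"}
known_dbsnp = {102}  # hypothetical positions already reported in dbSNP

snps_a = call_snps(sample_a, reference)
snps_b = call_snps(sample_b, reference)
differing = snps_a ^ snps_b       # called in exactly one of the two samples
novel_a = snps_a - known_dbsnp    # called in sample A but not in dbSNP
novel_b = snps_b - known_dbsnp
```

Calling each sample independently and comparing the resulting sets keeps the between-sample differences and the known/unknown split as two separate, simple set operations.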

Database
Currently, Genovar uses the Database of Genomic Variants (DGV, http://projects.tcag.ca/variation) [1] and the Single Nucleotide Polymorphism Database (dbSNP) [2] of the National Center for Biotechnology Information (NCBI) to detect unknown variants. Using the Database menu, the user can connect directly to DGV and dbSNP and choose the version of the human genome (hg18/hg19) and of dbSNP: dbSNP132 and dbSNP131 for Human Genome 19 (GRCh37), and dbSNP130, dbSNP129, dbSNP128, and dbSNP126 for Human Genome 18 (build 36). An internet connection is required for these services.
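Labelling a detected CNV region as known or unknown against DGV reduces to an interval comparison. The sketch below is a minimal illustration under stated assumptions: any overlap on the same chromosome counts as a match (Genovar's actual matching criterion is not given in the text), and the `Region` type, the `annotate` function, and all coordinates are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    chromosome: str
    start: int
    end: int  # inclusive


def overlaps(a: Region, b: Region) -> bool:
    """True if both regions lie on the same chromosome and share at least one base."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def annotate(detected: List[Region], known: List[Region]) -> List[Tuple[Region, str]]:
    """Label each detected region 'known' if it overlaps any known region."""
    return [
        (region, "known" if any(overlaps(region, k) for k in known) else "unknown")
        for region in detected
    ]


# Made-up coordinates standing in for DGV entries and detected CNV regions.
dgv_regions = [Region("1", 1000, 5000), Region("2", 200, 900)]
detected = [Region("1", 4500, 6000), Region("1", 7000, 8000), Region("2", 100, 150)]
labels = [label for _, label in annotate(detected, dgv_regions)]
```

For genome-scale inputs, a sorted list or interval tree would avoid the quadratic scan; the linear scan is kept here for readability.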