Gevab: a prototype genome variation analysis browsing server
- Woo-Yeon Kim†1,
- Sang-Yoon Kim†1,
- Tae-Hyung Kim†1,
- Sung-Min Ahn2, 3,
- Ha Na Byun1,
- Deokhoon Kim2,
- Dae-Soo Kim1,
- Yong Seok Lee1,
- Ho Ghang1,
- Daeui Park1,
- Byoung-Chul Kim1,
- Chulhong Kim1,
- Sunghoon Lee1,
- Seong-Jin Kim2 and
- Jong Bhak1
© Kim et al; licensee BioMed Central Ltd. 2009
Published: 3 December 2009
The first Korean individual diploid genome sequence data (KOREF) was publicized in December 2008.
A Korean genome variation analysis and browsing server (Gevab) was constructed as a database and web server for exploring and downloading Korean personal genome(s). Gevab includes SNPs, short indels, and structural variations (SVs), together with a comparative analysis between the NCBI human reference and the Korean genome(s). Users can find information on assembled consensus sequences, sequenced short reads, genetic variations, and relationships between genotypes and phenotypes.
Most known genome browsers, such as NCBI genome and Craig Venter's genome browsers, were built for consensus sequences from multiple individuals to construct a reference human genome. Examples of haplotype genome browsers are NCBI, UCSC, Ensembl, and Venter genome browsers. Recently, the first Asian (Chinese) diploid genome database was published, containing analysis and browsing facilities [5, 6]. There are a number of general purpose genome annotation servers. They include Entrez Gene, Ensembl genes, OMIM disease associations, HapMap, SNPedia, and genetic variations of several individual genomes such as Venter, Watson, YH (Chinese), and NA18507 (Yoruba). We have developed an individual genome variation analysis and browsing server (Gevab) for the first Korean personal genome sequence (KOREF).
This server is useful for analyzing a diploid human genome sequenced to study the complex features of human genetic variation. The system integrates variation information from multiple sources, such as Venter, Watson, YH, dbSNP, and HapMap genotypes, as well as gene information. Hence, users can compare genotypes across human genomes. Gevab also provides information on SNPs, short indels, and SVs in the KOREF genome. Gevab has two parts: genome variation analysis and genome mapping.
Materials and methods
KOREF data were generated using the Illumina GA, yielding 82.73 gigabases (Gb) of sequence (about 1248 million paired 36-base reads and about 504 million 75-base reads).
Using the MAQ (Mapping and Assembly with Qualities) program, these sequences were aligned to the NCBI human genome reference (build 36, without Ns, 2,858,029,377 bp). In total, 99.9% of the NCBI reference genome was covered at an average depth of 25.92-fold (sequencing depth was 28.95-fold).
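The yield and sequencing depth quoted above can be checked with simple arithmetic. The sketch below uses only the figures given in the text; the variable names are illustrative, and because the read counts are reported as approximate ("about"), the agreement is to two decimal places only. It does not reproduce the 25.92-fold mapped depth, which depends on the alignment results.

```python
# Figures taken from the text; read counts are rounded in the source.
READS_36 = 1_248_000_000        # about 1248 million paired 36-base reads
READS_75 = 504_000_000          # about 504 million 75-base reads
REFERENCE_BP = 2_858_029_377    # NCBI build 36 without Ns

# Total sequenced bases = sum over read sets of (read count x read length).
total_bases = READS_36 * 36 + READS_75 * 75
total_gb = total_bases / 1e9

# Sequencing depth = total sequenced bases / reference length.
sequencing_depth = total_bases / REFERENCE_BP

print(f"total yield: {total_gb:.2f} Gb")
print(f"sequencing depth: {sequencing_depth:.2f}-fold")
```

Both values round to the numbers reported in the text (82.73 Gb and 28.95-fold).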
Database and browser software
The Korean genome variation browsing part of Gevab displays the consensus genome sequence and genetic variants, including SNPs, short indels, and SVs. Gevab uses GBrowse, developed by GMOD, for variation viewing; the genome map browser part was developed by KOBIC.
Analysis of KOREF
From the KOREF genome sequence, 3.44 million SNPs were identified and validated using Illumina 1 M-duo and Affy 6.0 BeadChip. We identified 342,965 short indels (-29 to +14 bp). Indels that co-occurred within a window size of 20 bp were filtered out, since they were primarily from length polymorphisms in homopolymeric tracts of A or T. Using paired-end reads, we found 2920 deletion and 415 inversion structural variants (SVs) in the range of 0.1~100 kb. In addition, we detected 963 insertion events in the range of 175~250 bp. These insertions are present in the KOREF genome but absent in the NCBI reference genome. MySQL, PHP, Python, and AJAX were used for database construction and the interface utilities.
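The 20 bp co-occurrence filter for indels mentioned above could be sketched as follows. This is not the authors' code: the function name `filter_clustered_indels`, the `(chromosome, position, length)` tuple layout, and the choice to treat a distance of exactly 20 bp as "within the window" are all assumptions made for illustration.

```python
from collections import defaultdict


def filter_clustered_indels(indels, window=20):
    """Drop every indel that has another indel within `window` bp
    on the same chromosome (hypothetical reading of the filter).

    `indels` is an iterable of (chrom, pos, length) tuples, where a
    negative length is a deletion and a positive length an insertion.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, length in indels:
        by_chrom[chrom].append((pos, length))

    kept = []
    for chrom, items in by_chrom.items():
        items.sort()
        for i, (pos, length) in enumerate(items):
            # After sorting, the nearest indel is always an adjacent entry.
            near_prev = i > 0 and pos - items[i - 1][0] <= window
            near_next = i + 1 < len(items) and items[i + 1][0] - pos <= window
            if not (near_prev or near_next):
                kept.append((chrom, pos, length))
    return kept


# Toy input: the two chr1 indels at 100 and 110 are 10 bp apart and are removed.
example = [("chr1", 100, -2), ("chr1", 110, 1), ("chr1", 500, 3), ("chr2", 105, -1)]
filtered = filter_clustered_indels(example)
print(filtered)
```

Grouping by chromosome first keeps positions on different chromosomes from being compared with each other.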
Features of Gevab
Features of Gevab, Venter, Watson, and YH genome browsers. Availability of features is indicated by "O" for "yes" and "X" for "no."
Variations to compare: Venter, Watson, YH, dbSNP, HapMap
Gevab's map browser
The genome map browser provides read mapping and quality information obtained from a personal genome project. Users can search by chromosomal position and control the width of the displayed region. The browser also supports zooming in and out and moving left and right.
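The navigation functions described above (position search, adjustable width, zoom, and left/right movement) amount to simple operations on a genomic interval. The following is a minimal sketch of that idea, not Gevab's implementation; the `Region` class, its 1-based inclusive coordinates, and the method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A displayed window on a chromosome (1-based, inclusive)."""
    chrom: str
    start: int
    end: int

    @property
    def width(self):
        return self.end - self.start + 1

    def zoom(self, factor):
        """Scale the width by `factor` around the current center.

        factor < 1 zooms in, factor > 1 zooms out.
        """
        new_width = max(1, round(self.width * factor))
        center = (self.start + self.end) // 2
        start = max(1, center - new_width // 2)  # never go below position 1
        return Region(self.chrom, start, start + new_width - 1)

    def shift(self, offset):
        """Move left (negative offset) or right (positive offset), keeping the width."""
        start = max(1, self.start + offset)
        return Region(self.chrom, start, start + self.width - 1)


view = Region("chr1", 1001, 2000)
zoomed_in = view.zoom(0.5)
moved_left = view.shift(-2000)
print(zoomed_in, moved_left)
```

A real browser would also clamp the right edge to the chromosome length, which is omitted here because the sketch carries no chromosome sizes.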
Gevab's variation browser
KOREF data access
The KOREF database is developed and maintained by KOBIC (Korean Bioinformation Center). The database contains all the raw and processed data of KOREF, including KOREF consensus sequence, genetic variants, and short read alignments. These data are available for downloading. The KOREF data have been deposited in the NCBI Short Read Archive (Accession Number SRA008175).
Gevab contains all the raw and processed data of a Korean genome sequence, variants, and annotation. Gevab provides open and public access to all data of an individual personal diploid genome.
The variation browser part was designed to present the evidence for each genetic variant, including the position, number, and status of reads, GC content, and other mapping information. This detailed information, which supports the comparison and validation of genetic variations, is valuable to communities that sequence individual genomes.
Other papers from the meeting have been published as part of BMC Genomics Volume 10 Supplement 3, 2009: Eighth International Conference on Bioinformatics (InCoB2009): Computational Biology, available online at http://www.biomedcentral.com/1471-2164/10?issue=S3.
This work was supported by a grant from the KRIBB Research Initiative Program of Korea, by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korean government (MOST), the National Research Foundation of Korea (NRF) grant (No. R11-2008-044-03004-0, S.M.A.), a grant from Ministry of Knowledge Economy (Standard Reference Data Program), and generous funding from the Gachon University of Medicine and Science & Gachon University Gil Hospital. We thank Ryu Gichan for crucial administration assistance, Ryu Jeawoon and Cho Suan for web application, and Maryana Bhak for editing.
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 15, 2009: Eighth International Conference on Bioinformatics (InCoB2009): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S15.
- Axelrod N, Lin Y, Ng PC, Stockwell TB, Crabtree J, Huang J, Kirkness E, Strausberg RL, Frazier ME, Venter JC, et al.: The HuRef Browser: a web resource for individual human genomics. Nucleic Acids Res 2009, (37 Database):D1018–1024. 10.1093/nar/gkn939
- Kuhn RM, Karolchik D, Zweig AS, Wang T, Smith KE, Rosenbloom KR, Rhead B, Raney BJ, Pohl A, Pheasant M, et al.: The UCSC Genome Browser Database: update 2009. Nucleic Acids Res 2009, (37 Database):D755–761. 10.1093/nar/gkn875
- Hubbard TJ, Aken BL, Ayling S, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Clarke L, et al.: Ensembl 2009. Nucleic Acids Res 2009, (37 Database):D690–697. 10.1093/nar/gkn828
- Wang J, Wang W, Li R, Li Y, Tian G, Goodman L, Fan W, Zhang J, Li J, Guo Y, et al.: The diploid genome sequence of an Asian individual. Nature 2008, 456(7218):60–65. 10.1038/nature07484
- Chinese genome browser [http://yh.genomics.org.cn/]
- Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res 2007, (35 Database):D26–31. 10.1093/nar/gkl993
- Amberger J, Bocchini CA, Scott AF, Hamosh A: McKusick's Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res 2009, (37 Database):D793–796. 10.1093/nar/gkn665
- Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al.: A second generation human haplotype map of over 3.1 million SNPs. Nature 2007, 449(7164):851–861. 10.1038/nature06258
- Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G, et al.: The diploid genome sequence of an individual human. PLoS Biol 2007, 5(10):e254. 10.1371/journal.pbio.0050254
- Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ, Makhijani V, Roth GT, et al.: The complete genome of an individual by massively parallel DNA sequencing. Nature 2008, 452(7189):872–876. 10.1038/nature06884
- Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, et al.: Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008, 456(7218):53–59. 10.1038/nature07517
- Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 2008, 18(11):1851–1858. 10.1101/gr.078212.108
- Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, et al.: The generic genome browser: a building block for a model organism system database. Genome Res 2002, 12(10):1599–1610. 10.1101/gr.403602
- Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001, 29(1):308–311. 10.1093/nar/29.1.308
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.