Table 2 Major tables in the Reference Variant Store that hold all imported variants and annotations

From: Integrating 400 million variants from 80,000 human samples with extensive annotations: towards a knowledge base to analyze disease cohorts

| Table | Description |
| --- | --- |
| Summary | main table that stores each variant by chromosomal location, reference and alternate allele, dbSNP identifier, and GRCh37/38 locations; most other tables are dependent tables |
| Impact | effect(s) on gene, transcript, intron/exon, missense/nonsense, CDS and amino acid change, where applicable; by transcript |
| Frequencies | allele frequencies in large-scale sequencing studies (1000 Genomes, ESP6500, ExAC, Scripps Wellderly, etc.) |
| Predictions | computational predictions of functional impact, such as PolyPhen-2, MutationAssessor, SIFT, CADD, PROVEAN, GWAVA, and ensemble scores |
| Phenotypes | disease associations from ClinVar, HGMD, OMIM, etc. |
| Regions | observed and predicted regions that contain the given variant: functional and regulatory elements (ENCODE), protein domains (InterPro), microRNA target sites (miRanda) |
| Source | maps each variant to the study or studies in which it was observed; also stores pass or non-pass flags according to filtering criteria, if provided by the study |
| Comments | optional: human expert comments on specific variants, pertaining to disease, impact, etc. |
| Staging_summary | registry that holds potentially new variants until they have been automatically annotated and copied to the production Summary table |
| Staging_impact | holds results from computational models regarding effects of the mutation (protein level) |
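
The relationships described in the table (a central Summary table, dependent annotation tables, and a staging registry that feeds the production table) can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module purely for illustration; the source does not state which database system or column layout the Reference Variant Store uses, so every column name below (`variant_id`, `chrom`, `pos`, `ref`, `alt`, `dbsnp_id`, `study`, `allele_freq`, `annotated`) and every function name is a hypothetical assumption, and only three of the listed tables are modeled.

```python
import sqlite3

# Hypothetical schema: the table names follow the source, but all column
# names are assumptions made for this illustration.
SCHEMA = """
CREATE TABLE summary (
    variant_id INTEGER PRIMARY KEY,
    chrom      TEXT    NOT NULL,
    pos        INTEGER NOT NULL,
    ref        TEXT    NOT NULL,
    alt        TEXT    NOT NULL,
    dbsnp_id   TEXT,
    UNIQUE (chrom, pos, ref, alt)
);
-- A dependent table: each row points back to one variant in summary.
CREATE TABLE frequencies (
    variant_id  INTEGER NOT NULL REFERENCES summary(variant_id),
    study       TEXT    NOT NULL,
    allele_freq REAL    NOT NULL
);
-- Staging registry for potentially new variants awaiting annotation.
CREATE TABLE staging_summary (
    chrom     TEXT    NOT NULL,
    pos       INTEGER NOT NULL,
    ref       TEXT    NOT NULL,
    alt       TEXT    NOT NULL,
    annotated INTEGER NOT NULL DEFAULT 0
);
"""


def build_db():
    """Create an in-memory database with foreign-key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def promote_staged(conn):
    """Copy annotated staging rows into summary and remove them from staging.

    Returns the number of rows newly added to summary.
    """
    with conn:  # commit on success, roll back on error
        before = conn.execute("SELECT COUNT(*) FROM summary").fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO summary (chrom, pos, ref, alt) "
            "SELECT chrom, pos, ref, alt FROM staging_summary "
            "WHERE annotated = 1"
        )
        conn.execute("DELETE FROM staging_summary WHERE annotated = 1")
        after = conn.execute("SELECT COUNT(*) FROM summary").fetchone()[0]
    return after - before


def frequencies_for(conn, chrom, pos, ref, alt):
    """Look up a variant in summary and join its dependent frequency rows."""
    return conn.execute(
        "SELECT f.study, f.allele_freq FROM summary AS s "
        "JOIN frequencies AS f ON f.variant_id = s.variant_id "
        "WHERE s.chrom = ? AND s.pos = ? AND s.ref = ? AND s.alt = ? "
        "ORDER BY f.study",
        (chrom, pos, ref, alt),
    ).fetchall()
```

In this sketch, the foreign key on `frequencies.variant_id` is what makes the table "dependent": a frequency row cannot exist without a matching Summary row. The `UNIQUE` constraint together with `INSERT OR IGNORE` keeps a variant that is already in production from being duplicated when it is staged again.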