Table 2 Major tables in the Reference Variant Store that hold all imported variants and annotations

From: Integrating 400 million variants from 80,000 human samples with extensive annotations: towards a knowledge base to analyze disease cohorts

| Table | Description |
| --- | --- |
| Summary | main table that stores each variant by chromosomal location, reference and alternate allele, dbSNP, and GRCh37/38 locations; most other tables are dependent tables |
| Impact | effect(s) on gene, transcript, intron/exon, missense/nonsense, CDS and amino acid change, where applicable; by transcript |
| Frequencies | allele frequencies in large-scale sequencing studies (1000 Genomes, ESP6500, ExAC, Scripps Wellderly, etc.) |
| Predictions | computational predictions of functional impact, such as PolyPhen-2, MutationAssessor, SIFT, CADD, PROVEAN, GWAVA, and ensemble scores |
| Phenotypes | disease associations from ClinVar, HGMD, OMIM, etc. |
| Regions | observed and predicted regions that contain the given variant: functional and regulatory elements (ENCODE), protein domains (InterPro), microRNA target sites (miRanda) |
| Source | maps each variant to the study or studies in which it was observed; also stores pass or non-pass flags according to filtering criteria, if provided by the study |
| Comments | optional: human expert comments on specific variants, pertaining to disease, impact, etc. |
| Staging_summary | registry that holds potentially new variants until they have been automatically annotated and copied to the production summary table |
| Staging_impact | holds results from computational models regarding effects of the mutation (protein level) |
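
The relationship between the main table, its dependent tables, and the staging registry can be illustrated with a small sketch. This is not the actual Reference Variant Store schema: the column names, the `annotated` flag, the use of SQLite, and the helper functions `open_store` and `promote_annotated` are all assumptions made for illustration. It shows only the general pattern of a summary table keyed by variant, a dependent table referencing it, and a staging table whose annotated rows are copied into production.

```python
import sqlite3

# Hypothetical, simplified schema. Column names are illustrative assumptions,
# not the real Reference Variant Store definitions.
SCHEMA = """
CREATE TABLE summary (
    variant_id INTEGER PRIMARY KEY,
    chrom      TEXT    NOT NULL,
    pos        INTEGER NOT NULL,
    ref        TEXT    NOT NULL,
    alt        TEXT    NOT NULL,
    dbsnp_id   TEXT,
    UNIQUE (chrom, pos, ref, alt)
);
-- Example of a dependent table: every row must point at a summary row.
CREATE TABLE frequencies (
    variant_id  INTEGER NOT NULL REFERENCES summary(variant_id),
    study       TEXT    NOT NULL,
    allele_freq REAL    NOT NULL
);
-- Staging registry for potentially new variants (assumed 'annotated' flag).
CREATE TABLE staging_summary (
    chrom     TEXT    NOT NULL,
    pos       INTEGER NOT NULL,
    ref       TEXT    NOT NULL,
    alt       TEXT    NOT NULL,
    annotated INTEGER NOT NULL DEFAULT 0
);
"""


def open_store():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce dependent-table links
    conn.executescript(SCHEMA)
    return conn


def promote_annotated(conn):
    """Copy annotated staging rows into summary and remove them from staging.

    Returns the number of variants newly added to the summary table.
    """
    count_sql = "SELECT COUNT(*) FROM summary"
    with conn:  # commit on success, roll back on error
        before = conn.execute(count_sql).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO summary (chrom, pos, ref, alt) "
            "SELECT chrom, pos, ref, alt FROM staging_summary "
            "WHERE annotated = 1"
        )
        conn.execute("DELETE FROM staging_summary WHERE annotated = 1")
        after = conn.execute(count_sql).fetchone()[0]
    return after - before
```

In this sketch, a variant that is still awaiting annotation stays in `staging_summary`, and a row in `frequencies` that refers to a variant missing from `summary` is rejected by the foreign-key constraint.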