
Table 2. Comparison of query performance between GenESysV and GEMINIᵃ

From: GenESysV: a fast, intuitive and scalable genome exploration open source tool for variants generated from high-throughput sequencing projects

| Query | GenESysV filters and attributes | GEMINI query | AshkenazimTrio (6,312,781 variants), GenESysV | AshkenazimTrio (6,312,781 variants), GEMINI | 1000 Genomes Project phase3 (2504 samples, 85,211,311 variants), GenESysV | 1000 Genomes Project phase3 (2504 samples, 85,211,311 variants), GEMINI |
| --- | --- | --- | --- | --- | --- | --- |
| Get all novel and detrimental variants | Filters: Limit Variants to dbSNP_ID: excluded, IMPACT: HIGH, FILTER: PASS. Attributes: CHROM, POS, REF, ALT, IMPACT, QUAL, FILTER. | `select chrom, start, ref, alt, qual, impact_severity, filter from variants where in_dbsnp = 0 and impact_severity == 'HIGH' and filter is Null` | 0.73 s / 0.21 s (64)ᵇ | 2 m 41.35 s / 1.06 s (74) | 33.22 s / 0.49 s (55) | 2 m 42.80 s / 0.75 s (20) |
| Get all rare, loss-of-function variants | Filters: EUR_AF (<): 0.01, Consequence: frameshift_variant, splice_acceptor_variant, splice_donor_variant, start_lost, start_retained_variant, stop_gained, stop_lost, FILTER: PASS. Attributes: CHROM, POS, REF, ALT, SYMBOL. | `select chrom, start, ref, alt, qual, gene from variants where is_lof = 1 and aaf_1kg_eur < 0.01 and filter is Null limit 400` | 1.20 s / 0.34 s (315) | 2.39 s / 0.35 s (269) | 9.97 s / 0.53 s (400)ᶜ | 2.60 s / 0.59 s (400)ᶜ |
| Get rare, loss-of-function variants that are also heterozygous in selected samples | Filters: Consequence: frameshift_variant, splice_acceptor_variant, splice_donor_variant, start_lost, start_retained_variant, stop_gained, stop_lost, FILTER: PASS, EUR_AF (<): 0.01, Sample_ID: HG003ᵈ, HG004, GT: 0\|1, 1\|0. Attributes: CHROM, POS, REF, ALT, SYMBOL, Sample_ID, GT. | `"select chrom, start, ref, alt, qual, gene, gts.HG003, gts.HG004 from variants where is_lof = 1 and aaf_1kg_eur < 0.01 and filter is Null" --gt-filter "gt_types.HG003 == HET or gt_types.HG004 == HET"` | 0.71 s / 0.37 s (239)ᵉ | 3.21 s / 0.47 s (213) | 51.47 s / 2.28 s (31) | 1 m 33.57 s / 3.52 s (36) |
| Get missense variants in human HLA region | Filters: CHROM: 6, POS (>=): 28477797, POS (<=): 33448354, Consequence: missense_variant. Attributes: CHROM, POS, dbSNP_ID, REF, ALT, SYMBOL, MAX_AF. | `select chrom, start, ref, alt, gene, max_aaf_all, impact, rs_ids from variants where chrom = 'chr6' and start >= 28477797 and end <= 33448354 and impact = 'missense_variant' limit 400` | 0.41 s / 0.39 s (400)ᶜ | 3.70 s / 0.51 s (400)ᶜ | 6.77 s / 0.62 s (400)ᶜ | 7.72 s / 0.78 s (400)ᶜ |
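
To make the first GEMINI query in the table concrete, here is a minimal Python sketch that runs its SQL against an in-memory SQLite table. The table is a hypothetical stand-in that holds only the columns the query names, and the four rows are invented for illustration; it is not the full GEMINI schema and uses no data from either benchmark dataset.

```python
import sqlite3

# Toy stand-in for a GEMINI `variants` table (assumption: only the columns
# named in the first query of the table are created; the rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table variants (chrom text, start integer, ref text, alt text, "
    "qual real, impact_severity text, filter text, in_dbsnp integer)"
)
toy_rows = [
    ("chr1", 100, "A", "T", 50.0, "HIGH", None, 0),       # novel, HIGH, filter is Null
    ("chr1", 200, "G", "C", 60.0, "HIGH", None, 1),       # already in dbSNP
    ("chr2", 300, "C", "G", 45.0, "LOW", None, 0),        # not HIGH impact
    ("chr2", 400, "T", "A", 10.0, "HIGH", "LowQual", 0),  # filter column is set
]
conn.executemany("insert into variants values (?, ?, ?, ?, ?, ?, ?, ?)", toy_rows)

# The SQL text of the "novel and detrimental variants" query from the table.
novel_high_impact = conn.execute(
    "select chrom, start, ref, alt, qual, impact_severity, filter "
    "from variants "
    "where in_dbsnp = 0 and impact_severity == 'HIGH' and filter is Null"
).fetchall()
print(novel_high_impact)
```

Only the first toy row satisfies all three conditions, which mirrors the GenESysV filters in the same table row (dbSNP_ID excluded, IMPACT: HIGH, FILTER: PASS).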

  1. ᵃ Testing was performed on a 16-core cloud instance (2.3 GHz Intel Xeon E312xx (Sandy Bridge, IBRS update)) running Ubuntu 16.04, with 32 GB of memory and a solid-state drive. VCF files were annotated with VEP.
  2. ᵇ Query time (number of variants returned). The first number in the query time field is the time spent on the query when the system is cold, i.e. the system cache is empty. The second number is the time spent repeating the query once the data has been cached by the first run of the same query. Each query was run three times and the median values are reported.
  3. ᶜ These queries return more than 400 variants (the default upper limit that GenESysV returns for display in the web browser). To avoid measuring time spent on file downloading, we limited the number of variants returned by GEMINI to 400 to make the results comparable.
  4. ᵈ These sample IDs are for the AshkenazimTrio dataset. They are replaced with HG00096 and HG00097, respectively, when testing against the 1000 Genomes Project Phase3 dataset.
  5. ᵉ GenESysV does not always return the same number of variants as GEMINI for equivalent queries. See the supplementary material for a possible explanation.
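
The footnotes state that each query was run three times and the median time reported. A minimal Python sketch of that reporting step follows. The helper name `median_query_time` and its `clock` parameter are hypothetical and not taken from the paper. The sketch does not clear the system cache, so on its own it cannot separate the cold-run and cached-run measurements.

```python
import statistics
import time


def median_query_time(run_query, repeats=3, clock=time.perf_counter):
    """Call `run_query` `repeats` times and return the median elapsed seconds.

    `run_query` is any zero-argument callable that executes one query.
    `clock` is injectable so the helper can be checked deterministically.
    """
    elapsed = []
    for _ in range(repeats):
        began = clock()
        run_query()
        elapsed.append(clock() - began)
    return statistics.median(elapsed)


# Usage with a placeholder workload standing in for a real query.
seconds = median_query_time(lambda: sum(range(100_000)))
print(f"median of 3 runs: {seconds:.4f} s")
```

Reporting the median rather than the mean keeps a single unusually slow run from dominating the reported figure.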