Table 1 The 248 SNP features used in CERENKOV

From: CERENKOV2: improved detection of functional noncoding SNPs using data-space geometric features

| Feature(s) | Feature type | Raw data source | Feature description |
| --- | --- | --- | --- |
| normChromCoord | continuous | UCSC | the SNP coordinate (normalized to chromosome length) |
| majorAlleleFreq | continuous | UCSC/1KG | the major allele frequency (1KG) |
| minorAlleleFreq | continuous | UCSC/1KG | the next-to-major allele frequency (1KG) |
| phastCons | continuous | UCSC | 46-way placental mammal phastCons score [6] |
| GERP++ | continuous | UCSC | bp-level GERP++ [80] score |
| avg_GERP | continuous | UCSC | average GERP score [81] in ±100 bp window |
| avg_daf | continuous | 1KG | average derived allele frequency in ±1 kbp region |
| avg_het | continuous | 1KG | average heterozygosity rate in ±1 kbp region |
| maf1kb | continuous | UCSC/1KG | average of the MAF values for all SNPs in ±1 kbp window |
| eqtlPvalue | continuous | GTEx | -log10 min(p) for GTEx eQTL for the SNP, across 13 tissues [75] |
| GC5Content | integer (0-5) | UCSC | GC content in a 5 bp window |
| GC7Content | integer (0-7) | UCSC | GC content in a 7 bp window |
| GC11Content | integer (0-11) | UCSC | GC content in an 11 bp window |
| local_purine | integer (0-11) | UCSC | number of purine bases in local 11 bp window |
| local_CpG | integer (0-10) | UCSC | number of CpG dinucleotides in 11 bp window |
| ss_dist | integer | UCSC | signed distance to nearest exon boundary |
| tssDistance | integer | Ensembl75 | signed distance to nearest Ensembl TSS |
| gencode_tss | integer | GENCODE | signed distance to nearest GENCODE TSS |
| tfCount | integer | UCSC | sqrt(count) of ENCODE ChIP-seq TFBS overlapping the SNP |
| uniformDhsScore | integer | UCSC | sum of scores of ENCODE uniform DHS peaks overlapping the SNP |
| uniformDhsCount | integer | UCSC | count of ENCODE uniform DHS peaks overlapping the SNP |
| masterDhsScore | integer | UCSC | sum of scores of ENCODE master DHS peaks overlapping the SNP |
| masterDhsCount | integer | UCSC | count of ENCODE master DHS peaks overlapping the SNP |
| chrom | categorical (23) | UCSC | the chromosome to which the SNP maps |
| nestedrepeat | categorical (2) | UCSC | SNP is in a RepeatMasker [70] DNA repeat |
| simplerepeat | categorical (2) | UCSC | SNP is in a Tandem Repeats Finder [71] repeat |
| cpg_island | categorical (2) | UCSC | SNP is in an epigenome-predicted CpG island [72] |
| geneannot | categorical (4) | UCSC | classifies SNP location as CDS, intergenic, UTR, or intron |
| majorAllele | categorical (4) | UCSC/1KG | the major allele for the SNP |
| minorAllele | categorical (4) | UCSC/1KG | the next-to-major allele for the SNP |
| pwm | categorical (22) | Ensembl75 | ID of the Jaspar 2014 [74] motif in which SNP is a match |
| chromhmm | 6 × categorical (26) | UCSC | ChromHMM label in Gm12878, H1hesc, HeLaS3, HepG2, HUVEC and K562 cells |
| segway | 6 × categorical (26) | UCSC | Segway label in Gm12878, H1hesc, HeLaS3, HepG2, HUVEC and K562 cells |
| ch_comb_WEAKENH | categorical (4) | Ensembl75 | ChromHMM label in Ensembl Reg. Seg. build |
| ch_comb_ENH | categorical (6) | Ensembl75 | ChromHMM label in Ensembl Reg. Seg. build |
| ch_comb_REP | categorical (7) | Ensembl75 | ChromHMM label in Ensembl Reg. Seg. build |
| ch_comb_TSSFLANK | categorical (5) | Ensembl75 | ChromHMM label in Ensembl Reg. Seg. build |
| ch_comb_TRAN | categorical (7) | Ensembl75 | ChromHMM label in Ensembl Reg. Seg. build |
| ch_comb_TSS | categorical (7) | Ensembl75 | ChromHMM label in Ensembl Reg. Seg. build |
| ch_comb_CTCFREG | categorical (7) | Ensembl75 | ChromHMM label in Ensembl Reg. Seg. build |
| ENCODE_TFBS | 160 × categorical (2) | UCSC | 160 features for SNP being in an ENCODE TFBS [84] peak |
| FsuRepliSeq | 16 × continuous | UCSC | Replication Timing by Repli-chip [66] from ENCODE/FSU |
| UwRepliSeq | 16 × continuous | UCSC | Replication Timing by Repli-seq [65] from ENCODE/UW |
| SangerTfbsSummary50kb | continuous | Ensembl75 | Summary of Ensembl TFBS peaks from 18 human cell types |
| NkiLad | categorical (2) | UCSC | SNP is in a Lamina Associated Domain (NKI study [85], Tig-3 cells) |
| vistaEnhancerCnt | categorical (2) | UCSC | count of VISTA [73] HMR-Conserved Non-coding Human Enhancers [86] overlapping the SNP |
| vistaEnhancerTotalScore | categorical (2) | UCSC | sum of scores of VISTA [73] HMR-Conserved Non-coding Human Enhancers [86] |
| eigen | continuous (2) | Eigen | Eigen & Eigen-PC v1.1 raw scores [21] |

Abbreviations are as follows: UCSC, UC Santa Cruz Genome Browser portal; 1KG, 1,000 Genomes Project; Ensembl75, Ensembl Release 75 [82]; GENCODE, the GENCODE project release 19 [83]; ENCODE, Encyclopedia of DNA Elements [30]; FSU, Florida State University; UW, University of Washington; NKI, Netherlands Cancer Institute; GTEx, the genotype tissue-expression project; GERP, the Genomic Evolutionary Rate Profiling score; CDS, coding DNA sequence; UTR, untranslated region; MAF, minor allele frequency; HMR, human-mouse-rat; TSS, transcription start site.
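
The local-sequence features in the table (GC5Content, GC7Content, GC11Content, local_purine, local_CpG) are simple counts over a short window around the SNP. The following is a minimal sketch of how such counts could be computed, not the authors' implementation. It assumes that the window is centered on the SNP, that the reference sequence is available as a plain string, and that positions are 0-based; the function names (`centered_window`, `gc_count`, `purine_count`, `cpg_count`) are hypothetical and do not come from the source.

```python
def centered_window(seq: str, pos: int, width: int) -> str:
    """Return the `width`-bp window of `seq` centered on 0-based index `pos`.

    Assumption: the window is symmetric around the SNP, so `width` must be odd.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the window can be centered")
    half = width // 2
    if pos - half < 0 or pos + half >= len(seq):
        raise ValueError("window extends past the end of the sequence")
    return seq[pos - half : pos + half + 1].upper()


def gc_count(window: str) -> int:
    """Number of G or C bases in the window (0..len(window))."""
    return sum(base in "GC" for base in window)


def purine_count(window: str) -> int:
    """Number of purine bases (A or G) in the window."""
    return sum(base in "AG" for base in window)


def cpg_count(window: str) -> int:
    """Number of positions where a C is immediately followed by a G."""
    return sum(window[i : i + 2] == "CG" for i in range(len(window) - 1))


# Toy sequence for illustration only; the SNP sits at 0-based index 7.
seq = "TTACGCGGATCGAAT"
snp = 7

features = {
    "GC5Content": gc_count(centered_window(seq, snp, 5)),
    "GC7Content": gc_count(centered_window(seq, snp, 7)),
    "GC11Content": gc_count(centered_window(seq, snp, 11)),
    "local_purine": purine_count(centered_window(seq, snp, 11)),
    "local_CpG": cpg_count(centered_window(seq, snp, 11)),
}
print(features)
```

An 11 bp window contains 10 adjacent base pairs, which matches the 0-10 range listed for local_CpG, and each GC feature is bounded by its window width, matching the integer ranges in the table.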