From: CERENKOV2: improved detection of functional noncoding SNPs using data-space geometric features
Feature(s) | Feature type | Raw data source | Feature description |
---|---|---|---|
normChromCoord | continuous | UCSC | the SNP coordinate (normalized to chrom. length) |
majorAlleleFreq | continuous | UCSC/1KG | the major allele frequency (1KG) |
minorAlleleFreq | continuous | UCSC/1KG | the next-to-major allele frequency (1KG) |
phastCons | continuous | UCSC | 46-way placental mammal phastCons score [6] |
GERP++ | continuous | UCSC | bp-level GERP++ [80] score |
avg_GERP | continuous | UCSC | avg. GERP score [81] in ±100 bp window |
avg_daf | continuous | 1KG | average derived allele frequency in ±1 kbp region |
avg_het | continuous | 1KG | average heterozygosity rate in ±1 kbp region |
maf1kb | continuous | UCSC/1KG | average of the MAF values for all SNPs in ±1 kbp window |
eqtlPvalue | continuous | GTEx | -log10 min(p) for GTEx eQTL for the SNP, across 13 tissues [75] |
GC5Content | integer (0-5) | UCSC | GC content in a 5 bp window |
GC7Content | integer (0-7) | UCSC | GC content in a 7 bp window |
GC11Content | integer (0-11) | UCSC | GC content in an 11 bp window |
local_purine | integer (0-11) | UCSC | number of purine bases in local 11 bp window |
local_CpG | integer (0-10) | UCSC | number of CpG dinucleotides in 11 bp window |
ss_dist | integer | UCSC | signed distance to nearest exon boundary |
tssDistance | integer | Ensembl75 | signed distance to nearest Ensembl TSS |
gencode_tss | integer | GENCODE | signed distance to nearest GENCODE TSS |
tfCount | integer | UCSC | sqrt(count) of ENCODE ChIP-seq TFBS overlapping the SNP |
uniformDhsScore | integer | UCSC | sum of scores of ENCODE uniform DHS peaks overlapping the SNP |
uniformDhsCount | integer | UCSC | count of ENCODE uniform DHS peaks overlapping the SNP |
masterDhsScore | integer | UCSC | sum of scores of ENCODE master DHS peaks overlapping the SNP |
masterDhsCount | integer | UCSC | count of ENCODE master DHS peaks overlapping the SNP |
chrom | categorical (23) | UCSC | the chromosome to which the SNP maps |
nestedrepeat | categorical (2) | UCSC | SNP is in a RepeatMasker [70] DNA repeat |
simplerepeat | categorical (2) | UCSC | SNP is in a Tandem Repeats Finder [71] repeat |
cpg_island | categorical (2) | UCSC | SNP is in an epigenome-predicted CpG island [72] |
geneannot | categorical (4) | UCSC | classifies SNP location as CDS, intergenic, UTR, or intron |
majorAllele | categorical (4) | UCSC/1KG | the major allele for the SNP |
minorAllele | categorical (4) | UCSC/1KG | the next-to-major allele for the SNP |
pwm | categorical (22) | Ensembl75 | ID of the Jaspar 2014 [74] motif in which SNP is a match |
chromhmm | 6 × categorical (26) | UCSC | ChromHMM label in Gm12878, H1hesc, HeLaS3, HepG2, HUVEC and K562 cells |
segway | 6 × categorical (26) | UCSC | Segway label in Gm12878, H1hesc, HeLaS3, HepG2, HUVEC and K562 cells |
ch_comb_WEAKENH | categorical (4) | Ensembl75 | ChromHMM label in Ensembl Reg. Seg. build |
ch_comb_ENH | categorical (6) | Ensembl75 | ChromHMM label in Ensembl Reg. Seg. build |
ch_comb_REP | categorical (7) | Ensembl75 | ChromHMM label in Ensembl Reg. Seg. build |
ch_comb_TSSFLANK | categorical (5) | Ensembl75 | ChromHMM label in Ensembl Reg. Seg. build |
ch_comb_TRAN | categorical (7) | Ensembl75 | ChromHMM label in Ensembl Reg. Seg. build |
ch_comb_TSS | categorical (7) | Ensembl75 | ChromHMM label in Ensembl Reg. Seg. build |
ch_comb_CTCFREG | categorical (7) | Ensembl75 | ChromHMM label in Ensembl Reg. Seg. build |
ENCODE_TFBS | 160 × categorical (2) | UCSC | 160 features for SNP being in an ENCODE TFBS [84] peak |
FsuRepliSeq | 16 × continuous | UCSC | Replication Timing by Repli-chip [66] from ENCODE/FSU |
UwRepliSeq | 16 × continuous | UCSC | Replication Timing by Repli-seq [65] from ENCODE/UW |
SangerTfbsSummary50kb | continuous | Ensembl75 | Summary of Ensembl TFBS peaks from 18 human cell types |
NkiLad | categorical (2) | UCSC | SNP is in a Lamina Associated Domain (NKI study [85], Tig-3 cells) |
vistaEnhancerCnt | categorical (2) | UCSC | count of VISTA [73] HMR-Conserved Non-coding Human Enhancers [86] overlapping the SNP |
vistaEnhancerTotalScore | categorical (2) | UCSC | sum scores of VISTA [73] HMR-Conserved Non-coding Human Enhancers [86] |
eigen | continuous (2) | Eigen | Eigen & Eigen-PC v1.1 raw score [21] |
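
The local sequence-composition features in the table (GC5Content, GC7Content, GC11Content, local_purine, local_CpG) and normChromCoord are simple enough to illustrate with a short sketch. The Python below is not the authors' pipeline. It assumes that each window is centered on the SNP, that positions are 0-based indices into a reference sequence string, and that local_CpG counts occurrences of the dinucleotide "CG" within the window. The function names are hypothetical and chosen only for this illustration.

```python
def snp_window(seq: str, center: int, width: int) -> str:
    """Return the odd-width window of `seq` centered on 0-based index `center`.

    Assumption: windows are symmetric around the SNP position.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the window can be centered")
    half = width // 2
    start, end = center - half, center + half + 1
    if start < 0 or end > len(seq):
        raise ValueError("window extends past the end of the sequence")
    return seq[start:end].upper()


def gc_content(seq: str, center: int, width: int) -> int:
    """Number of G or C bases in the window (integer in 0..width)."""
    return sum(base in "GC" for base in snp_window(seq, center, width))


def purine_count(seq: str, center: int, width: int = 11) -> int:
    """Number of purine bases (A or G) in the window."""
    return sum(base in "AG" for base in snp_window(seq, center, width))


def cpg_count(seq: str, center: int, width: int = 11) -> int:
    """Number of 'CG' dinucleotides lying entirely inside the window.

    Assumption: the exact counting convention of the original feature
    is not stated in the table; this counts literal 'CG' occurrences.
    """
    w = snp_window(seq, center, width)
    return sum(w[i:i + 2] == "CG" for i in range(len(w) - 1))


def normalized_coord(position: int, chrom_length: int) -> float:
    """SNP coordinate divided by chromosome length."""
    return position / chrom_length


if __name__ == "__main__":
    reference = "TTACGCGATCGGCAT"  # toy sequence, not real genomic data
    snp_index = 7
    for width in (5, 7, 11):
        print(f"GC{width}Content =", gc_content(reference, snp_index, width))
    print("local_purine =", purine_count(reference, snp_index))
    print("local_CpG =", cpg_count(reference, snp_index))
```

In practice the reference sequence would come from a genome assembly rather than a literal string, and SNPs near chromosome ends would need a policy for truncated windows; the sketch simply raises an error in that case.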