Table 1 Final set of optimized features chosen for machine learning

From: Application of machine learning in SNP discovery

| Feature Number | Feature | Variable Type |
|---|---|---|
| 1 | Sequence depth | Continuous |
| 2 | Variation type | transition, transversion, indel |
| 3 | PolyBayes probability | Continuous |
| 4 | Frequency of major allele | Continuous |
| 5 | Frequency of minor allele | Continuous |
| 6 | Relative distance from closest end | Continuous |
| 7 | Agreement in the forward and reverse reads | Continuous |
| 8 | Maximum quality of the major allele | Continuous |
| 9 | Maximum quality of the minor allele | Continuous |
| 10 | Average quality of major allele | Continuous |
| 11 | Average quality of minor allele | Continuous |
| 12 | Haplotype of second variation | Continuous |
| 13 | Local average quality | Continuous |
| 14 | Overall average quality | Continuous |
| 15 | Alignment quality | Continuous |
| 16 | Common repeats | Repeat_type |

  1. Detailed definitions and explanations of these features are given in the Methods section. Feature values can be continuous within a given numerical range or discrete with a limited set of options.
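
As a rough illustration of how a mix of continuous and discrete features like those in Table 1 might be turned into a single numeric vector, the following minimal Python sketch one-hot encodes the two discrete features (variation type and common repeats) and appends them to the 14 continuous values. This is an assumption for illustration only, not the encoding used in the paper: the function names `one_hot` and `encode_features` are hypothetical, and the repeat categories in the usage note are placeholders, since the table does not list the possible `Repeat_type` values.

```python
# Hypothetical sketch: combine continuous and discrete features into one vector.
# The variation types come from Table 1; everything else is an assumption.

VARIATION_TYPES = ("transition", "transversion", "indel")
N_CONTINUOUS = 14  # features 1 and 3-15 in Table 1


def one_hot(value, categories):
    """Return a one-hot list for `value` over the ordered `categories`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def encode_features(continuous, variation_type, repeat_type, repeat_categories):
    """Concatenate continuous values with one-hot encoded discrete features.

    `repeat_categories` must be supplied by the caller because the set of
    repeat types is not specified in the table.
    """
    if len(continuous) != N_CONTINUOUS:
        raise ValueError(f"expected {N_CONTINUOUS} continuous values")
    return (
        [float(x) for x in continuous]
        + one_hot(variation_type, VARIATION_TYPES)
        + one_hot(repeat_type, repeat_categories)
    )
```

For example, calling `encode_features([0.0] * 14, "indel", "none", ("none", "SINE"))` with the placeholder repeat categories `("none", "SINE")` yields a vector of length 19: 14 continuous values, 3 entries for variation type, and 2 for repeat type. Some learners, such as decision trees, can consume categorical values directly, in which case this encoding step would not be needed.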