Table 1 Alignment features used for machine learning

From: Machine learning on alignment features for parent-of-origin classification of simulated hybrid RNA-seq

(A) Per read alignment

Feature types: AS (Alignment Score), ED (Edit Distance), MM (Mismatch count), HQMM (HQ mismatch count), GO (Gap Open count), GE (Gap Extend count), INS (Insertion count), HQINS (HQ insertion count), DELS (Deletion count), HQDEL (HQ deletion count)

Extraction: Taken from P1 R1, P1 R2, P2 R1, P2 R2 (10 feature types, 40 features total)

Technical notes: High-quality (HQ) means that the base call quality score is the maximal value. The HQ requirement was applied to the one base involved in a mismatch or insertion, and to the two surrounding bases for a deletion. INS or DEL refer to an extra or missing base in the read, respectively. GO is the number of separate indels, and GE is the number of bases in indels.

(B) Compare totals per parent

Feature types: AS diff, ED diff, MM diff, HQMM diff, GO diff, GE diff, INS diff, DELS diff, HQINS diff, HQDEL diff, MAT diff

Extraction: Subtract (P1 R1 + P1 R2) from (P2 R1 + P2 R2)

Technical notes: Each difference represents the sum over the read pair alignments to parent 2 minus the equivalent sum for parent 1. MAT is the matched base count. See (A) for other feature types.

(C) Compare spans per parent

Feature types: Span diff

Extraction: Subtract P1 span from P2 span

Technical notes: Span is the length of the read pair alignment along the reference.

(D) The better alignment score

Feature types: Parent choice

Extraction: Compare P1 to P2

Technical notes: Use -1 or +1 to indicate whether P1 or P2 had the greater alignment score, respectively, or 0 if tied.
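The derived features in (B), (C), and (D) are simple arithmetic on the per-alignment values from (A), so they can be sketched in a few lines of Python. This is an illustration only, not the authors' code. It assumes that P1 and P2 denote the two parents and R1 and R2 the two reads of a pair, and that each alignment is summarized as a dictionary keyed by the table's abbreviations. The names `record`, `parent_total`, `diff_features`, `span_diff`, and `parent_choice` are hypothetical, as are the example values. The table does not say whether the score compared in (D) is per read or per pair; the usage below assumes the pair total.

```python
from typing import Dict

Features = Dict[str, int]

# Feature types differenced in (B): the ten from (A) plus MAT (matched base count).
DIFF_KEYS = ["AS", "ED", "MM", "HQMM", "GO", "GE",
             "INS", "DELS", "HQINS", "HQDEL", "MAT"]


def record(**values: int) -> Features:
    """Hypothetical helper: build one alignment record, defaulting features to 0."""
    rec = {key: 0 for key in DIFF_KEYS}
    rec.update(values)
    return rec


def parent_total(r1: Features, r2: Features) -> Features:
    """Sum each feature over the R1 and R2 alignments to one parent."""
    return {key: r1[key] + r2[key] for key in DIFF_KEYS}


def diff_features(p1_r1: Features, p1_r2: Features,
                  p2_r1: Features, p2_r2: Features) -> Features:
    """(B): (P2 R1 + P2 R2) minus (P1 R1 + P1 R2), one value per feature type."""
    p1 = parent_total(p1_r1, p1_r2)
    p2 = parent_total(p2_r1, p2_r2)
    return {f"{key} diff": p2[key] - p1[key] for key in DIFF_KEYS}


def span_diff(p1_span: int, p2_span: int) -> int:
    """(C): P2 span minus P1 span."""
    return p2_span - p1_span


def parent_choice(p1_score: int, p2_score: int) -> int:
    """(D): -1 if P1 has the greater alignment score, +1 if P2, 0 if tied."""
    return (p2_score > p1_score) - (p1_score > p2_score)


# Made-up example: the pair aligns with one mismatch to parent 1, perfectly to parent 2.
p1_r1 = record(AS=-6, ED=1, MM=1, MAT=99)
p1_r2 = record(MAT=100)
p2_r1 = record(MAT=100)
p2_r2 = record(MAT=100)

diffs = diff_features(p1_r1, p1_r2, p2_r1, p2_r2)
choice = parent_choice(p1_r1["AS"] + p1_r2["AS"], p2_r1["AS"] + p2_r2["AS"])
print(diffs["AS diff"], diffs["MM diff"], diffs["MAT diff"], choice)
```

Because every difference is taken as parent 2 minus parent 1, the sign of each feature has the same orientation as the parent choice value in (D).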