
Table 3 Features used to estimate phase confidence. The top 10 features, ranked by Gini importance as calculated by scikit-learn [10] for random forests.

From: Accurate genome-wide phasing from IBD data

| Feature | Importance |
| --- | --- |
| Proportion of database individuals with IBD segments assigned to both sides of the family, both on the largest supercluster | 0.388 |
| Proportion of IBD segments that are partially assigned to one parental side and partially to the other | 0.108 |
| Number of close family members that do not share DNA with all other close family | 0.103 |
| (log) number of database individuals with shared DNA | 0.088 |
| (log) number of IBD segments assigned to the largest supercluster | 0.081 |
| Number of missing edges in close family network (pairs of close family that do not share IBD with each other) | 0.032 |
| Proportion of the genome overlapped by at least one IBD segment | 0.030 |
| Ratio of the number of database individuals with shared DNA on one IBD segment to the number with shared DNA on multiple IBD segments | 0.029 |
| Proportion of the genome overlapped by at least two IBD segments | 0.021 |
| Proportion of IBD segments assigned to the largest supercluster | 0.018 |
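
For readers unfamiliar with the ranking criterion, the following is a minimal sketch of how a Gini importance can be computed for a single decision tree: each split contributes its sample-weighted decrease in Gini impurity to the feature it splits on, and the totals are normalized to sum to 1. A random forest then averages these values over its trees. The toy labels, the feature names `a` and `b`, and all function names are hypothetical and chosen only for illustration; this is not the authors' pipeline or scikit-learn's implementation.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right, n_total):
    """Sample-weighted impurity decrease contributed by one split.

    parent, left, right: class labels at the node and its two children.
    n_total: number of samples at the root of the tree.
    """
    n = len(parent)
    children = (len(left) * gini_impurity(left)
                + len(right) * gini_impurity(right)) / n
    return (n / n_total) * (gini_impurity(parent) - children)


def normalize(raw):
    """Scale raw per-feature decreases so that they sum to 1."""
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


# Hypothetical toy tree with 8 samples (1 = correctly phased, 0 = not).
root = [1, 1, 1, 0, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [0, 0, 0, 0]   # root split on feature "a"
left_left, left_right = [1, 1, 1], [0]     # left child split on feature "b"

raw = {
    "a": impurity_decrease(root, left, right, n_total=len(root)),
    "b": impurity_decrease(left, left_left, left_right, n_total=len(root)),
}
importances = normalize(raw)
print(importances)
```

In this toy tree, feature `a` receives an importance of 0.6 and feature `b` receives 0.4. Because the values are normalized, an importance is best read as a share of the total impurity reduction rather than as an absolute effect size.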