
Table 3 Parameter settings for the experiments. The T-Trees parameters were tuned according to the experiments and remarks reported in [36]. In the hybrid FLTM / T-Trees approach, k was automatically adjusted to the current cluster size. The values of the FLTM parameters were set following the indications in [32], except for card_max and δ, which were tuned according to our own experience. The cardinality of a latent variable L is computed as an affine function of the number of child SNPs, n_c: card(L) = min(α + β × n_c, card_max). To avoid a questionable empirical setting of DBSCAN's R parameter, for each of the 14 datasets analyzed, we ran the FLTM learning algorithm for a wide range of possible values of R. For each dataset, we retained the FLTM model with the R parameter that optimized an entropy-based criterion. The value of the other DBSCAN parameter, N_min, was set to the minimum

From: A method combining a random forest-based technique with the modeling of linkage disequilibrium through latent variables, to run multilocus genome-wide association studies

| Method | Parameter | Description | Value |
|---|---|---|---|
| T-Trees and hybrid approach | | Size for the blocks of contiguous SNPs (T-Trees) | 20 |
| | T | Number of meta-trees in the random forest | 1000 |
| | S_n | Threshold size (in number of observations), to control meta-tree leaf size | 2000 |
| | S_t | Threshold size (in number of meta-nodes), to forbid expanding a meta-tree beyond this size | ∞ |
| | K (T-Trees), K (hybrid) | Number of contiguous blocks of SNPs, or number of clusters in LDMap, to be selected at random at each meta-node, to compute its cut-point | 1000 |
| | s_n | Threshold size (in number of observations), to control embedded tree leaf size | 1 |
| | s_t | Threshold size (in number of nodes), to forbid expanding an embedded tree beyond this size | 5 |
| | k | Number of variables in a block (T-Trees) or cluster (hybrid), to be selected at random, at each node, to compute its cut-point | size of block or of cluster |
| FLTM | α | Three parameters (α, β, card_max) to model the cardinality of each latent variable as an affine function with a maximum threshold | 0.2 |
| | β | See α | 2 |
| | card_max | See α | 10 |
| | τ | Threshold to control the quality of latent variables | 0.3 |
| | nb-EM-restarts | Number of random restarts for the EM algorithm | 10 |
| | δ | Maximal physical distance (bp), to allow two SNPs in the same cluster | 50×10^3 |
| DBSCAN | R | Maximum radius of the neighborhood to be considered to grow a cluster | value selected in 0.05 to 0.9, step 0.05 |
| | N_min | Minimum number of points required within a cluster | 2 |
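
The cardinality rule from the caption and the grid of candidate R values can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `latent_cardinality`, the constant names, and the rounding to the nearest integer are all assumptions, since the caption does not say how a non-integer value of α + β × n_c is turned into a cardinality.

```python
# Illustrative sketch only. Names and the rounding rule are assumptions;
# the numeric values are the ones listed in Table 3.
ALPHA = 0.2      # alpha, offset of the affine function
BETA = 2         # beta, slope of the affine function
CARD_MAX = 10    # card_max, maximum threshold on the cardinality


def latent_cardinality(n_children, alpha=ALPHA, beta=BETA, card_max=CARD_MAX):
    """Return card(L) = min(alpha + beta * n_c, card_max).

    Rounding to the nearest integer is an assumption made for this
    sketch; the source does not specify the rounding rule.
    """
    raw = min(alpha + beta * n_children, card_max)
    return int(round(raw))


# Candidate DBSCAN radii: 0.05 to 0.9 in steps of 0.05 (18 values).
# Built from integers to avoid accumulating floating-point error.
R_GRID = [round(0.05 * i, 2) for i in range(1, 19)]

if __name__ == "__main__":
    for n_c in (2, 4, 5, 20):
        print(n_c, latent_cardinality(n_c))
    print(R_GRID)
```

With these values the cap of 10 is reached from five child SNPs onward, since 0.2 + 2 × 5 = 10.2. The entropy-based criterion used to pick one R per dataset is not detailed in the caption, so the sketch only enumerates the candidates.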