ClusterTAD: an unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data

Table 1 The lists of TADs identified by the seven different algorithms in Fig. 4

Algorithm	K = 4	K = 5	K = 6
a	{(1,8), (9,14), (15,20), (21,25), and (26,30)}.	{(1,8), (9,14), (15,20), (21,25), and (26,30)}.	{(1,8), (9,14), (15,20), (21,25), and (27,30)}.
b	{(1,8), (9,14), (15,20), (21,25), and (26,30)}.	{(1,8), (9,14), (15,20), (21,25), and (26,30)}.	{(1,8), (9,14), (15,20), (21,25), and (27,30)}.
c	{(1,8), (9,14), (15,20), and (21,30)}.	{(1,8), (9,14), (15,20), (21,25), and (26,30)}.	{(1,8), (15,20), (21,25), and (26,30)}.
d	{(1,8), (9,14), (15,20), and (21,30)}.	{(1,8), (9,14), (15,20), (21,25), and (26,30)}.	{(1,8), (15,20), (21,25), and (26,30)}.
e	{(1,8), (9,14), (15,20), (21,25), and (26,30)}.	{(1,8), (9,14), (15,20), (21,25), and (26,30)}.	{(1,8), (15,20), (21,25), and (26,30)}.
f	{(1,8), (9,14), (15,20), (21,25), and (26,30)}.	{(1,8), (9,14), (15,20), (21,25), and (26,30)}.	{(1,8), (15,20), (21,25), and (26,30)}.
g	{(1,8), (9,14), (15,20), (21,25), and (26,30)}.	{(1,8), (9,14), (15,20), (21,25), and (26,30)}.	{(1,8), (9,14), (15,20), (21,25), and (27,30)}.

The table contains the lists of TADs extracted for K = 4, K = 5, and K = 6 (left, middle, and right columns, respectively) by the seven algorithms: (a) HC-euclidean, (b) KM-euclidean, (c) HC-pearson, (d) KM-pearson, (e) HC-cityblock, (f) KM-cityblock, and (g) EM. HC denotes the hierarchical clustering algorithm, KM the K-means algorithm, and EM the expectation maximization algorithm. The suffix names the distance metric; for example, HC-euclidean denotes the combination of the hierarchical clustering algorithm and the Euclidean distance metric. A TAD is represented as (start, end), where “start” is the TAD start region and “end” is the TAD end region. The best TAD set for the synthetic data is {(1, 8), (9, 14), (15, 20), (21, 25), and (26, 30)}.
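
As an illustration of how the entries in the table can be read against the best TAD set, the following minimal sketch stores each TAD as a (start, end) tuple and counts the TADs that coincide exactly with the best set. The function name `matching_tads` and the exact-match criterion are assumptions made for this example; they are not the evaluation procedure used by the authors.

```python
# Best TAD set for the synthetic data, as stated in the table caption.
REFERENCE = [(1, 8), (9, 14), (15, 20), (21, 25), (26, 30)]


def matching_tads(predicted, reference=REFERENCE):
    """Return the predicted TADs whose (start, end) pair occurs in the reference.

    Hypothetical helper: exact boundary agreement is an assumed criterion.
    """
    ref = set(reference)
    return [tad for tad in predicted if tad in ref]


# Row (c) HC-pearson, K = 4, copied from the table.
hc_pearson_k4 = [(1, 8), (9, 14), (15, 20), (21, 30)]
# Row (a) HC-euclidean, K = 6, copied from the table.
hc_euclidean_k6 = [(1, 8), (9, 14), (15, 20), (21, 25), (27, 30)]

print(len(matching_tads(hc_pearson_k4)))    # 3
print(len(matching_tads(hc_euclidean_k6)))  # 4
```

Under this criterion, the K = 4 list of HC-pearson recovers three of the five best TADs because (21,30) merges the last two, and the K = 6 list of HC-euclidean recovers four because its last TAD starts at region 27 rather than 26.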

ISSN: 1471-2105