
Table 1 Four agglomeration methods considered automatically in DrGA to specify the appropriate one

From: DrGA: cancer driver gene analysis in a simpler manner

Cluster distance measure

Description

Formula

Single method

The distance between two clusters, c1 and c2, is defined as the shortest distance between any point x1 in c1 and any point x2 in c2

\(D(c_{1}, c_{2}) = \min_{x_{1} \in c_{1},\, x_{2} \in c_{2}} D(x_{1}, x_{2}) \quad (4)\)

Complete method

The distance between two clusters, c1 and c2, is defined as the longest distance between any point x1 in c1 and any point x2 in c2

\(D(c_{1}, c_{2}) = \max_{x_{1} \in c_{1},\, x_{2} \in c_{2}} D(x_{1}, x_{2}) \quad (5)\)

Average method

The distance between two clusters, c1 and c2, is defined as the average of the distances between every point in one cluster and every point in the other cluster

\(D(c_{1}, c_{2}) = \frac{1}{n_{c_{1}} n_{c_{2}}} \sum_{i = 1}^{n_{c_{1}}} \sum_{j = 1}^{n_{c_{2}}} D(x_{i}, x_{j}) \quad (6)\)

Ward’s method

Minimizes the total within-cluster error sum of squares; at each stage, it identifies the pair of clusters with the minimum between-cluster distance and merges them

\(TD_{c_{1} \cup c_{2}} = \sum_{x \in c_{1} \cup c_{2}} D(x, \mu_{c_{1} \cup c_{2}})^{2} \quad (7)\)

  1. D(X, Y): the distance between X and Y; c1 and c2: cluster 1 and cluster 2; x1 and x2: a point in cluster 1 and a point in cluster 2; \(n_{c_{1}}\) and \(n_{c_{2}}\): the numbers of points in cluster 1 and cluster 2; TD: total distance; \(\mu\): mean
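
To make Eqs. (4) to (7) concrete, the following is a minimal Python sketch that evaluates each cluster distance measure on two small toy clusters. It is not DrGA's implementation: the table leaves D generic, so Euclidean distance is assumed here, and the function names and toy data are hypothetical, chosen only for illustration.

```python
import numpy as np


def pairwise_distances(c1, c2):
    """Assumed Euclidean D(x1, x2) for every x1 in c1 and every x2 in c2."""
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    diff = c1[:, None, :] - c2[None, :, :]  # shape (n_c1, n_c2, n_features)
    return np.sqrt((diff ** 2).sum(axis=-1))


def single_linkage(c1, c2):
    """Eq. (4): shortest distance between a point in c1 and a point in c2."""
    return pairwise_distances(c1, c2).min()


def complete_linkage(c1, c2):
    """Eq. (5): longest distance between a point in c1 and a point in c2."""
    return pairwise_distances(c1, c2).max()


def average_linkage(c1, c2):
    """Eq. (6): mean over all n_c1 * n_c2 cross-cluster distances."""
    return pairwise_distances(c1, c2).mean()


def ward_total_distance(c1, c2):
    """Eq. (7): sum of squared distances to the mean of the merged cluster."""
    merged = np.vstack([np.asarray(c1, dtype=float), np.asarray(c2, dtype=float)])
    mu = merged.mean(axis=0)
    return ((merged - mu) ** 2).sum()


# Hypothetical toy clusters: two points each, three units apart along the x-axis.
c1 = np.array([[0.0, 0.0], [0.0, 1.0]])
c2 = np.array([[3.0, 0.0], [3.0, 1.0]])

print(single_linkage(c1, c2))       # 3.0
print(complete_linkage(c1, c2))     # sqrt(10), about 3.162
print(average_linkage(c1, c2))      # about 3.081
print(ward_total_distance(c1, c2))  # 10.0
```

On this toy data the single, average, and complete measures are ordered from smallest to largest, as expected from the min, mean, and max in Eqs. (4) to (6). The Ward quantity is on a different scale because it sums squared distances to the merged mean rather than comparing points across clusters.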