DrGA: cancer driver gene analysis in a simpler manner

Table 1 Four agglomeration methods that DrGA evaluates automatically to select the most appropriate one

Cluster distance measure	Description	Formula
Single method	The distance between two clusters, c₁ and c₂, is defined as the shortest distance between a point x₁ in c₁ and a point x₂ in c₂	\(D(c_{1}, c_{2}) = \min\limits_{x_{1} \in c_{1},\, x_{2} \in c_{2}} D(x_{1}, x_{2}) \quad (4)\)
Complete method	The distance between two clusters, c₁ and c₂, is defined as the longest distance between a point x₁ in c₁ and a point x₂ in c₂	\(D(c_{1}, c_{2}) = \max\limits_{x_{1} \in c_{1},\, x_{2} \in c_{2}} D(x_{1}, x_{2}) \quad (5)\)
Average method	The distance between two clusters, c₁ and c₂, is defined as the average distance from each point in one cluster to every point in the other cluster	\(D(c_{1}, c_{2}) = \frac{1}{n_{c_{1}} n_{c_{2}}} \sum\limits_{i = 1}^{n_{c_{1}}} \sum\limits_{j = 1}^{n_{c_{2}}} D(x_{i}, x_{j}) \quad (6)\)
Ward’s method	Minimizes the total within-cluster error sum of squares: at each stage, the pair of clusters with the minimum between-cluster distance is identified and merged	\(TD_{c_{1} \cup c_{2}} = \sum\limits_{x \in c_{1} \cup c_{2}} D(x, \mu_{c_{1} \cup c_{2}})^{2} \quad (7)\)

D(X, Y), the distance between X and Y; c₁ and c₂, cluster 1 and cluster 2; x₁ and x₂, a point in cluster 1 and a point in cluster 2; TD, total distance; \(\mu\), mean
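
The four measures in Table 1 can be checked by hand on a toy example. The following is a minimal sketch that evaluates Eqs. (4) to (7) directly with NumPy. It assumes D is the Euclidean distance, which the table does not specify, and the function names and the two small clusters are illustrative only; this is not DrGA's own code or API.

```python
import numpy as np


def pairwise_distances(c1, c2):
    """Return the matrix of D(x_i, x_j) for every x_i in c1 and x_j in c2.

    Assumption: D is the Euclidean distance.
    """
    diff = c1[:, None, :] - c2[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def single_linkage(c1, c2):
    """Eq. (4): shortest distance between a point in c1 and a point in c2."""
    return pairwise_distances(c1, c2).min()


def complete_linkage(c1, c2):
    """Eq. (5): longest distance between a point in c1 and a point in c2."""
    return pairwise_distances(c1, c2).max()


def average_linkage(c1, c2):
    """Eq. (6): mean of all n_c1 * n_c2 between-cluster distances."""
    return pairwise_distances(c1, c2).mean()


def ward_total_distance(c1, c2):
    """Eq. (7): sum of squared distances from each point to the merged mean."""
    merged = np.vstack([c1, c2])
    mu = merged.mean(axis=0)
    return ((merged - mu) ** 2).sum()


# Two hypothetical clusters of two points each (rows are points).
c1 = np.array([[0.0, 0.0], [0.0, 1.0]])
c2 = np.array([[3.0, 0.0], [3.0, 1.0]])

results = {
    "single": single_linkage(c1, c2),
    "complete": complete_linkage(c1, c2),
    "average": average_linkage(c1, c2),
    "ward_TD": ward_total_distance(c1, c2),
}
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

For these two clusters the single distance is 3, the complete distance is √10, the average distance is (3 + √10)/2, and the Ward total distance of the merged cluster is 10, so the ordering single ≤ average ≤ complete is visible directly.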