Table 1 Workflow of Clustered Gene Filtering (CGF)

From: DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations

Input: Gene data matrix A ∈ {0, 1}^(m × n), distance threshold d_CGF, group element threshold n_CGF.

1: Sum A by row and sort the result in descending order to obtain the sorted index A*_sum;

2: Initialize each element as ungrouped;

3: For each ungrouped element i in A*_sum:

 (a) For each ungrouped element j in A*_sum other than i:

  i. Calculate the similarity d(A(i, :), A(j, :));

  ii. If d(A(i, :), A(j, :)) > d_CGF, assign j to the group of i;

4: Set the output gene index set g_out = ∅;

5: For each group c of A after step 3:

 (a) If group element number n_c ≥ n_CGF, select the top n_CGF genes with the highest mutation occurrence frequency as g_c;

 (b) g_out = g_out ∪ g_c;

6: Apply the index set g_out on A and get the filtered gene data A_CGF = A(g_out, :);

Output: A_CGF, i.e. the gene data after CGF
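
The workflow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The table does not define the similarity d, so the Jaccard coefficient used here is an assumed stand-in that can be swapped out through the hypothetical `similarity` parameter. The function names are illustrative. Two further assumptions are made where the table is silent: each seed element i becomes the first member of its own group, and groups with fewer than n_CGF members contribute no genes.

```python
def jaccard(u, v):
    """Jaccard similarity of two binary vectors (an assumed choice for d)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0


def clustered_gene_filtering(A, d_cgf, n_cgf, similarity=jaccard):
    """Sketch of CGF on a binary matrix A given as a list of rows (genes)."""
    # Step 1: row sums, then gene indices sorted by descending mutation count.
    counts = [sum(row) for row in A]
    order = sorted(range(len(A)), key=lambda g: -counts[g])  # stable sort

    # Step 2: every gene starts ungrouped.
    grouped = [False] * len(A)

    # Step 3: greedy grouping around each still-ungrouped seed gene i.
    groups = []
    for i in order:
        if grouped[i]:
            continue
        grouped[i] = True  # assumption: the seed belongs to its own group
        group = [i]
        for j in order:
            if not grouped[j] and similarity(A[i], A[j]) > d_cgf:
                grouped[j] = True
                group.append(j)
        groups.append(group)

    # Steps 4-5: keep the n_cgf most frequently mutated genes of each
    # sufficiently large group. Members were appended in descending count
    # order, so the first n_cgf entries are the most frequent ones.
    g_out = []
    for group in groups:
        if len(group) >= n_cgf:
            g_out.extend(group[:n_cgf])

    # Step 6: apply the index set to A.
    A_cgf = [A[g] for g in g_out]
    return A_cgf, g_out


# Toy data: 4 genes (rows) by 4 samples (columns).
demo_A = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]
demo_A_cgf, demo_g_out = clustered_gene_filtering(demo_A, d_cgf=0.5, n_cgf=2)
print(demo_g_out)  # [0, 3]
```

In the toy run, genes 0, 3 and 1 fall into one group and gene 2 forms a singleton group. With n_CGF = 2, only the two most frequently mutated genes of the first group are kept.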