From: DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations
Input: Gene data matrix A ∈ {0, 1}^(m × n), distance threshold d_CGF, group element threshold n_CGF.
1: Sum A by row and sort the result in descending order to obtain the sorted index A*_sum;
2: Initialize each element as ungrouped;
3: For each ungrouped element i in A*_sum:
  (a) For each ungrouped element j in A*_sum other than i:
    i. Calculate the similarity d(A(i, :), A(j, :));
    ii. If d(A(i, :), A(j, :)) > d_CGF, assign j to the group of i;
4: Set the output gene index set g_out = ∅;
5: For each group c of A after step 3:
  (a) If the group element number n_c ≥ n_CGF, select the top n_CGF genes with the highest mutation occurrence frequency as g_c;
  (b) g_out = g_out ∪ g_c;
6: Apply the index set g_out to A to get the filtered gene data A_CGF = A(g_out, :);
Output: A_CGF, i.e. the gene data after CGF.
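The steps of the CGF procedure can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation. The listing does not define the similarity d, so the sketch assumes the Jaccard coefficient between binary gene rows. The function names `jaccard` and `clustered_gene_filter` are hypothetical. Rows of A are taken to be genes, consistent with the indexing A(i, :) and A(g_out, :). Step 5(a) is read literally, so groups with fewer than n_CGF members contribute no genes.

```python
def jaccard(a, b):
    """Jaccard similarity of two binary vectors (an assumed choice for d)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def clustered_gene_filter(A, d_cgf, n_cgf):
    """Sketch of CGF on a binary matrix A given as a list of gene rows."""
    # Step 1: gene indices sorted by row sum (mutation frequency), descending.
    order = sorted(range(len(A)), key=lambda i: -sum(A[i]))

    # Step 2: every gene starts ungrouped.
    grouped = set()
    groups = []

    # Step 3: each ungrouped gene i seeds a group and absorbs similar genes.
    for i in order:
        if i in grouped:
            continue
        grouped.add(i)
        members = [i]
        for j in order:
            if j in grouped:
                continue
            if jaccard(A[i], A[j]) > d_cgf:
                grouped.add(j)
                members.append(j)
        groups.append(members)

    # Steps 4-5: keep the n_cgf most frequent genes of each large-enough group.
    # Members were appended in descending frequency order, so a slice suffices.
    g_out = []
    for members in groups:
        if len(members) >= n_cgf:
            g_out.extend(members[:n_cgf])

    # Step 6: apply the index set to A.
    return [A[g] for g in g_out], g_out


# Toy data (illustrative only): 4 genes x 4 samples.
A = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]
A_cgf, g_out = clustered_gene_filter(A, d_cgf=0.5, n_cgf=2)
print(g_out)  # [0, 3]
```

In the toy data, genes 0, 3 and 1 fall into one group and gene 2 forms a singleton. With n_CGF = 2 the first group yields its two most frequent genes and the singleton is dropped. The nested loop is quadratic in the number of genes, which mirrors the listing and makes no attempt at optimization.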