DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations

BMC Bioinformatics

Table 1 Workflow of Clustered Gene Filtering (CGF)

Input: Gene data matrix A ∈ {0, 1}^(m × n), distance threshold d_CGF, group element threshold n_CGF.
1: Sum A by row, sort the result in descending order, and obtain the sorted index Â_sum;
2: Initialize each element as ungrouped;
3: For each ungrouped element i in Â_sum:
   (a) For each ungrouped element j in Â_sum other than i:
       i. Calculate the similarity d(A(i, :), A(j, :));
       ii. If d(A(i, :), A(j, :)) > d_CGF, assign j to the group of i;
4: Set the output gene index set g_out = ∅;
5: For each group c of A after step 3:
   (a) If the group element number n_c ≥ n_CGF, select the top n_CGF genes with the highest mutation occurrence frequency as g_c;
   (b) g_out = g_out ∪ g_c;
6: Apply the index set g_out to A to get the filtered gene data A_CGF = A(g_out, :);
Output: A_CGF, i.e. the gene data after CGF.
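The CGF workflow in Table 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Table 1 does not name the similarity measure d, so the sketch assumes Jaccard similarity between binary rows. It also assumes that rows of A are genes and columns are samples, that ties in mutation count keep their original order, and that groups with fewer than n_CGF members contribute no genes (Table 1 leaves that case unspecified). The names `jaccard` and `clustered_gene_filtering` are hypothetical.

```python
def jaccard(u, v):
    """Jaccard similarity of two binary vectors (assumed choice for d)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0


def clustered_gene_filtering(A, d_cgf, n_cgf, similarity=jaccard):
    """Greedy sketch of CGF on a binary matrix A given as a list of gene rows.

    Returns (A_cgf, g_out): the filtered rows and the selected gene indices.
    """
    # Step 1: gene indices sorted by descending mutation count.
    # sorted() is stable, so tied genes keep their original order.
    counts = [sum(row) for row in A]
    order = sorted(range(len(A)), key=lambda g: -counts[g])

    # Step 2: every gene starts ungrouped.
    grouped = [False] * len(A)
    groups = []

    # Step 3: each ungrouped gene i seeds a group and absorbs every
    # still-ungrouped gene j whose similarity to i exceeds d_cgf.
    for i in order:
        if grouped[i]:
            continue
        grouped[i] = True
        members = [i]
        for j in order:
            if not grouped[j] and similarity(A[i], A[j]) > d_cgf:
                grouped[j] = True
                members.append(j)
        groups.append(members)

    # Steps 4-5: members were appended in descending-count order, so the
    # first n_cgf entries are the most frequently mutated genes of a group.
    # Assumption: groups smaller than n_cgf are skipped entirely.
    g_out = []
    for members in groups:
        if len(members) >= n_cgf:
            g_out.extend(members[:n_cgf])

    # Step 6: sort indices so the output keeps the original gene order
    # (a choice of this sketch, not stated in Table 1).
    g_out.sort()
    return [A[g] for g in g_out], g_out


# Toy example: 5 genes x 4 samples.
A = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
A_cgf, g_out = clustered_gene_filtering(A, d_cgf=0.5, n_cgf=2)
print(g_out)  # [0, 3]
```

In the toy example, genes 0, 3 and 1 fall into one group (similarities 1 and 2/3 to gene 0), while genes 4 and 2 each form a singleton group and are dropped because n_CGF = 2. Passing a different `similarity` callable swaps in another measure without changing the grouping logic.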

ISSN: 1471-2105