Table 1 All features used to train the machine learning algorithm. Each feature was calculated for each cluster

From: Detecting false positive sequence homology: a machine learning approach

| Feature | Description |
| --- | --- |
| Aliscore | The number of positions identified by Aliscore as randomly aligned |
| Length | The length of the alignment |
| # of Sequences | The number of sequences in the alignment |
| # of Gaps | Number of base positions marked with a gap |
| # of Amino Acids | Number of amino acids in the alignment |
| Range | Longest non-aligned sequence length minus shortest non-aligned sequence length |
| Amino Acid Charged | Standard deviation for the proportions of amino acids in the charged class for each sequence |
| Amino Acid Uncharged | Standard deviation for the proportions of amino acids in the uncharged class for each sequence |
| Amino Acid Special | Standard deviation for the proportions of amino acids in the non-charged and non-hydrophobic class for each sequence |
| Amino Acid Hydrophobic | Standard deviation for the proportions of amino acids in the hydrophobic class for each sequence |
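
The features in Table 1, apart from Aliscore (which comes from an external tool), could be computed along the following lines. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `alignment_features`, the input format (a list of equal-length aligned strings), the gap symbol `-`, and the membership of each amino acid class are all hypothetical choices, since the table does not specify them. The sketch also assumes that gaps are counted as gap characters over all sequences, that proportions are taken relative to each sequence's ungapped length, and that the population standard deviation is intended.

```python
from statistics import pstdev

# ASSUMPTION: a common textbook grouping of the 20 standard amino acids.
# Table 1 names the classes but does not list their members.
AA_CLASSES = {
    "charged": set("RHKDE"),
    "uncharged": set("STNQ"),
    "special": set("CGP"),
    "hydrophobic": set("AVILMFYW"),
}


def alignment_features(aligned, gap="-"):
    """Compute the simple Table 1 features for one cluster.

    `aligned` is a list of equal-length aligned protein sequences
    (hypothetical input format). Aliscore is not computed here.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("expected a non-empty list of equal-length sequences")

    # "Non-aligned" sequences: the aligned rows with gap symbols removed.
    ungapped = [s.replace(gap, "").upper() for s in aligned]
    if any(len(u) == 0 for u in ungapped):
        raise ValueError("a sequence consists only of gaps")
    lengths = [len(u) for u in ungapped]

    features = {
        "length": len(aligned[0]),                      # alignment columns
        "n_sequences": len(aligned),
        "n_gaps": sum(s.count(gap) for s in aligned),   # ASSUMPTION: total gap characters
        "n_amino_acids": sum(lengths),
        "range": max(lengths) - min(lengths),
    }

    # One standard deviation per class, taken over the per-sequence proportions.
    for name, members in AA_CLASSES.items():
        proportions = [sum(c in members for c in u) / len(u) for u in ungapped]
        features[f"aa_{name}_sd"] = pstdev(proportions)  # ASSUMPTION: population SD
    return features


if __name__ == "__main__":
    toy_cluster = ["AK-D", "AKRD", "A--D"]  # made-up example, not real data
    for key, value in alignment_features(toy_cluster).items():
        print(key, value)
```

A different class grouping, or the sample standard deviation (`statistics.stdev`), could be swapped in without changing the structure of the sketch.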