Fig. 1 | BMC Bioinformatics

Fig. 1

From: PIC-Me: paralogs and isoforms classifier based on machine-learning approaches

Illustration of the five features. A Two amino acid sequences (sequence1 and sequence2) are aligned. Matches, mismatches, and gaps between the two sequences are colored black, red, and yellow, respectively. Green underlining indicates the consecutive identical or non-identical blocks (CB). B Sequence similarity (SS) is the proportion of matched positions in the alignment. C Inverse count of consecutive identical or non-identical blocks (ICCB) is the reciprocal of the number of CB. D Match-mismatch fraction (MMF) reflects the overall extent of consecutive matches and mismatches: one is subtracted from the length of each CB, and the sum of these values is divided by the length of the alignment. E Twilight zone (TZ) is a range of sequence similarity; a 20% cut-off score was used here. F Expression level difference (ELD) is the difference in expression levels between two genes. For the example alignment in A, the scores of the first three features are 0.625 for SS, 0.091 for ICCB, and 0.656 for MMF; their calculation is described in detail in Additional file 1: Fig. S1.
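The three alignment-derived features in panels B to D can be sketched in a few lines of Python. This is a minimal illustration of the caption's definitions, not the PIC-Me implementation: the function name `alignment_features`, the `-` gap character, and the toy sequences are hypothetical, and treating gap columns as non-identical positions is an assumption made here because the caption does not say how gaps enter the blocks. SS is returned as a fraction, matching the scale of the values quoted in the caption.

```python
from itertools import groupby


def alignment_features(seq1: str, seq2: str, gap: str = "-"):
    """Return (SS, ICCB, MMF) for two aligned sequences of equal length.

    Assumption: a column is "identical" only if both residues match and
    neither is a gap; every other column (mismatch or gap) is "non-identical".
    """
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("aligned sequences must be non-empty and of equal length")

    # One boolean per alignment column: True for an identical residue pair.
    identical = [a == b and a != gap for a, b in zip(seq1, seq2)]

    # Consecutive blocks (CB): maximal runs of identical or non-identical columns.
    blocks = [len(list(run)) for _, run in groupby(identical)]

    length = len(identical)
    ss = sum(identical) / length                   # fraction of matched columns
    iccb = 1 / len(blocks)                         # inverse count of CB
    mmf = sum(n - 1 for n in blocks) / length      # sum of (block length - 1) / alignment length
    return ss, iccb, mmf


# Toy alignment (hypothetical, not the one shown in the figure).
ss, iccb, mmf = alignment_features("ACDEFG-HIK", "ACDQFGWHLK")
print(f"SS={ss:.3f} ICCB={iccb:.3f} MMF={mmf:.3f}")
```

Under this reading the blocks cover every alignment column, so MMF reduces to (alignment length minus number of CB) divided by alignment length; long uninterrupted runs push MMF toward 1, while frequent switching between matches and mismatches pushes it toward 0.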
