Fig. 4 | BMC Bioinformatics

Fig. 4

From: Reduction, alignment and visualisation of large diverse sequence families

Peptide-based sequence score. Two protein sequences, A and B, are compared; they differ only in the A→S substitution marked by an asterisk. Tri-peptides are extracted and sorted using a binary tree (in N log N time). For simplicity, only the first 10 positions in sequence A are shown, and the peptides are represented using the one-letter amino acid code rather than their hash values (large numbers). The tree is parsed in depth-first, left-to-right order (starting at the root node “o”), with each node value (peptide) written just once, when the node is first left upwards (u) or down-right (r). The ordered peptide list is: AFEu, EAFr, ERLu, FERu, GLEr, LEAr, LKEu, RLKu, where the lower-case suffix indicates the condition under which the peptide was written. The two lists of sorted peptides for sequences A and B can be scanned for common entries in linear time (right side). The same code was also used for pairs of scored sequences, with the numeric value being the score for the pair. If only the M highest pairs are to be stored for each sequence, then as the tree is loaded, a pair can be skipped if M higher pairs are encountered (left moves), and at the end just the M highest entries are extracted. In the peptide example, if M=2, then AFE would not be entered (nor would ERL).
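
The sort-then-scan idea in the caption can be illustrated with a minimal Python sketch. This is not the authors' code: it uses Python's built-in sort in place of the binary tree, and plain strings in place of peptide hash values. The sequence `seq_a` is reconstructed from the eight overlapping peptides listed above; `seq_b` and the position of its A→S substitution are assumptions made for illustration, as are the function names.

```python
def tripeptides(seq, k=3):
    """Return all overlapping k-letter peptides of seq, in sequence order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def common_sorted(a, b):
    """Scan two sorted lists once and collect the entries found in both.

    Runs in time linear in len(a) + len(b), like a merge step.
    """
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common


# First 10 positions of sequence A, reconstructed from the peptide list.
seq_a = "GLEAFERLKE"
# Hypothetical sequence B: same as A with an A->S substitution at position 4.
seq_b = "GLESFERLKE"

# Built-in sort stands in for the binary tree (both are N log N).
sorted_a = sorted(tripeptides(seq_a))
sorted_b = sorted(tripeptides(seq_b))

shared = common_sorted(sorted_a, sorted_b)
print(sorted_a)
print(shared)
```

Under these assumptions, `sorted_a` reproduces the ordered peptide list given in the caption, and the number of shared peptides (`len(shared)`) could serve as a simple similarity count for the pair.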
