GENPPI: standalone software for creating protein interaction networks from genomes

BMC Bioinformatics

Table 2. With values of 1 and 25 for the aa-limit and check-limit parameters, respectively, our heuristic guarantees a minimum identity of 92.55% for pairs of proteins classified as similar (Table 3)

Amino acids	A	R	N	D	C	E	Q	G	H	I	L	K	M	F	P	S	T	W	Y	V
A histogram	12	2	8	6	4	10	1	9	2	11	6	13	3	2	4	14	5	1	5	10
B histogram	11	3	8	6	4	11	1	9	2	11	8	12	3	2	4	13	4	1	4	11
abs(A-B):	1	1	0	0	0	1	0	0	0	0	2	1	0	0	0	1	1	0	1	1

According to the GENPPI heuristic, proteins A and B are similar because, when their amino acid histograms are subtracted, at least 25 of the 26 possible types show frequency differences less than or equal to 1. For illustration, this table presents only the 20 standard amino acids. Proteins A and B, shown below in FASTA format, have 94.5% identity (96.9% similarity) according to the Needleman–Wunsch algorithm. Amino acids in bold are those that differ between the A and B sequences
>A Protein
MAYSKKVMDHYENPRNVGSFSNSDNNVGSGLVGAPACGDVMKLQIKVNEKGIIEDACFKTYGCGS
AIASSSLVTEWVKGKSITEAESIRNTTIVEELELPPVKIHCSILAEDAIKAAIADYKSKKYSN
>B Protein
MAYSKKVMDHYENPRNVGSFSNSDLNVGSGLVGAPACGDVMKLQIKVNEEGIIEDACFKTYGCGS
AIASSSLVTEWVKGKSIVEAESIRNTTIVEELELPPVKIHCSILAEDAIKAAISDYKRKKNLN
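
The histogram comparison described in the table note can be illustrated with a short Python sketch. This is not GENPPI's actual implementation: the function names (`histogram`, `count_close_types`, `is_similar`) are hypothetical, the arguments `aa_limit` and `check_limit` merely mirror the aa-limit and check-limit parameters, and the "26 possible types" are assumed here to be the 26 one-letter codes A to Z.

```python
from collections import Counter
import string

# Assumption: the "26 possible types" are the one-letter codes A..Z.
ALPHABET = string.ascii_uppercase


def histogram(seq):
    """Count how often each of the 26 letter codes occurs in a sequence."""
    counts = Counter(seq.upper())
    return [counts.get(letter, 0) for letter in ALPHABET]


def count_close_types(seq_a, seq_b, aa_limit=1):
    """Number of letter codes whose frequency difference is <= aa_limit."""
    pairs = zip(histogram(seq_a), histogram(seq_b))
    return sum(abs(a - b) <= aa_limit for a, b in pairs)


def is_similar(seq_a, seq_b, aa_limit=1, check_limit=25):
    """Similar if at least check_limit letter codes are within aa_limit."""
    return count_close_types(seq_a, seq_b, aa_limit) >= check_limit


# Proteins A and B from the table note.
PROTEIN_A = (
    "MAYSKKVMDHYENPRNVGSFSNSDNNVGSGLVGAPACGDVMKLQIKVNEKGIIEDACFKTYGCGS"
    "AIASSSLVTEWVKGKSITEAESIRNTTIVEELELPPVKIHCSILAEDAIKAAIADYKSKKYSN"
)
PROTEIN_B = (
    "MAYSKKVMDHYENPRNVGSFSNSDLNVGSGLVGAPACGDVMKLQIKVNEEGIIEDACFKTYGCGS"
    "AIASSSLVTEWVKGKSIVEAESIRNTTIVEELELPPVKIHCSILAEDAIKAAISDYKRKKNLN"
)

print(count_close_types(PROTEIN_A, PROTEIN_B))  # 25 (only L differs by 2)
print(is_similar(PROTEIN_A, PROTEIN_B))         # True
```

Because the comparison uses only letter counts, it needs no alignment. That is presumably what makes the heuristic cheap relative to the Needleman–Wunsch algorithm, at the cost of ignoring residue order.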

ISSN: 1471-2105