mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines

BMC Bioinformatics

Table 7 Performance of mGOASVM with different inputs and different numbers of homologous proteins for (a) the virus dataset and (b) the plant dataset

(a) Performance on the viral protein dataset
Input data type	#homo	N_d(GO)	Locative accuracy	Actual accuracy
AC	0	331	244/252 = 96.8%	191/207 = 92.3%
S	1	310	244/252 = 96.8%	184/207 = 88.9%
S	2	455	235/252 = 93.3%	178/207 = 86.0%
S	4	664	221/252 = 87.7%	160/207 = 77.3%
S	8	1134	202/252 = 80.2%	130/207 = 62.8%
S + AC	1	334	242/252 = 96.0%	188/207 = 90.8%
S + AC	2	460	238/252 = 94.4%	179/207 = 86.5%
S + AC	4	664	230/252 = 91.3%	169/207 = 81.6%
S + AC	8	1134	216/252 = 85.7%	145/207 = 70.1%
(b) Performance on the plant protein dataset
Input data type	#homo	N_d(GO)	Locative accuracy	Actual accuracy
AC	0	1532	1023/1055 = 97.0%	863/978 = 88.2%
S	1	1541	1015/1055 = 96.2%	855/978 = 87.4%
S	2	1906	907/1055 = 85.8%	617/978 = 63.1%
S + AC	1	1541	1010/1055 = 95.7%	859/978 = 87.8%
S + AC	2	1906	949/1055 = 90.0%	684/978 = 70.0%

S: sequence; AC: accession number; #homo: number of homologs used in the experiments; N_d(GO): number of distinct GO terms. #homo = 0 means that only the true accession number is used.
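
Each accuracy entry in the table is a ratio of counts followed by the corresponding percentage. The minimal sketch below shows that conversion for the first row of part (a). The function name `accuracy_percent` is hypothetical, and rounding to one decimal place is an assumption based on how the table entries are printed, not a statement of how the authors computed them.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return correct/total as a percentage rounded to one decimal place.

    Hypothetical helper for illustration; the one-decimal rounding is an
    assumption inferred from the table's formatting.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * correct / total, 1)


# Counts copied from row "AC, #homo = 0" of part (a), the virus dataset.
locative = accuracy_percent(244, 252)
actual = accuracy_percent(191, 207)
print(f"locative = {locative}%, actual = {actual}%")
# prints "locative = 96.8%, actual = 92.3%"
```

The same call applies to any other row by substituting the numerator and denominator shown in that row.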

ISSN: 1471-2105