Transmembrane helix prediction using amino acid property features and latent semantic analysis

BMC Bioinformatics

Table 1 Accuracy of TM prediction with the TMHMM architecture using property representations of residues, compared with the full 20-letter amino acid representation.

1	2	3	4	5	6
Sequence representation	Number of Symbols	Segment F-score	Segment F-score as % of that with amino acid representation	Q₂	Q₂ score as % of that with amino acid representation
Amino acids	20	92	-	81	-
Polarity	2	59	64%	65	80%
Aromaticity	3	84	91%	74	91%
Electronic property	5	85	92%	76	94%

The TMHMM architecture, with model parameters corresponding to version 1 (http://www.binf.ku.dk/~krogh/TMHMM/TMHMM1.0.model), was used for TM helix prediction. Note that all other comparisons used TMHMM 2.0. The first row of results, labeled 'Amino acids', corresponds to this "rewired" TMHMM. Accuracies are computed locally, with metrics corresponding to those defined in [2]. The segment F-score is the geometric mean of the segment recall (Qobs,htm) and segment precision (Qprec,htm). The next row, labeled 'Polarity', corresponds to the TMHMM when the observation probabilities of the 20 amino acids are grouped to form 2 observation symbols, namely polar and nonpolar. Columns 4 and 6 express each property representation's score as a percentage of the score achieved with the 20 amino acid representation. For example, the polarity representation achieves a segment F-score of 59%, which is 64% of the 92% segment F-score achieved by the amino acid representation. Results for the aromaticity and electronic property representations are given similarly in the next two rows.
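The arithmetic behind columns 4 and 6 can be reproduced with a short sketch. The scores below are copied from Table 1. The function names `relative_percent` and `segment_f_score` are illustrative and not taken from the paper, and the geometric-mean form follows the definition given in the legend above rather than any reference implementation.

```python
from math import sqrt

# Scores copied from Table 1: representation -> (segment F-score, Q2), in percent.
TABLE_1 = {
    "Amino acids": (92, 81),
    "Polarity": (59, 65),
    "Aromaticity": (84, 74),
    "Electronic property": (85, 76),
}


def segment_f_score(recall, precision):
    """Geometric mean of segment recall and precision, as defined in the legend."""
    return sqrt(recall * precision)


def relative_percent(score, baseline):
    """Score as a percentage of the baseline score, rounded to a whole percent."""
    return round(100 * score / baseline)


# Baseline is the full 20-letter amino acid representation (first row).
base_f, base_q2 = TABLE_1["Amino acids"]

# Columns 4 and 6 for every property representation.
relative = {
    name: (relative_percent(f, base_f), relative_percent(q2, base_q2))
    for name, (f, q2) in TABLE_1.items()
    if name != "Amino acids"
}

for name, (f_pct, q2_pct) in relative.items():
    print(f"{name}: F-score {f_pct}% of baseline, Q2 {q2_pct}% of baseline")
```

Applied to the table values, this recovers the percentages reported in columns 4 and 6, for instance 59/92 rounding to 64% for the polarity representation.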

ISSN: 1471-2105